[StarCluster] Eqw errors in SGE with default starcluster configuration

Josh Moore jlmo at cs.cornell.edu
Tue Feb 7 02:30:15 EST 2012


I tried submitting a batch of jobs with qsub, using a script that works fine
on another (non-Amazon) cluster's configuration of SGE. On a cluster
configured with StarCluster, however, only the first 8 jobs enter the queue
without error, and all of those are executed immediately on the master node.
The rest end up in the Eqw state. Even if I delete one of the jobs running on
the master node, another one never takes its place. The cluster has 8
c1.xlarge nodes with 8 cores each. Here is the output of qconf -ssconf:

algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   false
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY

I can't figure out how to set schedd_job_info to true so that I can find out
more about the error message.
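
For reference, one possible way to flip that setting non-interactively is to
dump the scheduler configuration, rewrite the one line, and load the file
back. The sketch below only does the text rewrite. The helper name
set_scheduler_param and the file names in the comments are made up for
illustration, and it assumes the installed SGE version accepts
"qconf -Msconf <file>" (the interactive alternative is "qconf -msconf",
which opens the configuration in an editor).

```python
def set_scheduler_param(config_text: str, key: str, value: str) -> str:
    """Return the `qconf -ssconf` text with the line for `key` set to `value`.

    Hypothetical helper: raises KeyError if `key` does not appear in the text.
    """
    out = []
    found = False
    for line in config_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == key:
            # Keep the two-column layout used in the dump above
            # (parameter name padded to 34 characters, then the value).
            line = f"{key:<34}{value}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(key)
    return "\n".join(out) + "\n"


# Assumed workflow on the master node (not executed here):
#   qconf -ssconf > sched.conf
#   write set_scheduler_param(open("sched.conf").read(),
#                             "schedd_job_info", "true") to sched.conf.new
#   qconf -Msconf sched.conf.new
#   qstat -j <job_id>    # then inspect the stuck job
```

Rewriting a dumped file rather than editing in place keeps the original
configuration around in case the change needs to be reverted.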