[StarCluster] jobs on slave nodes disappear

liang cheng liang.cheng at gmail.com
Fri Dec 30 20:58:20 EST 2011


Greetings!

I created a StarCluster cluster on EC2 and use qsub to submit jobs. It
used to work well. Since this afternoon, after I requested additional
EC2 instances from Amazon, a problem has appeared.

Only the jobs submitted to the master node are executed. Jobs sent to
the other nodes disappear almost immediately. Some diagnostic output is
below. Any help is appreciated!

Happy New Year!


root@master:/# qacct -j 23
==============================================================
qname        all.q
hostname     node006
group        root
owner        root
project      NONE
department   defaultdepartment
jobname      single.sh out 3
jobnumber    23
taskid       undefined
account      sge
priority     0
qsub_time    Sat Dec 31 01:38:32 2011
start_time   Sat Dec 31 01:38:39 2011
end_time     Sat Dec 31 01:38:39 2011
granted_pe   NONE
slots        1
failed       0
exit_status  0
ru_wallclock 0
ru_utime     0.010
ru_stime     0.010
ru_maxrss    2276
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    2648
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   272
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     12
ru_nivcsw    3
cpu          0.020
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
arid         undefined

=========================
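
The record above shows that job 23 was dispatched to node006, started
and ended within the same second, and finished with failed 0 and
exit_status 0. To compare the same fields across several jobs, a small
filter along the following lines may help. This is only a sketch: the
function name summarize_qacct is hypothetical, and it assumes the field
names and their order match the qacct output shown above.

```shell
# Hypothetical helper: condense `qacct -j <id>` output read from stdin
# to one summary line per accounting record.
summarize_qacct() {
    awk '
        $1 == "hostname"     { host = $2 }
        $1 == "jobnumber"    { job = $2 }
        $1 == "failed"       { failed = $2 }
        $1 == "exit_status"  { status = $2 }
        $1 == "ru_wallclock" {
            # Assumes ru_wallclock comes after the four fields above,
            # as it does in the record shown in this message.
            printf "job=%s host=%s failed=%s exit=%s wallclock=%s\n",
                   job, host, failed, status, $2
        }
    '
}

# Usage on the master node (needs SGE accounting data):
#   qacct -j 23 | summarize_qacct
```

For the record above this would print a single line with host=node006
and wallclock=0, which makes it easier to see at a glance which nodes
the short-lived jobs were sent to.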

Thanks,
-Liang