[StarCluster] jobs on slave nodes disappear
Justin Riley
jtriley at MIT.EDU
Sat Dec 31 15:03:22 EST 2011
Hi Liang,
Is this happening consistently even after restarting the cluster using
"starcluster restart mycluster"? Also, is there anything in your
job(s) error logs? Given the output you provided these would most
likely be located in the directory you submitted the job from and
should be named something like "single.sh.e23".
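
As a rough sketch of that check (not part of StarCluster or SGE): the helper below, with the made-up name show_job_errors, prints any stderr files for a given job ID in a directory. It assumes the usual SGE naming of <jobname>.e<jobid>, and the directory argument is whatever directory you submitted from.

```shell
# Hypothetical helper (not an SGE or StarCluster command): print the
# stderr files for one job ID found in a directory (default: current dir).
# Assumes the default SGE naming <jobname>.e<jobid>, e.g. single.sh.e23.
show_job_errors() {
    job_id="$1"
    dir="${2:-.}"
    for f in "$dir"/*.e"$job_id"; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ -s "$f" ]; then
            echo "== $f =="
            cat "$f"
        else
            echo "== $f (empty) =="
        fi
    done
}
```

For example, "show_job_errors 23" run from the submit directory would print single.sh.e23 if it exists. An empty file suggests the job itself wrote nothing to stderr.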
~Justin
On 12/30/2011 08:58 PM, liang cheng wrote:
> Greetings !
>
> I created a StarCluster cluster on EC2 and use qsub to submit jobs.
> It used to work well. This afternoon, after I requested an
> additional EC2 instance from Amazon, the problem appeared.
>
> Only the jobs submitted to the master node are executed. Other
> jobs disappear almost immediately. Some diagnostics are below. Any
> help is appreciated!
>
> Happy New Year !
>
>
> root@master:/# qacct -j 23
> ==============================================================
> qname        all.q
> hostname     node006
> group        root
> owner        root
> project      NONE
> department   defaultdepartment
> jobname      single.sh out 3
> jobnumber    23
> taskid       undefined
> account      sge
> priority     0
> qsub_time    Sat Dec 31 01:38:32 2011
> start_time   Sat Dec 31 01:38:39 2011
> end_time     Sat Dec 31 01:38:39 2011
> granted_pe   NONE
> slots        1
> failed       0
> exit_status  0
> ru_wallclock 0
> ru_utime     0.010
> ru_stime     0.010
> ru_maxrss    2276
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    2648
> ru_majflt    0
> ru_nswap     0
> ru_inblock   0
> ru_oublock   272
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     12
> ru_nivcsw    3
> cpu          0.020
> mem          0.000
> io           0.000
> iow          0.000
> maxvmem      0.000
> arid         undefined
>
> =========================
>
> Thanks, -Liang