<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Since your jobs are getting stuck in the
transfer state and the primary difference is the size of the files
being processed, my initial guess would be a bottleneck relating to
all the I/O on the large files. That is likely far too much for NFS
to handle efficiently, so you should approach things in a different
way. If you run a <a
href="http://linux.about.com/od/commands/l/blcmdl1_top.htm">top</a>
or similar command on your master and worker nodes to see which
processes are running and what is using the largest share of the
resources while jobs are being submitted, you might be able to
pinpoint the source of the bottleneck easily.<br>
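<br>
As a rough sketch of the kind of check I mean (assuming Linux nodes
that expose /proc/stat; the function names and the 5-second sampling
interval are purely illustrative and not part of StarCluster or SGE),
a small Python script like this reports the share of CPU time spent
waiting on I/O. A consistently high value on the workers or on the
node serving NFS would be one hint that I/O is the bottleneck:<br>
<pre>
```python
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b2 - b1 for b1, b2 in zip(parse_cpu_line(before),
                                        parse_cpu_line(after))]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only sum the first eight.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # arbitrary sampling interval
    second = read_cpu_line()
    print("iowait: %.1f%%" % iowait_percent(first, second))
```
</pre>
Running it on the master and on one or two workers while jobs are
sitting in the "t" state should give you numbers to compare.<br>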
<br>
Good Luck,<br>
-Jennifer<br>
<br>
On 2/24/15 3:08 PM, Ying Sonia Ting wrote:<br>
</div>
<blockquote
cite="mid:CAHAMJ6td55X71X4sOitDAHhDWOdYdtbYM6tONQD4_dS_JyfhDQ@mail.gmail.com"
type="cite">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>This might be more of an SGE issue than a StarCluster issue,
but I'd really appreciate any comments. <br>
<br>
</div>
<div>I have a bunch of jobs running on AWS spot instances using
StarCluster. <b>Most of them get stuck in the "t" state <u>for
hours</u> and then finally execute (in the "r" state). </b>For
instance, 50% of the jobs that are not in "qw" right now are in
the "t" state.<br>
<br>
The same program/script/AMI has been used frequently, and this
is the worst it has ever been. The only difference is that the
jobs this time are processing bigger files (~6G each, 90 of them)
located on an NFS-shared gp2 volume. Jobs were divided into tasks
to ensure that only 4-5 jobs are processing the same file at
once. Memory was not even close to being exhausted (only
5G used out of 240G on each node). The long waits in the "t"
state are wasting money and CPU hours. </div>
<div><br>
</div>
<div>Have any of you seen this issue before? Is there any way I
can fix or work around it? </div>
<div> </div>
<div>Thanks a lot, </div>
<div>Sonia</div>
<div> </div>
<div><br>
</div>
<div>
<div><br>
</div>
-- <br>
<div class="gmail_signature">
<div dir="ltr"><span
style="font-family:arial,sans-serif;font-size:13px;border-collapse:collapse">Ying
S. Ting</span>
<div><span style="border-collapse:collapse"></span><font
face="arial, sans-serif">Ph.D. Candidate, MacCoss Lab</font><br>
<div>
<div>
<div><span
style="font-family:arial,sans-serif;font-size:13px;border-collapse:collapse">Department
of Genome Sciences, University of Washington</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px;border-collapse:collapse"><br>
</span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>