<p dir="ltr">I find dstat very useful when trying to isolate slowdowns. It's like an enhanced version of top with many more useful stats. </p>
<p dir="ltr"><a href="http://linux.die.net/man/1/dstat">http://linux.die.net/man/1/dstat</a><br></p>
<p dir="ltr">It combines information from vmstat, iostat, ifstat, netstat, and more<br>
It shows all stats over exactly the same timeframe<br>
It lets you enable and order counters however they make the most sense during analysis/troubleshooting</p>
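As a quick sketch of what that looks like in practice (flags taken from the dstat man page; the 5-second interval is just an illustrative choice, not a recommendation):

```shell
# Time, CPU, disk, network, and memory counters, refreshed every 5 seconds
dstat -tcdnm 5

# When hunting a suspected I/O bottleneck, the --top-io and --top-bio
# plugins show which process is doing the most I/O and block I/O
dstat -td --top-io --top-bio 5
```

Running the second form on the compute nodes while the jobs are active should make it fairly obvious whether they are stalled on I/O rather than CPU-bound.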
<div class="gmail_quote">On Jul 30, 2015 1:53 PM, "Jacob Barhak" <<a href="mailto:jacob.barhak@gmail.com">jacob.barhak@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Hi Christopher, </p>
<p dir="ltr">Do you have a lot of I/O? For example writing and reading many files to the same NFS location? </p>
<p dir="ltr">This may explain things. </p>
<p dir="ltr"> Jacob</p>
<div class="gmail_quote">On Jul 30, 2015 2:34 AM, "Christopher Clearfield" <<a href="mailto:chris.clearfield@system-logic.com" target="_blank">chris.clearfield@system-logic.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div>Hi All, <br></div>I'm running a set of about 60K relatively short jobs that take 30 minutes to run. This is through ipython parallel.<br><br></div>Yet my CPU utilization levels are relatively small: <br><br>queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@master BIP 0/0/2 0.98 linux-x64
---------------------------------------------------------------------------------
all.q@node001 BIP 0/0/8 8.01 linux-x64
---------------------------------------------------------------------------------
all.q@node002 BIP 0/0/8 8.07 linux-x64
---------------------------------------------------------------------------------
all.q@node003 BIP 0/0/8 7.96 linux-x64<br><br></div>(I disabled the ipython engines on master because I was having heartbeat timeout issues with the worker engines on my nodes, which explains why that is so low). <br><br></div>But ~8% utilization on the nodes. Is that expected? <br><br></div>Thanks,<br></div>Chris<br><br></div>
<br>_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu" target="_blank">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" rel="noreferrer" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
<br></blockquote></div>
<br>_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" rel="noreferrer" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
<br></blockquote></div>