Hi Greg,

There is a StarCluster plugin for pip installs. Also, you may want to look at an Anaconda AMI, which comes with many libraries and conda preinstalled.

Here is a link to instructions on setting up an Anaconda AMI:
http://continuum.io/blog/starcluster-anaconda

These two paths should give you enough options to continue.
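For the plugin route, a config entry along these lines should do it (a minimal sketch -- the package names and the "smallcluster" template name are placeholders, and the exact option names are worth double-checking against the StarCluster plugin docs):

    # ~/.starcluster/config
    [plugin pypkginstaller]
    SETUP_CLASS = starcluster.plugins.pypkginstaller.PyPkgInstaller
    # packages to pip-install on the master and every worker node (placeholders)
    PACKAGES = numpy, requests

    [cluster smallcluster]
    # ...existing template settings...
    PLUGINS = pypkginstaller

The listed packages are then installed on every node when the cluster comes up, so nothing ends up master-only; on an already-running cluster you should be able to apply it with "starcluster runplugin pypkginstaller smallcluster".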
        Jacob

Sent from my iPhone

On Oct 1, 2014, at 12:48 PM, greg <margeemail@gmail.com> wrote:

Hi everyone,

I found the bug. Apparently a library I installed in Python is only available on the master node. What's a good way to install Python libraries so they're available on all nodes? I guess virtualenv, but I'm hoping for something simpler :-)

-Greg

On Wed, Oct 1, 2014 at 10:33 AM, Jennifer Staab <jstaab@cs.unc.edu> wrote:

For software and scripts, you can log in to each node and check that the software is installed and that you can view/find the scripts. Also check that the user you qsub'ed the jobs as has the right permissions to run the scripts and software and to write output.

An easier way is to use qstat -j <JOBID>, where <JOBID> is the job ID of one of the jobs in Eqw status. It, and/or the .o/.e files you set when you submitted the qsub job, will give you the location of the error messages. If you didn't set .o and .e files in your qsub call (using the -o and -e options), I believe it defaults to files named after the job ID or job name, with extension .o (for output) and .e (for error). I believe Chris talked about this in his reply. This is how I discovered my scripts weren't shared: the .e file indicated the scripts couldn't be found.
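As a concrete (made-up) example -- the script name and log directory below are placeholders:

    # submit with explicit stdout/stderr locations on a shared directory
    qsub -o /home/sgeadmin/logs/ -e /home/sgeadmin/logs/ run_job.sh

    # with the defaults, look in the submitting user's home directory for files like
    #   run_job.sh.o<JOBID>   (stdout)   and   run_job.sh.e<JOBID>   (stderr)
    cat ~/run_job.sh.e<JOBID>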
Also, as Chris said, you can use qconf to change attributes of the SGE setup. I have done this before on a StarCluster cluster: log in to the master node, and as long as you have admin privileges you can use the qconf command to change the SGE configuration.

Good Luck.

-Jennifer

Sent from my Verizon Wireless 4G LTE DROID

greg <margeemail@gmail.com> wrote:

Thanks Jennifer! Being completely new to StarCluster, how can I check that my scripts are available to all nodes?

-Greg

On Wed, Oct 1, 2014 at 8:26 AM, Jennifer Staab <jstaab@cs.unc.edu> wrote:

Greg -

Maybe check that your software and scripts are available to all nodes. I have had StarCluster throw a bunch of Eqw's when I accidentally didn't have all the software and script components in a directory that was NFS-shared to all the nodes of my cluster, and/or installed individually on every node.

And as Chris just stated, use:

    qstat -j <JOBID>                 ==> gives complete info on that job
    qstat -j <JOBID> | grep error    (looks for errors in the job)

When you get the error debugged you can use:

    qmod -cj <JOBID>                 (will clear the error state and restart the job - e.g. one in Eqw)
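To answer the "available to all nodes" question more concretely: since StarCluster sets up password-less SSH between nodes, a quick loop from the master can confirm that a script and a Python module are visible everywhere. The node aliases, script path, and module name here are only examples:

    # run on the master node
    for node in node001 node002 node003; do
        echo "== $node =="
        ssh "$node" 'ls -l /home/myuser/scripts/run_job.sh && python -c "import mymodule"'
    done

(From your local machine, "starcluster sshnode <cluster> <node> '<command>'" should do the same thing one node at a time.)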
Good Luck.

-Jennifer

On 10/1/14 8:09 AM, Chris Dagdigian wrote:

'Eqw' is a combination of multiple message states: (E)(q)(w). The standard "qw" is familiar to everyone; the E indicates something bad at the job level.

There are multiple levels of debugging, starting easy and getting more cumbersome. Almost all require admin or sudo-level access.

The first-pass debug method is to run "qstat -j <jobID>" on the job that is in Eqw state; that should provide a bit more information about what went wrong.

After that, look at the .e and .o STDERR/STDOUT files from the script, if any were created.

After that, you can use sudo privileges to go into $SGE_ROOT/$SGE_CELL/spool/qmaster/ and look at the messages file; there are also per-node messages files you can look at.

The next level of debugging after that usually involves setting the sge_execd parameter KEEP_ACTIVE=true, which triggers a behavior where SGE stops deleting the temporary files associated with a job's life cycle. Those files live down in the SGE spool at <executionhost>/active.jobs/<jobID>/ -- and they are invaluable in debugging nasty, subtle job failures.
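Concretely, and assuming a stock StarCluster SGE layout (shared $SGE_ROOT, "default" cell -- adjust the paths to your spool setup; node001 is a placeholder):

    # qmaster log, plus a per-node execd log
    sudo less $SGE_ROOT/$SGE_CELL/spool/qmaster/messages
    sudo less $SGE_ROOT/$SGE_CELL/spool/node001/messages

    # keep per-job temporary files around: as an SGE admin, edit the cluster config
    qconf -mconf
    # and add or extend the line:
    #   execd_params    KEEP_ACTIVE=true
    # then resubmit the failing job and look under the execution host's spool,
    # e.g. $SGE_ROOT/$SGE_CELL/spool/<executionhost>/active.jobs/<jobID>/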
type="cite"><blockquote type="cite"><blockquote type="cite"><span>right at the beginning of the job dispatch or execution process. No</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>subtle things there</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>And if your other question was about nodes being allowed to submit jobs</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>-- yes you have to configure this. It can be done during SGE install</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>time or any time afterwards by doing "qconf -as &lt;nodename&gt;" from any</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>account with SGE admin privs. I have no idea if startcluster does this</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>automatically or not but I'd expect that it probably does, If not it's</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>an easy fix.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>-Chris</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>greg wrote:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Hi guys,</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>I'm afraid I'm still stuck on this. &nbsp;Besides my original question</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>which I'm still not sure about. &nbsp;Does anyone have any general advice</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>on debugging an EQW state? 
-Chris

greg wrote:

Hi guys,

I'm afraid I'm still stuck on this, beyond my original question, which I'm still not sure about. Does anyone have any general advice on debugging an Eqw state? The same software runs fine on our local cluster.

thanks again,

Greg

_______________________________________________
StarCluster mailing list
StarCluster@mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster