<div><div>For software and scripts, you can log in to each node and check that the software is installed and that you can view/find the scripts. You might also make sure the user you submitted (qsub&#39;ed) the jobs with has the correct permissions to run the scripts and software and to write output.</div><div><br/></div><div>An easier way is to use qstat -j &lt;JOBID&gt;, where &lt;JOBID&gt; is the job ID of one of the jobs in EQW status. Its output and/or the .o/.e files you set when you submitted the job will point you to where the error messages are. If you didn&#39;t set .o and .e files in your qsub call (using the -o and -e options), it defaults to files named &lt;jobname&gt;.o&lt;jobid&gt; (for output) and &lt;jobname&gt;.e&lt;jobid&gt; (for error). I believe Chris talked about this in his reply. This is how I discovered my scripts weren&#39;t shared: the .e file indicated the scripts couldn&#39;t be found.</div><div><br/></div><div>Also, as Chris said, you can use qconf to change attributes of the SGE setup. 
I have done this before on a StarCluster cluster: log in to the master node and, as long as you have admin privileges, you can use the qconf command to change attributes of the SGE setup.</div><div><br/></div><div>Good luck.</div><div><br/></div><div>-Jennifer</div></div><br><br>greg &lt;margeemail@gmail.com&gt; wrote:<br><br>Thanks Jennifer!&nbsp; Being completely new to StarCluster, how can I<br>check that my scripts are available to all nodes?<br><br>-Greg<br><br>On Wed, Oct 1, 2014 at 8:26 AM, Jennifer Staab &lt;jstaab@cs.unc.edu&gt; wrote:<br>&gt; Greg -<br>&gt;<br>&gt;&nbsp;&nbsp;&nbsp; Maybe check that your software and scripts are available to all nodes.&nbsp; I<br>&gt; have had StarCluster throw a bunch of EQW errors when I accidentally didn't have<br>&gt; all the software and script components in a directory that was NFS-shared<br>&gt; with all the nodes of my cluster and/or installed individually on every node of<br>&gt; the cluster.<br>&gt;<br>&gt; And as Chris just said, use:<br>&gt; qstat -j &lt;JOBID&gt; ==&gt; gives complete info on that job<br>&gt; qstat -j &lt;JOBID&gt; | grep error&nbsp; (looks for errors in the job)<br>&gt;<br>&gt; Once you have debugged the error, you can use:<br>&gt; qmod -cj &lt;JOBID&gt;&nbsp; (clears an error state such as Eqw so the job can be scheduled again)<br>&gt;<br>&gt; Good luck.<br>&gt;<br>&gt; -Jennifer<br>&gt;<br>&gt;<br>&gt;<br>&gt; On 10/1/14 8:09 AM, Chris Dagdigian wrote:<br>&gt;&gt;<br>&gt;&gt; 'EQW' is a combination of multiple job state flags (e)(q)(w).&nbsp; The<br>&gt;&gt; standard "qw" is familiar to everyone; the E indicates an error at<br>&gt;&gt; the job level.<br>&gt;&gt;<br>&gt;&gt; There are multiple levels of debugging, starting with easy and getting<br>&gt;&gt; more cumbersome. 
Almost all of them require admin or sudo-level access.<br>&gt;&gt;<br>&gt;&gt; The first-pass debugging method is to run "qstat -j &lt;jobID&gt;" on the job that<br>&gt;&gt; is in EQW state; that should provide a bit more information about what<br>&gt;&gt; went wrong.<br>&gt;&gt;<br>&gt;&gt; After that, look at the .e and .o (STDERR/STDOUT) files from the script,<br>&gt;&gt; if any were created.<br>&gt;&gt;<br>&gt;&gt; After that, you can use sudo privileges to go into<br>&gt;&gt; $SGE_ROOT/$SGE_CELL/spool/qmaster/ and look at the messages file; there<br>&gt;&gt; are also per-node messages files you can look at.<br>&gt;&gt;<br>&gt;&gt; The next level of debugging usually involves setting the<br>&gt;&gt; sge_execd parameter KEEP_ACTIVE=true, which makes SGE<br>&gt;&gt; stop deleting the temporary files associated with a job's life cycle.<br>&gt;&gt; Those files live down in the SGE spool at<br>&gt;&gt; &lt;executionhost&gt;/active_jobs/&lt;jobID&gt;/, and they are invaluable for<br>&gt;&gt; debugging nasty, subtle job failures.<br>&gt;&gt;<br>&gt;&gt; EQW should be easy to troubleshoot, though: it indicates a fatal error<br>&gt;&gt; right at the beginning of the job dispatch or execution process. Nothing<br>&gt;&gt; subtle there.<br>&gt;&gt;<br>&gt;&gt;<br>&gt;&gt; And if your other question was about nodes being allowed to submit jobs:<br>&gt;&gt; yes, you have to configure this. It can be done at SGE install<br>&gt;&gt; time or any time afterwards by running "qconf -as &lt;nodename&gt;" from any<br>&gt;&gt; account with SGE admin privileges. 
I have no idea whether StarCluster does this<br>&gt;&gt; automatically, but I'd expect that it probably does. If not, it's<br>&gt;&gt; an easy fix.<br>&gt;&gt;<br>&gt;&gt; -Chris<br>&gt;&gt;<br>&gt;&gt;<br>&gt;&gt; greg wrote:<br>&gt;&gt;&gt;<br>&gt;&gt;&gt; Hi guys,<br>&gt;&gt;&gt;<br>&gt;&gt;&gt; I'm afraid I'm still stuck on this.&nbsp; Besides my original question,<br>&gt;&gt;&gt; which I'm still not sure about, does anyone have any general advice<br>&gt;&gt;&gt; on debugging an EQW state?&nbsp; The same software runs fine on our local<br>&gt;&gt;&gt; cluster.<br>&gt;&gt;&gt;<br>&gt;&gt;&gt; Thanks again,<br>&gt;&gt;&gt;<br>&gt;&gt;&gt; Greg<br>&gt;&gt;<br>&gt;&gt; _______________________________________________<br>&gt;&gt; StarCluster mailing list<br>&gt;&gt; StarCluster@mit.edu<br>&gt;&gt; http://mailman.mit.edu/mailman/listinfo/starcluster<br>&gt;<br>&gt;<br>
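The first-pass steps discussed in this thread (find the jobs in an E state, then run qstat -j on each one and grep for errors) can be sketched as a small shell script. This is a minimal sketch, not taken from any of the replies: it assumes the default qstat column layout (two header lines, then rows whose 5th column is the job state, e.g. Eqw), and the function names list_error_jobs and show_errors are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: find SGE jobs in an error state (e.g. Eqw) and print the
# error lines that "qstat -j" reports for each one.
# Assumption: default "qstat" output, i.e. two header lines followed by
# rows whose 5th column is the job state. Function names are hypothetical.

# Read qstat output on stdin; print the job ID (1st column) of every row
# whose state column contains "E", the error flag.
list_error_jobs() {
  awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Run qstat, then show the error lines recorded for each errored job.
show_errors() {
  for jobid in $(qstat | list_error_jobs); do
    echo "== job $jobid =="
    qstat -j "$jobid" | grep error
  done
}
```

To use it, source the script on the master node and run show_errors. Once the underlying problem (for example, a script missing on an execution node) is fixed, qmod -cj on the job ID clears the error state as described in the thread. Keeping the parsing in its own function makes it easy to check against saved qstat output without a running cluster.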