<div dir="ltr">Hi!<div><br></div><div>From experience, I don't think it works for qrsh though. </div><div>Justin also just tried it and told me it doesn't work.</div><div><br></div><div><br></div></div><div class="gmail_extra">

2014-03-06 14:26 GMT-05:00 Rayson Ho <raysonlogin@gmail.com>:

Hi Mich,

Thanks for sharing the workaround.

The behavior is due to a relatively undocumented feature of DRMAA /
Grid Engine -- basically, DRMAA jobs in Grid Engine have "-w e" added to
the job submission request. The -w flag takes the following arguments:

`e' error - jobs with invalid requests will be rejected.

`w' warning - only a warning will be displayed for invalid requests.

`n' none - switches off validation; the default for qsub,
qalter, qrsh, qsh and qlogin.

`p' poke - does not submit the job but prints a validation
report based on a cluster as is, with all resource utilizations in
place.

`v' verify - does not submit the job but prints a
validation report based on an empty cluster.

http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html

Thus, with "-w e", if Grid Engine is not happy with the job at
submission time (e.g., it thinks that it does not have enough nodes to
run the job), then it will reject the job submission.

The correct way is to override the DRMAA request with "-w n" or "-w w"
if you are going to use load balancing.
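
As a minimal sketch of that override using the Python DRMAA bindings
(the job command and arguments are placeholders, and whether the native
specification takes precedence over the library's implicit "-w e" can
depend on the DRMAA implementation, so treat this as an assumption to
verify):

    import drmaa

    s = drmaa.Session()
    s.initialize()

    jt = s.createJobTemplate()
    jt.remoteCommand = '/bin/sleep'   # placeholder job
    jt.args = ['60']
    # Pass "-w n" through to the submission so validation is switched
    # off and the job is not rejected while nodes are still scaling up.
    jt.nativeSpecification = '-w n'

    jobid = s.runJob(jt)
    print('submitted job: %s' % jobid)

    s.deleteJobTemplate(jt)
    s.exit()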

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
<div><div class="h5"><br>
<br>
On Thu, Mar 6, 2014 at 2:13 PM, François-Michel L'Heureux<br>
<<a href="mailto:fmlheureux@datacratic.com">fmlheureux@datacratic.com</a>> wrote:<br>
> Hi John,
>
> I assume DRMAA is a replacement for OGS/SGE?
>
> About DRMAA bailing out, I don't know the product, but your guess is likely
> correct: it might crash when nodes go away. There is a somewhat similar
> issue with OGS, where we need to clean it up when nodes go away. It doesn't
> crash, though.
>
> For your second issue, regarding the execution host, again, I had a similar
> issue with OGS. The trick I used is that I left the master node as an
> execution host, but I set its number of slots to 0. Hence, OGS is happy
> because there is at least one exec host, and the load balancer runs just
> fine: when only the master node is online, there are no slots, so it
> immediately adds nodes whenever jobs come in. I don't know if there is a
> concept of slots in DRMAA or if this version of the load balancer uses it,
> but if so, I think you could reproduce my trick (see the sketch below).
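>
> As a rough sketch, that slot override can be set with qconf; the queue
> name "all.q" and host name "master" are assumptions, so substitute your
> own:
>
>     # add a host-specific override that gives the master host zero
>     # slots in the queue, while keeping it registered as an exec host
>     qconf -aattr queue slots '[master=0]' all.q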
>
> I hope this helps.
>
> Mich
>
> _______________________________________________
> StarCluster mailing list
> StarCluster@mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>