<div dir="ltr">Hi!<div><br></div><div>From experience, I don't think it works for qrsh though. </div><div>Justin also just tried it and told me it doesn't work.</div><div><br></div><div><br></div></div><div class="gmail_extra">

2014-03-06 14:26 GMT-05:00 Rayson Ho <raysonlogin@gmail.com>:

Hi Mich,

Thanks for sharing the workaround.

The behavior is due to a relatively undocumented feature of DRMAA /
Grid Engine -- basically, DRMAA jobs in Grid Engine have "-w e" added to
the job submission request. The -w flag takes the following arguments:

`e' error - jobs with invalid requests will be rejected.

`w' warning - only a warning will be displayed for invalid requests.

`n' none - switches off validation; the default for qsub,
qalter, qrsh, qsh and qlogin.

`p' poke - does not submit the job but prints a validation
report based on a cluster as is, with all resource utilizations in
place.

`v' verify - does not submit the job but prints a
validation report based on an empty cluster.

http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html

Thus, with "-w e", if Grid Engine is not happy with the job at
submission time (e.g., it thinks that it does not have enough nodes to
run the job), then it will reject the job submission.

The correct way is to override the DRMAA request with "-w n" or "-w w"
if you are going to use load balancing.
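
As a minimal sketch of that override using the Python DRMAA bindings
(the job command and arguments are placeholders, and whether the native
specification takes precedence over the library's implicit "-w e" can
depend on the DRMAA implementation, so treat this as an assumption to
verify):

    import drmaa

    s = drmaa.Session()
    s.initialize()

    jt = s.createJobTemplate()
    jt.remoteCommand = '/bin/sleep'   # placeholder job
    jt.args = ['60']
    # Pass "-w n" through to the submission so validation is switched
    # off and the job is not rejected while nodes are still scaling up.
    jt.nativeSpecification = '-w n'

    jobid = s.runJob(jt)
    print('submitted job: %s' % jobid)

    s.deleteJobTemplate(jt)
    s.exit()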

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
<div><div class="h5"><br>
<br>
On Thu, Mar 6, 2014 at 2:13 PM, François-Michel L'Heureux<br>
<<a href="mailto:fmlheureux@datacratic.com">fmlheureux@datacratic.com</a>> wrote:<br>
> Hi John,
>
> I assume DRMAA is a replacement for OGS/SGE?
>
> About DRMAA bailing out, I don't know the product, but your guess is likely
> correct: it might crash when nodes go away. There is a somewhat similar
> issue with OGS, where we need to clean it up when nodes go away. It doesn't
> crash, though.
>
> For your second issue, regarding the execution host, again, I had a similar
> issue with OGS. The trick I used is that I left the master node as an
> execution host, but I set its number of slots to 0. Hence, OGS is happy
> because there is at least one exec host, and the load balancer runs just
> fine: when only the master node is online, there are no slots, so it
> immediately adds nodes whenever jobs come in. I don't know if there is a
> concept of slots in DRMAA or if this version of the load balancer uses it,
> but if so, I think you could reproduce my trick (see the sketch below).
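>
> As a rough sketch, that slot override can be set with qconf; the queue
> name "all.q" and host name "master" are assumptions, so substitute your
> own:
>
>     # add a host-specific override that gives the master host zero
>     # slots in the queue, while keeping it registered as an exec host
>     qconf -aattr queue slots '[master=0]' all.q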
>
> I hope this helps.
>
> Mich
>
> _______________________________________________
> StarCluster mailing list
> StarCluster@mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>