See comments inline.<br><br><div class="gmail_quote">On Fri, Mar 25, 2011 at 11:40 AM, Kyeong Soo (Joseph) Kim <span dir="ltr">&lt;<a href="mailto:kyeongsoo.kim@gmail.com">kyeongsoo.kim@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
For instance, the implementation of load<br>
balancing would be much simpler and better and, if needed, it can<br>
completely terminate the whole instances.<br>
<br>
As for my own experience with 25-node clusters, I found out that the<br>
load balancer did not terminate the master node, even though it<br>
finished all assigned jobs; the master node is a single point of<br>
contact and had to wait for all those jobs running in other nodes to<br>
finish.<br><br></blockquote><div><br></div><div>There is a variable in starcluster/balancers/sge/__init__.py</div><div>called:</div><div>#This would allow the master to be killed when the queue empties. UNTESTED.</div><div>
allow_master_kill = False </div><div><br></div><div>Setting it to True would kill the master once the job queue is empty. You can change it and test it if you'd like.</div><div><br></div><div>This carries some risk: when the master is killed, the cluster is no longer accessible, and your results may be lost (unless you were smart enough to put them on EBS). I kept it semi-hidden because of these risks. Since you're obviously interested, give it a try. I used it for a little while, and it was able to terminate the master node when the jobs were finished. The cluster tags, groups, etc. will still exist, but they won't incur any charges. At some later date you'd still have to call 'starcluster stop &lt;cluster_tag&gt;'.</div>
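<div><br></div><div>To make the behaviour concrete, here is a rough sketch of the kind of decision the flag controls. This is not the actual balancer code: the function name, the node dictionaries, and their keys are made up for illustration, and the rule that the master waits for jobs on other nodes is my assumption based on the behaviour described above.</div>
<pre>
```python
# Hypothetical sketch only; none of these names come from StarCluster itself.
ALLOW_MASTER_KILL = False  # stands in for the allow_master_kill variable


def nodes_to_terminate(nodes, queued_jobs, allow_master_kill=ALLOW_MASTER_KILL):
    """Return the aliases of idle nodes that could be shut down.

    nodes: list of dicts with 'alias', 'is_master' and 'running_jobs' keys.
    queued_jobs: number of jobs still waiting in the queue.
    """
    if queued_jobs:
        return []  # work is still waiting, so keep every node up

    any_busy = any(n["running_jobs"] for n in nodes)
    doomed = []
    for n in nodes:
        if n["running_jobs"]:
            continue  # never touch a node that is still running a job
        if n["is_master"] and (not allow_master_kill or any_busy):
            # The master is the single point of contact: keep it unless the
            # flag is on and no other node is still working.
            continue
        doomed.append(n["alias"])
    return doomed
```
</pre>
<div>With the default of False the master never shows up in the result, which matches what you saw on your 25-node cluster. With True it is only returned once every node is idle and the queue is empty.</div>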
<div><br></div><div><br></div><div>Best,</div><div>Rajat</div><div><br></div></div>