<div dir="ltr">Hi Rajat,<div><br></div><div style>Yes, it's OK. </div><div style>In one of my tests, the cluster was shut down while it was still working. </div><div style>The main difference is that I use MPICH2 instead of Open MPI. I don't know if this changes anything.</div>
<div style>But I will run some new tests and report the results to you all.</div><div style><br></div><div style>All the best,</div><div style><br></div><div style>Sergio</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Thu, May 23, 2013 at 1:47 PM, Rajat Banerjee <span dir="ltr"><<a href="mailto:rqbanerjee@gmail.com" target="_blank">rqbanerjee@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<p dir="ltr">Hi Sergio, <br>
The reason a node has not been removed is documented in the ELB documentation:</p>
<p dir="ltr"><a href="http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html" target="_blank">http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html</a></p>
<p dir="ltr">See the "Criteria for Removing a Node" section.</p>
<p dir="ltr">Does that make sense? </p><div class="HOEnZb"><div class="h5">
<div class="gmail_quote">On May 23, 2013 12:40 PM, "MacMullan, Hugh" <<a href="mailto:hughmac@wharton.upenn.edu" target="_blank">hughmac@wharton.upenn.edu</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi Sergio:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Those jobs aren’t queued, they’re running.<u></u><u></u></span></p>
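<p class="MsoNormal">To make the running-versus-queued distinction concrete, here is a minimal sketch that tallies the <code>state</code> column of plain <code>qstat</code> text output. The helper name <code>count_job_states</code> and the column-splitting approach are assumptions for illustration only; this is not how StarCluster itself collects its stats. In SGE, <code>r</code> marks a running job and <code>qw</code> a job waiting in the queue.</p>

```python
def count_job_states(qstat_output):
    """Count jobs per SGE state code from plain `qstat` text output.

    Hypothetical helper: assumes the default column order
    (job-ID, prior, name, user, state, ...) with no spaces in job names.
    """
    counts = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; header and separator rows do not.
        if len(fields) >= 5 and fields[0].isdigit():
            state = fields[4]
            counts[state] = counts.get(state, 0) + 1
    return counts


# Sample modeled on the qstat listing quoted in this thread.
sample = """\
job-ID  prior   name  user      state submit/start at     queue          slots
-------------------------------------------------------------------------------
      2 0.50500 c4    sgeadmin  r     05/22/2013 14:30:08 all.q@node001     53
      3 0.50500 c5    sgeadmin  r     05/22/2013 14:30:23 all.q@node001     53
      4 0.60500 c6    sgeadmin  r     05/22/2013 14:30:38 all.q@node001     54
"""

counts = count_job_states(sample)
print("running:", counts.get("r", 0), "queued:", counts.get("qw", 0))
```

<p class="MsoNormal">With all three jobs in state <code>r</code>, nothing is waiting in the queue, which would be consistent with the balancer reporting zero queued jobs.</p>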
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Don’t know why nodes aren’t being removed though …. has it been 55 minutes (I think that’s the default)?<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Cheers, Hugh<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sergio Mafra [mailto:<a href="mailto:sergiohmafra@gmail.com" target="_blank">sergiohmafra@gmail.com</a>]
<br>
<b>Sent:</b> Wednesday, May 22, 2013 10:43 AM<br>
<b>To:</b> MacMullan, Hugh<br>
<b>Cc:</b> <a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a><br>
<b>Subject:</b> Re: [StarCluster] Fwd: StarCluster Digest, Vol 44, Issue 5<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">Hi all,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I just followed the tip from Hugh MacMullan (submit a ghost job so the stats become available and load balancing can begin), but I still got some strange info.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">At this present time, the queue has 3 jobs:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">sgeadmin@master:~/gpo2/PEN_2013/GTBase/caso4$ qstat<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">job-ID prior name user state submit/start at queue slots ja-task-ID<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">-----------------------------------------------------------------------------------------------------------------<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> 2 0.50500 c4 sgeadmin r 05/22/2013 14:30:08 all.q@node001 53<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> 3 0.50500 c5 sgeadmin r 05/22/2013 14:30:23 all.q@node001 53<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> 4 0.60500 c6 sgeadmin r 05/22/2013 14:30:38 all.q@node001 54<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">But the StarCluster load balancer doesn't understand that. It says Queued jobs: 0<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Loading full job history<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Execution hosts: 5<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Queued jobs: 0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Avg job duration: 0 secs<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Avg job wait time: 1 secs<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Last cluster modification time: 2013-05-22 14:33:34<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Not adding nodes: already at or above maximum (5)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Looking for nodes to remove...<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> No nodes can be removed at this time<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Sleeping...(looping again in 60 secs<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Tue, May 7, 2013 at 4:37 PM, MacMullan, Hugh <<a href="mailto:hughmac@wharton.upenn.edu" target="_blank">hughmac@wharton.upenn.edu</a>> wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Sergio:</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">One job needs to run to completion before the stats are available for load balancing to begin in StarCluster. For example:</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">echo hostname | qsub -o /dev/null -j y</span><u></u><u></u></p>
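<p class="MsoNormal">As a rough illustration of the "no completed job" case, the sketch below checks captured <code>qacct</code> output for the two messages that appear in the traceback quoted further down this thread. The helper name <code>lacks_job_history</code> and the marker list are assumptions for illustration; StarCluster does not necessarily perform this check itself.</p>

```python
# Messages taken from the failed `qacct` call quoted in this thread.
NO_HISTORY_MARKERS = (
    "no jobs running since startup",
    "accounting: No such file or directory",
)


def lacks_job_history(qacct_output):
    """Return True if the qacct output suggests no job has completed yet.

    Hypothetical helper: a plain substring check, nothing more.
    """
    return any(marker in qacct_output for marker in NO_HISTORY_MARKERS)


sample_error = (
    "no jobs running since startup\n"
    "/opt/sge6/default/common/accounting: No such file or directory\n"
)

if lacks_job_history(sample_error):
    print("No completed jobs yet; submit a short job first, for example:")
    print("echo hostname | qsub -o /dev/null -j y")
```

<p class="MsoNormal">Once a short job like the one above has finished, the accounting file should exist and the stats call has something to read.</p>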
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">It’s funny, I just ran into those same two issues (timezone and no completed job) recently while
fiddling with my own AMIs and loadbalancing.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Cheers, Hugh</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">
<a href="mailto:starcluster-bounces@mit.edu" target="_blank">starcluster-bounces@mit.edu</a> [mailto:<a href="mailto:starcluster-bounces@mit.edu" target="_blank">starcluster-bounces@mit.edu</a>]
<b>On Behalf Of </b>Sergio Mafra<br>
<b>Sent:</b> Tuesday, May 07, 2013 3:30 PM<br>
<b>To:</b> <a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a><br>
<b>Subject:</b> [StarCluster] Fwd: StarCluster Digest, Vol 44, Issue 5</span><u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal">Hi fellows,<u></u><u></u></p>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Any help on this?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">All the best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Sergio<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">---------- Forwarded message ----------<br>
From: <b>Sergio Mafra</b> <<a href="mailto:sergiohmafra@gmail.com" target="_blank">sergiohmafra@gmail.com</a>><br>
Date: Tue, May 7, 2013 at 4:17 PM<br>
Subject: Re: [StarCluster] StarCluster Digest, Vol 44, Issue 5<br>
To: Rajat Banerjee <<a href="mailto:rajatb@post.harvard.edu" target="_blank">rajatb@post.harvard.edu</a>><u></u><u></u></p>
<div>
<p class="MsoNormal">Hi Rajat,<u></u><u></u></p>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I think the date problem is solved. Now we've got a new one. Check it out:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">ubuntu@domU-12-31-39-02-19-36:~$ starcluster loadbalance spotcluster<u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">StarCluster - (<a href="http://star.mit.edu/cluster" target="_blank">http://star.mit.edu/cluster</a>) (v. 0.9999)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Software Tools for Academics and Researchers (STAR)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Please submit bug reports to
<a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Starting load balancer (Use ctrl-c to exit)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Maximum cluster size: 5<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Minimum cluster size: 1<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Cluster growth rate: 1 nodes/iteration<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal">>>> Loading full job history<u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">*** WARNING - Failed to retrieve stats (1/5):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Traceback (most recent call last):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> self.stat = self._get_stats()<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 493, in _get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> qacct = '\n'.join(master.ssh.execute(qacct_cmd))<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 538, in execute<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> msg, command, exit_status, out_str)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">RemoteCommandFailed: remote command 'source /etc/profile && qacct -j -b 201305071615' failed with status 1:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">no jobs running since startup<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">/opt/sge6/default/common/accounting: No such file or directory<u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">*** WARNING - Retrying in 60s<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal">Just to let you know, I'm running MPICH2. This is part of my config file:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">[cluster NewaveUbuntuHVM]<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">KEYNAME = MasterNode<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">CLUSTER_SIZE = 5<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">CLUSTER_USER = sgeadmin<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">CLUSTER_SHELL = bash<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">MASTER_IMAGE_ID = ami-7f1d8a16<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">NODE_IMAGE_ID = ami-411d8a28<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">NODE_INSTANCE_TYPE = cr1.8xlarge<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">PLUGINS = mpich2<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">VOLUMES = newave<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">All the best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Sergio<u></u><u></u></p>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Mon, May 6, 2013 at 10:42 AM, Sergio Mafra <<a href="mailto:sergiohmafra@gmail.com" target="_blank">sergiohmafra@gmail.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal">Hi Rajat,<u></u><u></u></p>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Thanks so much for your help. I'll do as you said and report the results here.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">All the best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
Sergio<u></u><u></u></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Sun, May 5, 2013 at 2:48 PM, Rajat Banerjee <<a href="mailto:rajatb@post.harvard.edu" target="_blank">rajatb@post.harvard.edu</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal">Hi Sergio, <u></u><u></u></p>
<div>
<p class="MsoNormal">Sorry for the delayed response. Busy week at work. Adding starcluster alias back, in case this helps other people in the future.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Like I said, I'm not sure why your EC2 instance is reporting PDT, since from what I remember it would always return UTC.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Is it possible for you to download the latest dev version if you haven't tried that already?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><a href="http://star.mit.edu/cluster/docs/latest/contribute.html" target="_blank">http://star.mit.edu/cluster/docs/latest/contribute.html</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Slightly different directions than the one you specified. Then, you can modify this file:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">starcluster/balancers/sge/__init__.py line 466</span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">To replace UTC with PDT. Then run the ELB, and it'll run the latest code. Let me know if that works, and we can file
a bug and go through the formal process of letting you switch the timezone. I'm guessing that you're pretty familiar with python programming, but if you have more problems then feel free to ask more questions.</span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Best,</span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Rajat</span><u></u><u></u></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Tue, Apr 30, 2013 at 2:07 PM, Sergio Mafra <<a href="mailto:sergiohmafra@gmail.com" target="_blank">sergiohmafra@gmail.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal">Hi Rajat,<u></u><u></u></p>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Thanks so much for kindly tracking down where this error was. Nice!<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">It's a little hard to understand what is causing that, since I'm running the SC controller as an instance in the same zone (us-east-1d) as the cluster it launched. So they should use the same time format, shouldn't they?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">What I did was download the code directly from the Git site and build it as described in <a href="http://star.mit.edu/cluster/docs/latest/installation.html" target="_blank">http://star.mit.edu/cluster/docs/latest/installation.html</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">???<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">All the best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Sergio<u></u><u></u></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Tue, Apr 30, 2013 at 2:18 PM, Rajat Banerjee <<a href="mailto:rajatb@post.harvard.edu" target="_blank">rajatb@post.harvard.edu</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Sergio,<br>
I looked at the code that is causing your problems. It's this line:<u></u><u></u></p>
<div>
<p class="MsoNormal">return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">where 'str' is the output of the *remote* call to date. My mac returns this:<br>
rbanerjee:~/starcluster/StarCluster/starcluster/balancers/sge $ date<br>
Tue Apr 30 13:12:44 EDT 2013<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Which looks like it would parse OK. One oddity is that your time format is returning PDT when UTC is expected:<br>
<br>
ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match format '%a %b %d %H:%M:%S UTC %Y'<br>
<br>
Not sure what's causing the problems since it looks mostly right, but feel free to tweak the time setting in the code in starcluster/balancers/sge/__init__.py line 466 to make the pattern match yours. Do you know why your time zone may be set differently than
other AWS instances we've used? Custom images?<u></u><u></u></p>
</div>
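<p class="MsoNormal">As a sketch of what a more tolerant pattern could look like, the function below accepts any timezone abbreviation in the <code>date</code> output instead of a hard-coded <code>UTC</code>. The name <code>parse_remote_date</code> and the regular expression are assumptions for illustration and are not the StarCluster implementation. Note that it returns the wall-clock time as a naive <code>datetime</code> together with the zone name and does not convert anything to UTC.</p>

```python
import datetime
import re

# Matches `date` output such as 'Tue Apr 30 07:01:35 PDT 2013'.
# Assumes English day and month names, as in the thread.
DATE_RE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (\d{4})$"
)


def parse_remote_date(text):
    """Split `date` output into (naive datetime, timezone abbreviation).

    Hypothetical helper: the timezone token is returned as-is, not applied.
    """
    match = DATE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized date output: %r" % text)
    stamp, tz_name, year = match.groups()
    naive = datetime.datetime.strptime(
        "%s %s" % (stamp, year), "%a %b %d %H:%M:%S %Y"
    )
    return naive, tz_name


when, zone = parse_remote_date("Tue Apr 30 07:01:35 PDT 2013")
print(when.isoformat(), zone)
```

<p class="MsoNormal">Another option, if the remote command can be changed, might be to request UTC explicitly with <code>date -u</code>, so that the original format string keeps matching regardless of the instance's configured timezone.</p>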
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Best,<br>
Rajat<u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Tue, Apr 30, 2013 at 12:43 PM, <<a href="mailto:starcluster-request@mit.edu" target="_blank">starcluster-request@mit.edu</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">
Today's Topics:<br>
<br>
1. Unable to mount ebs volume (Jerry Lee, GW/US)<br>
2. LoadBalance (Sergio Mafra)<br>
<br>
<br>
---------- Forwarded message ----------<br>
From: "Jerry Lee, GW/US" <<a href="mailto:Jerry.Lee@genewiz.com" target="_blank">Jerry.Lee@genewiz.com</a>><br>
To: <<a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a>><br>
Cc: <br>
Date: Mon, 15 Apr 2013 17:11:38 -0500<br>
Subject: [StarCluster] Unable to mount ebs volume<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal">Hi,<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">I am a beginner with StarCluster. I created an EBS volume via Amazon AWS and configured it in the config file for use with my cluster, but no matter what I do, it isn't automatically mounted on the cluster. Please help.<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">[cluster jerrycluster]<u></u><u></u></p>
<p class="MsoNormal">EXTENDS = smallcluster<u></u><u></u></p>
<p class="MsoNormal">VOLUMES = testdata<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">[volume testdata]<u></u><u></u></p>
<p class="MsoNormal">VOLUME_ID=vol-a0fe24f9<u></u><u></u></p>
<p class="MsoNormal">MOUNT_PATH=/data<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">>>> Waiting for cluster to come up... (updating every 30s)<u></u><u></u></p>
<p class="MsoNormal">>>> Waiting for instances to activate...<u></u><u></u></p>
<p class="MsoNormal">>>> Waiting for all nodes to be in a 'running' state...<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Waiting for SSH to come up on all nodes...<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Waiting for cluster to come up took 1.847 mins<u></u><u></u></p>
<p class="MsoNormal">>>> The master node is
<a href="http://ec2-54-234-229-206.compute-1.amazonaws.com" target="_blank">ec2-54-234-229-206.compute-1.amazonaws.com</a><u></u><u></u></p>
<p class="MsoNormal">>>> Setting up the cluster...<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring hostnames...<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Creating cluster user: None (uid: 1001, gid: 1001)<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring scratch space for user(s): sgeadmin<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring /etc/hosts on each node<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Starting NFS server on master<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring NFS exports path(s):<u></u><u></u></p>
<p class="MsoNormal">/home<u></u><u></u></p>
<p class="MsoNormal">>>> Mounting all NFS export path(s) on 1 worker node(s)<u></u><u></u></p>
<p class="MsoNormal">1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Setting up NFS took 0.073 mins<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring passwordless ssh for root<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring passwordless ssh for sgeadmin<u></u><u></u></p>
<p class="MsoNormal">>>> Shutting down threads...<u></u><u></u></p>
<p class="MsoNormal">20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring SGE...<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring NFS exports path(s):<u></u><u></u></p>
<p class="MsoNormal">/opt/sge6<u></u><u></u></p>
<p class="MsoNormal">>>> Mounting all NFS export path(s) on 1 worker node(s)<u></u><u></u></p>
<p class="MsoNormal">1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Setting up NFS took 0.020 mins<u></u><u></u></p>
<p class="MsoNormal">>>> Installing Sun Grid Engine...<u></u><u></u></p>
<p class="MsoNormal">1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Creating SGE parallel environment 'orte'<u></u><u></u></p>
<p class="MsoNormal">2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Adding parallel environment 'orte' to queue 'all.q'<u></u><u></u></p>
<p class="MsoNormal">>>> Shutting down threads...<u></u><u></u></p>
<p class="MsoNormal">20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<u></u><u></u></p>
<p class="MsoNormal">>>> Configuring cluster took 1.325 mins<u></u><u></u></p>
<p class="MsoNormal">>>> Starting cluster took 3.197 mins<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Thanks,<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:20.0pt;font-family:"Vladimir Script"">Jerry Lee</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Jerry Lee</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Assistant Manager of Global Infrastructure</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">GENEWIZ Inc.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">40 Cragwood Road. Suite 201</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">South Plainfield, NJ 07080</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Phone:
<a href="tel:908-222-0711%20ext.%203379" target="_blank">908-222-0711 ext. 3379</a></span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Fax:
<a href="tel:908-333-4511" target="_blank">908-333-4511</a> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""><a href="mailto:jerry.lee@genewiz.com" target="_blank">jerry.lee@genewiz.com</a></span><u></u><u></u></p>
<p class="MsoNormal"><a href="http://www.genewiz.com/" title="blocked::http://www.genewiz.com/" target="_blank"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">www.genewiz.com</span></a><u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"><br>
<br>
---------- Forwarded message ----------<br>
From: Sergio Mafra <<a href="mailto:sergiohmafra@gmail.com" target="_blank">sergiohmafra@gmail.com</a>><br>
To: "<a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a>" <<a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a>><br>
Cc: <br>
Date: Tue, 30 Apr 2013 11:05:45 -0300<br>
Subject: [StarCluster] LoadBalance<u></u><u></u></p>
<div>
<p class="MsoNormal">Hi fellows,<u></u><u></u></p>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I'm testing StarCluster version 0.999 and so far so good.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">One thing that isn't working is loadbalance. This is what I get:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">ubuntu@domU-12-31-39-02-19-36:~$ starcluster loadbalance newcam<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">StarCluster - (<a href="http://star.mit.edu/cluster" target="_blank">http://star.mit.edu/cluster</a>) (v. 0.9999)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Software Tools for Academics and Researchers (STAR)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Please submit bug reports to
<a href="mailto:starcluster@mit.edu" target="_blank">starcluster@mit.edu</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">>>> Starting load balancer (Use ctrl-c to exit)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Maximum cluster size: 3<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Minimum cluster size: 1<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Cluster growth rate: 1 nodes/iteration<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">*** WARNING - Failed to retrieve stats (1/5):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Traceback (most recent call last):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> self.stat = self._get_stats()<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 487, in _get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> now = self.get_remote_time()<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 466, in get_remote_time<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> (data_string, format))<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match format '%a %b %d %H:%M:%S UTC %Y'<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">*** WARNING - Retrying in 60s<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">^CTraceback (most recent call last):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/bin/starcluster", line 9, in <module><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> load_entry_point('StarCluster==0.9999', 'console_scripts', 'starcluster')()<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 313, in main<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> StarClusterCLI().main()<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 257, in main<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> sc.execute(args)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> lb.run(cluster)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 576, in run<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> self.get_stats()<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "<string>", line 2, in get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/utils.py", line 92, in wrap_f<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> res = func(*arg, **kargs)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 521, in get_stats<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> time.sleep(self.polling_interval)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Any ideas?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">All Best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Sergio<u></u><u></u></p>
</div>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu" target="_blank">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
<br></blockquote></div>
</div></div></blockquote></div><br></div>