[StarCluster] Fwd: StarCluster Digest, Vol 44, Issue 5

Sergio Mafra sergiohmafra at gmail.com
Tue May 7 15:30:29 EDT 2013


Hi fellows,

Any help on this?

All the best,

Sergio

---------- Forwarded message ----------
From: Sergio Mafra <sergiohmafra at gmail.com>
Date: Tue, May 7, 2013 at 4:17 PM
Subject: Re: [StarCluster] StarCluster Digest, Vol 44, Issue 5
To: Rajat Banerjee <rajatb at post.harvard.edu>


Hi Rajat,

I think the date problem is solved. Now we've got a new one. Check it
out:

ubuntu at domU-12-31-39-02-19-36:~$ starcluster loadbalance spotcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu

>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 5
Minimum cluster size: 1
Cluster growth rate: 1 nodes/iteration

>>> Loading full job history
*** WARNING - Failed to retrieve stats (1/5):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
    self.stat = self._get_stats()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 493, in _get_stats
    qacct = '\n'.join(master.ssh.execute(qacct_cmd))
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 538, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && qacct -j -b 201305071615' failed with status 1:
no jobs running since startup
/opt/sge6/default/common/accounting: No such file or directory
*** WARNING - Retrying in 60s
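
A note on the failure above: one plausible reading of the qacct output is that no job has finished on the cluster yet, so the accounting file has not been created. The sketch below shows one way a caller could treat that case as "empty history" instead of an error. It is only an illustration under that assumption: the RemoteCommandFailed class here is a hypothetical stand-in modelled on the traceback, and fetch_job_history and the execute callable are made-up names, not StarCluster's actual API.

```python
class RemoteCommandFailed(Exception):
    """Hypothetical stand-in for the exception shown in the traceback."""

    def __init__(self, command, exit_status, output):
        Exception.__init__(
            self,
            "remote command %r failed with status %d:\n%s"
            % (command, exit_status, output),
        )
        self.command = command
        self.exit_status = exit_status
        self.output = output


# Messages qacct printed in the log above when no accounting data existed.
EMPTY_HISTORY_MARKERS = (
    "no jobs running since startup",
    "accounting: No such file or directory",
)


def is_empty_history(output):
    """Return True if the failure output looks like 'no job history yet'."""
    return any(marker in output for marker in EMPTY_HISTORY_MARKERS)


def fetch_job_history(execute, since):
    """Return qacct output lines, or [] when no accounting data exists.

    `execute` is any callable that runs a shell command on the master and
    returns a list of output lines, raising RemoteCommandFailed on a
    non-zero exit status (an assumption, not the real interface).
    """
    command = "source /etc/profile && qacct -j -b %s" % since
    try:
        return execute(command)
    except RemoteCommandFailed as exc:
        if is_empty_history(exc.output):
            return []  # nothing has finished yet; not a real error
        raise  # any other failure is still reported
```

With a wrapper like this, the balancer would see an empty job list on a fresh cluster and any other remote failure would still surface as before.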

Just to let you know, I'm running MPICH2. This is part of my config file:

[cluster NewaveUbuntuHVM]
KEYNAME = MasterNode
CLUSTER_SIZE = 5
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
MASTER_IMAGE_ID = ami-7f1d8a16
NODE_IMAGE_ID = ami-411d8a28
NODE_INSTANCE_TYPE = cr1.8xlarge
PLUGINS = mpich2
VOLUMES = newave

All the best,

Sergio


On Mon, May 6, 2013 at 10:42 AM, Sergio Mafra <sergiohmafra at gmail.com> wrote:

> Hi Rajat,
>
> Thanks so much for your help. I'll do as you said and report the results
> here.
>
> All the best,
>
> Sergio
>
>
> On Sun, May 5, 2013 at 2:48 PM, Rajat Banerjee <rajatb at post.harvard.edu> wrote:
>
>> Hi Sergio,
>> Sorry for the delayed response. Busy week at work. Adding starcluster
>> alias back, in case this helps other people in the future.
>>
>> Like I said, I'm not sure why your instance is coming up with PDT in the
>> EC2 instance, since from what I remember it would always return UTC.
>>
>> Is it possible for you to download the latest dev version if you haven't
>> tried that already?
>>
>> http://star.mit.edu/cluster/docs/latest/contribute.html
>>
>> Slightly different directions than the one you specified. Then, you can
>> modify this file:
>> starcluster/balancers/sge/__init__.py line 466
>> To replace UTC with PDT. Then run the ELB, and it'll run the latest code.
>> Let me know if that works, and we can file a bug and go through the formal
>> process of letting you switch the timezone. I'm guessing that you're pretty
>> familiar with python programming, but if you have more problems then feel
>> free to ask more questions.
>>
>> Best,
>> Rajat
>>
>>
>> On Tue, Apr 30, 2013 at 2:07 PM, Sergio Mafra <sergiohmafra at gmail.com> wrote:
>>
>>> Hi Rajat,
>>>
>>> Thanks so much for taking the time to track down where this error
>>> comes from. Nice!
>>>
>>> It's hard to understand what is causing it, since I'm running the SC
>>> controller on an instance in the same zone (us-east-1d) as the cluster
>>> it launches. So shouldn't both use the same time format?
>>>
>>> What I did was download the code directly from the Git site and
>>> build it as described in
>>> http://star.mit.edu/cluster/docs/latest/installation.html
>>>
>>> ???
>>>
>>> All the best,
>>>
>>> Sergio
>>>
>>>
>>> On Tue, Apr 30, 2013 at 2:18 PM, Rajat Banerjee <rajatb at post.harvard.edu> wrote:
>>>
>>>> Sergio,
>>>> I looked at the code that is causing your problems. It's this line:
>>>>
>>>> return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")
>>>>
>>>> where 'str' is the output of the *remote* call to date. My mac returns
>>>> this:
>>>> rbanerjee:~/starcluster/StarCluster/starcluster/balancers/sge $ date
>>>> Tue Apr 30 13:12:44 EDT 2013
>>>>
>>>> Which looks like the right layout apart from the time zone. One oddity
>>>> is that your remote date is returning PDT when UTC is expected:
>>>>
>>>> ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match
>>>> format '%a %b %d %H:%M:%S UTC %Y'
>>>>
>>>> Not sure what's causing the problems since it looks mostly right, but
>>>> feel free to tweak the time setting in the code in
>>>> starcluster/balancers/sge/__init__.py line 466 to make the pattern match
>>>> yours. Do you know why your time zone may be set differently than other AWS
>>>> instances we've used? Custom images?
>>>> Best,
>>>> Rajat
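
For anyone hitting the same ValueError: a minimal sketch of the pattern tweak discussed above, assuming the remote `date` output keeps the layout shown in the error message and only the zone abbreviation varies. The function name parse_remote_date is made up for illustration and this is not the code in starcluster/balancers/sge/__init__.py. It parses the timestamp without the hard-coded "UTC" and returns the zone label separately.

```python
import datetime
import re

# Matches `date` output such as "Tue Apr 30 07:01:35 PDT 2013" and
# captures the timestamp, the zone abbreviation, and the year separately.
DATE_RE = re.compile(
    r"^(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (\d{4})$"
)


def parse_remote_date(text):
    """Parse `date` output whatever zone abbreviation it carries.

    Returns (naive_datetime, zone_label). The datetime is NOT converted
    between zones; the label is only reported so the caller can decide.
    Day and month names are assumed to be in English (C locale).
    """
    match = DATE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised date output: %r" % text)
    stamp, zone, year = match.groups()
    parsed = datetime.datetime.strptime(
        "%s %s" % (stamp, year), "%a %b %d %H:%M:%S %Y"
    )
    return parsed, zone
```

Because the zone is returned as a plain label, the result is only comparable with other timestamps taken from the same host, which is all the load balancer's remote clock check appears to need.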
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2013 at 12:43 PM, <starcluster-request at mit.edu> wrote:
>>>>
>>>>>
>>>>> Today's Topics:
>>>>>
>>>>>    1. Unable to mount ebs volume (Jerry Lee, GW/US)
>>>>>    2. LoadBalance (Sergio Mafra)
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: "Jerry Lee, GW/US" <Jerry.Lee at genewiz.com>
>>>>> To: <starcluster at mit.edu>
>>>>> Cc:
>>>>> Date: Mon, 15 Apr 2013 17:11:38 -0500
>>>>> Subject: [StarCluster] Unable to mount ebs volume
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am new to StarCluster. I created an EBS volume via Amazon AWS and
>>>>> configured it in the config file for use with my cluster, but no
>>>>> matter what I do, the EBS volume is not mounted automatically on
>>>>> the cluster. Please help.
>>>>>
>>>>> [cluster jerrycluster]
>>>>> EXTENDS = smallcluster
>>>>> VOLUMES = testdata
>>>>>
>>>>> [volume testdata]
>>>>> VOLUME_ID=vol-a0fe24f9
>>>>> MOUNT_PATH=/data
>>>>>
>>>>> >>> Waiting for cluster to come up... (updating every 30s)
>>>>> >>> Waiting for instances to activate...
>>>>> >>> Waiting for all nodes to be in a 'running' state...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Waiting for SSH to come up on all nodes...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Waiting for cluster to come up took 1.847 mins
>>>>> >>> The master node is ec2-54-234-229-206.compute-1.amazonaws.com
>>>>> >>> Setting up the cluster...
>>>>> >>> Configuring hostnames...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Creating cluster user: None (uid: 1001, gid: 1001)
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring scratch space for user(s): sgeadmin
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring /etc/hosts on each node
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Starting NFS server on master
>>>>> >>> Configuring NFS exports path(s):
>>>>> /home
>>>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Setting up NFS took 0.073 mins
>>>>> >>> Configuring passwordless ssh for root
>>>>> >>> Configuring passwordless ssh for sgeadmin
>>>>> >>> Shutting down threads...
>>>>> 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring SGE...
>>>>> >>> Configuring NFS exports path(s):
>>>>> /opt/sge6
>>>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Setting up NFS took 0.020 mins
>>>>> >>> Installing Sun Grid Engine...
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Creating SGE parallel environment 'orte'
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Adding parallel environment 'orte' to queue 'all.q'
>>>>> >>> Shutting down threads...
>>>>> 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring cluster took 1.325 mins
>>>>> >>> Starting cluster took 3.197 mins
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jerry Lee
>>>>>
>>>>> Jerry Lee
>>>>> Assistant Manager of Global Infrastructure
>>>>> GENEWIZ Inc.
>>>>> 40 Cragwood Road. Suite 201
>>>>> South Plainfield, NJ 07080
>>>>> Phone: 908-222-0711 ext. 3379
>>>>> Fax: 908-333-4511
>>>>> jerry.lee at genewiz.com
>>>>> www.genewiz.com
>>>>>
>>>>> This electronic message, including its attachments, is confidential
>>>>> and proprietary and is solely for the intended recipient.  If you are not
>>>>> the intended recipient, this message was sent to you in error and you are
>>>>> hereby advised that any review, disclosure, copying, distribution or use of
>>>>> this message or any of the information included in this message by you is
>>>>> unauthorized and strictly prohibited.  If you have received this message in
>>>>> error, please immediately notify the sender by reply to this message and
>>>>> permanently delete all copies of this message and its attachments in your
>>>>> possession.  Thank you for your cooperation.
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Sergio Mafra <sergiohmafra at gmail.com>
>>>>> To: "starcluster at mit.edu" <starcluster at mit.edu>
>>>>> Cc:
>>>>> Date: Tue, 30 Apr 2013 11:05:45 -0300
>>>>> Subject: [StarCluster] LoadBalance
>>>>> Hi fellows,
>>>>>
>>>>> I'm testing StarCluster version 0.9999 and so far so good.
>>>>> One thing that isn't working is loadbalance. This is what I get:
>>>>>
>>>>> ubuntu at domU-12-31-39-02-19-36:~$ starcluster loadbalance newcam
>>>>> StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
>>>>> Software Tools for Academics and Researchers (STAR)
>>>>> Please submit bug reports to starcluster at mit.edu
>>>>>
>>>>> >>> Starting load balancer (Use ctrl-c to exit)
>>>>> Maximum cluster size: 3
>>>>> Minimum cluster size: 1
>>>>> Cluster growth rate: 1 nodes/iteration
>>>>>
>>>>> *** WARNING - Failed to retrieve stats (1/5):
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
>>>>>     self.stat = self._get_stats()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 487, in _get_stats
>>>>>     now = self.get_remote_time()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 466, in get_remote_time
>>>>>     return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")
>>>>>   File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
>>>>>     (data_string, format))
>>>>> ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match format '%a %b %d %H:%M:%S UTC %Y'
>>>>> *** WARNING - Retrying in 60s
>>>>> ^CTraceback (most recent call last):
>>>>>   File "/usr/local/bin/starcluster", line 9, in <module>
>>>>>     load_entry_point('StarCluster==0.9999', 'console_scripts', 'starcluster')()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 313, in main
>>>>>     StarClusterCLI().main()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 257, in main
>>>>>     sc.execute(args)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute
>>>>>     lb.run(cluster)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 576, in run
>>>>>     self.get_stats()
>>>>>   File "<string>", line 2, in get_stats
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/utils.py", line 92, in wrap_f
>>>>>     res = func(*arg, **kargs)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 521, in get_stats
>>>>>     time.sleep(self.polling_interval)
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> All Best,
>>>>>
>>>>> Sergio
>>>>>
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>>>> StarCluster at mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>