[Starcluster] error when starting cluster

Justin Riley jtriley at MIT.EDU
Tue Apr 20 18:02:47 EDT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Damian,

> It worked, thanks very much for the prompt fix.

Excellent, glad to hear that.

> Tell me if you think this will work.

Yep that should work although I don't believe you'll need to reboot the
instances or even detach the volumes but it shouldn't hurt. The big
thing is to make sure you have cluster_size consistent with how many
running nodes are in the cluster's security group. So, you might need to
do the following assuming your cluster template (mycluster) has
cluster_size=8 and that there are actually 2 running instances:

$ starcluster start -x --cluster-size 2 mycluster dtest

Hope that helps,

~Justin



On 04/20/2010 05:44 PM, Damian Eads wrote:
> Hi Justin,
> 
> It worked, thanks very much for the prompt fix. Before I received your
> e-mail, I killed 6 of my 8 octcore instances to save money. Tell me if
> you think this will work.
> 
>    1. Through the AWS web console, detach currently used volumes.
>    2. Manually reboot the instances currently running.
>    3. Manually launch additional spot instances in the same
> availability group as the ones currently running.
>    4. Rerun starcluster start -x mycluster dtest
> 
> Being able to restart the cluster without first terminating the
> instances and then relaunching them will save money. Do you think this
> will work? I don't mind doing it manually.
> 
> Thanks a lot in advance!
> 
> Damian
> 
> 
> On Tue, Apr 20, 2010 at 2:16 PM, Justin Riley <jtriley at mit.edu> wrote:
> Hi Damian,
> 
> I believe I've fixed this in github. Could you pull and give it another
> shot?
> 
> Also, I've added support for master/node001/etc aliases to the sshnode
> action. So, you should now be able to:
> 
> $ starcluster sshnode mycluster master
> $ starcluster sshnode mycluster node001
> etc
> 
> Please let me know if the latest github code fixes your problem below
> and if you have any other issues.
> 
> Thanks,
> 
> ~Justin
> 
> On 04/20/2010 04:38 PM, Damian Eads wrote:
>>>> Hi Justin,
>>>>
>>>> I just did a git pull and got the following error when I tried
>>>> creating my cluster. Ideas?
>>>>
>>>> Thanks,
>>>>
>>>> Damian
>>>>
>>>> eads at street:~/work/repo/StarCluster$ starcluster start -x mycluster dtest
>>>> /tmp/qqq/lib/python2.6/site-packages/pycrypto-2.0.1-py2.6-linux-x86_64.egg/Crypto/Hash/SHA.py:6:
>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
>>>> module instead
>>>> /tmp/qqq/lib/python2.6/site-packages/pycrypto-2.0.1-py2.6-linux-x86_64.egg/Crypto/Hash/MD5.py:6:
>>>> DeprecationWarning: the md5 module is deprecated; use hashlib instead
>>>> /var/lib/python-support/python2.6/IPython/Magic.py:38:
>>>> DeprecationWarning: the sets module is deprecated
>>>>   from sets import Set
>>>> StarCluster - (http://web.mit.edu/starcluster)
>>>> Software Tools for Academics and Researchers (STAR)
>>>> Please submit bug reports to starcluster at mit.edu
>>>>
>>>>>>> Validating cluster settings...
>>>>>>> Cluster settings are valid
>>>>>>> Starting cluster...
>>>>>>> Waiting for cluster to start...
>>>>>>> The master node is ec2-174-129-172-124.compute-1.amazonaws.com
>>>>>>> Attaching volume vol-c5e85dac to master node...
>>>>>>> Setting up the cluster...
>>>>>>> Mounting EBS volume vol-c5e85dac on /data...
>>>> ssh.py:66 - WARNING - specified key does not end in either rsa or dsa,
>>>> trying both
>>>>>>> Using private key /home/eads/deadskey.pem (rsa)
>>>> ERROR: An unexpected error occurred while tokenizing input
>>>> The following traceback may be corrupted or invalid
>>>> The error message is: ('EOF in multi-line statement', (405, 0))
>>>>
>>>> ---------------------------------------------------------------------------
>>>> TypeError                                 Traceback (most recent call last)
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster
>>>> in <module>()
>>>>       3 __requires__ = 'StarCluster==0.9999'
>>>>       4 import pkg_resources
>>>> ----> 5 pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>>       6
>>>>       7
>>>>
>>>> /usr/lib/python2.6/dist-packages/pkg_resources.pyc in run_script(self,
>>>> requires, script_name)
>>>>     446         ns.clear()
>>>>     447         ns['__name__'] = name
>>>> --> 448         self.require(requires)[0].run_script(script_name, ns)
>>>>     449
>>>>     450
>>>>
>>>> /usr/lib/python2.6/dist-packages/pkg_resources.pyc in run_script(self,
>>>> script_name, namespace)
>>>>    1171             )
>>>>    1172             script_code = compile(script_text,script_filename,'exec')
>>>> -> 1173             exec script_code in namespace, namespace
>>>>    1174
>>>>    1175     def _has(self, path):
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster
>>>> in <module>()
>>>>       4
>>>>       5
>>>> ----> 6
>>>>       7
>>>>       8
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/cli.pyc
>>>> in main()
>>>>     850         sys.exit(0)
>>>>     851     try:
>>>> --> 852         sc.execute(args)
>>>>     853     except exception.BaseException,e:
>>>>     854         log.error(e.msg)
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/cli.pyc
>>>> in execute(self, args)
>>>>     169             log.info('Cluster settings are valid')
>>>>     170             if not self.opts.validate_only:
>>>> --> 171                 scluster.start(create=not self.opts.no_create)
>>>>     172                 if self.opts.login_master:
>>>>     173                     cluster.ssh_to_master(tag, self.cfg)
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/utils.pyc
>>>> in wrapper(*arg, **kargs)
>>>>      23         """Raw timing function """
>>>>      24         time1 = time.time()
>>>> ---> 25         res = func(*arg, **kargs)
>>>>      26         time2 = time.time()
>>>>      27         log.info('%s took %0.3f mins' % (func.func_name,
>>>> (time2-time1)/60.0))
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/cluster.pyc
>>>> in start(self, create)
>>>>     476             self.nodes, self.master_node,
>>>>     477             self.cluster_user, self.cluster_shell,
>>>> --> 478             self.volumes
>>>>     479         )
>>>>     480         self.create_receipt()
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/clustersetup.pyc
>>>> in run(self, nodes, master, user, user_shell, volumes)
>>>>     312         self._volumes = volumes
>>>>     313         self._setup_ebs_volume()
>>>> --> 314         self._setup_cluster_user()
>>>>     315         self._setup_scratch()
>>>>     316         self._setup_etc_hosts()
>>>>
>>>> /tmp/qqq/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/clustersetup.pyc
>>>> in _setup_cluster_user(self)
>>>>      67             max_uid = max(uid_db.keys())
>>>>      68             max_gid = uid_db[max_uid][1]
>>>> ---> 69             uid, gid = max_uid+1, max_gid+1
>>>>      70
>>>>      71         log.debug("Cluster user gid/uid: (%d, %d)" % (uid,gid))
>>>>
>>>> TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
>>>> eads at street:~/work/repo/StarCluster$
>>>> _______________________________________________
>>>> Starcluster mailing list
>>>> Starcluster at mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
> 
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvOJIcACgkQ4llAkMfDcrl83wCfbEIdajShtUKFTyAW9OoAJRQb
sS4An0/oEBho7uwoKG4C06xHym7AzmcN
=l/Eg
-----END PGP SIGNATURE-----



More information about the StarCluster mailing list