[StarCluster] Error creating/deleting security groups

Avner May avnermay at cs.columbia.edu
Tue Feb 24 18:03:07 EST 2015


Thanks so much, Antonio.  It does seem like this is the issue (and I hope it
is).  I was very worried that there was a larger problem, or that my AWS
account had gotten into some error state that would be difficult to fix.
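
In case it helps anyone reading this later, here is a rough sketch of how I
would tell the two kinds of failure apart.  The XML body is the one from the
400 error quoted below; the helper names (parse_ec2_error, is_server_side)
are my own and are not part of StarCluster or boto.  The assumption is the
usual HTTP one: a 5xx status points at the service itself, while a 4xx status
means the request was rejected.

```python
import xml.etree.ElementTree as ET


def parse_ec2_error(body):
    """Return (code, message, request_id) from an EC2 error response body."""
    root = ET.fromstring(body)
    error = root.find("./Errors/Error")
    code = error.findtext("Code") if error is not None else None
    message = error.findtext("Message") if error is not None else None
    return code, message, root.findtext("RequestID")


def is_server_side(status):
    """True for 5xx (service failure), False for 4xx (rejected request)."""
    return 500 <= status < 600


# Error body copied from the 400 Bad Request response quoted in this thread.
body = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<Response><Errors><Error><Code>InvalidGroup.NotFound</Code>"
    b"<Message>The security group '@sc-babel2' does not exist in "
    b"default VPC 'none'</Message></Error></Errors>"
    b"<RequestID>94012bc0-dba2-4c0b-b9fe-00ac38a45ed0</RequestID></Response>"
)

code, message, request_id = parse_ec2_error(body)
print(code, "| server side:", is_server_side(400))
```

The 500 InternalError responses would come out as server side under this
check, which fits Antonio's explanation.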

Hopefully AWS will resolve these issues and everything will start working
again.
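
Until then, one option for scripts that talk to the API is to retry failed
calls with exponential backoff.  This is only a sketch under my own
assumptions: TransientServerError and retry_with_backoff are hypothetical
names, the former standing in for a 5xx error such as the BotoServerError in
the tracebacks below, and neither is something StarCluster provides.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for an HTTP error raised by an API client."""

    def __init__(self, status):
        Exception.__init__(self, "server error %d" % status)
        self.status = status


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a 5xx TransientServerError, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    Non-5xx errors and the final failed attempt are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientServerError as exc:
            if not 500 <= exc.status < 600 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage: a fake call that fails twice with a 500 and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientServerError(500)
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing the sleep function in makes it easy to try this out without really
waiting; in real use the default time.sleep applies.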
Avner

On Tue, Feb 24, 2015 at 5:18 PM, Antonio Osorio <
antonio.osorio at schrodinger.com> wrote:

> Avner,
>
> AWS is having some API problems:
>
> http://status.aws.amazon.com/
>
> Amazon Elastic Compute Cloud (N. Virginia) -> Increased API Error Rates:
>
> 12:22 PM PST We are investigating increased API error rates in the
> US-EAST-1 Region.
> 12:53 PM PST We can confirm increased API error rates for EC2 APIs in the
> US-EAST-1 Region. Running instances are not impacted.
> 1:58 PM PST We continue to work toward resolving increased API error
> rates for EC2 APIs in the US-EAST-1 Region. Running instances are not
> impacted.
>
> -Antonio
>
>
>
> On Feb 24, 2015, at 5:04 PM, Avner May <avnermay at cs.columbia.edu> wrote:
>
> And now this error message when calling "listclusters"...
>
> *C:\Windows\system32>starcluster listclusters*
> *StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)*
> *Software Tools for Academics and Researchers (STAR)*
> *Please submit bug reports to starcluster at mit.edu*
>
> *!!! ERROR - *************************************************************
> *!!! ERROR - INCOMPATIBLE CLUSTER: babel*
> *!!! ERROR -*
> *!!! ERROR - The cluster 'babel' is not compatible with StarCluster*
> *!!! ERROR - 0.95.6. Possible reasons are:*
> *!!! ERROR -*
> *!!! ERROR - 1. The '@sc-babel' group was created using an incompatible*
> *!!! ERROR - version of StarCluster (stable or development).*
> *!!! ERROR -*
> *!!! ERROR - 2. The '@sc-babel' group was manually created outside of*
> *!!! ERROR - StarCluster.*
> *!!! ERROR -*
> *!!! ERROR - 3. One of the nodes belonging to '@sc-babel' was manually*
> *!!! ERROR - created outside of StarCluster.*
> *!!! ERROR -*
> *!!! ERROR - 4. StarCluster was interrupted very early on when first*
> *!!! ERROR - creating the cluster's security group.*
> *!!! ERROR -*
> *!!! ERROR - In any case 'babel' and its nodes cannot be used with this*
> *!!! ERROR - version of StarCluster (0.95.6).*
> *!!! ERROR -*
> *!!! ERROR - The cluster 'babel' currently has 0 active nodes.*
> *!!! ERROR -*
> *!!! ERROR - Please terminate the cluster using:*
> *!!! ERROR -*
> *!!! ERROR -     $ starcluster terminate --force babel*
> *!!! ERROR -*
> *!!! ERROR - *************************************************************
>
> *!!! ERROR - *************************************************************
> *!!! ERROR - INCOMPATIBLE CLUSTER: babel2*
> *!!! ERROR -*
> *!!! ERROR - The cluster 'babel2' is not compatible with StarCluster*
> *!!! ERROR - 0.95.6. Possible reasons are:*
> *!!! ERROR -*
> *!!! ERROR - 1. The '@sc-babel2' group was created using an incompatible*
> *!!! ERROR - version of StarCluster (stable or development).*
> *!!! ERROR -*
> *!!! ERROR - 2. The '@sc-babel2' group was manually created outside of*
> *!!! ERROR - StarCluster.*
> *!!! ERROR -*
> *!!! ERROR - 3. One of the nodes belonging to '@sc-babel2' was manually*
> *!!! ERROR - created outside of StarCluster.*
> *!!! ERROR -*
> *!!! ERROR - 4. StarCluster was interrupted very early on when first*
> *!!! ERROR - creating the cluster's security group.*
> *!!! ERROR -*
> *!!! ERROR - In any case 'babel2' and its nodes cannot be used with this*
> *!!! ERROR - version of StarCluster (0.95.6).*
> *!!! ERROR -*
> *!!! ERROR - The cluster 'babel2' currently has 0 active nodes.*
> *!!! ERROR -*
> *!!! ERROR - Please terminate the cluster using:*
> *!!! ERROR -*
> *!!! ERROR -     $ starcluster terminate --force babel2*
> *!!! ERROR -*
> *!!! ERROR - *************************************************************
>
> On Tue, Feb 24, 2015 at 4:58 PM, Avner May <avnermay at cs.columbia.edu>
> wrote:
>
>> Here's the latest error I received when trying to start a cluster.  What
>> does it mean that my default VPC is 'none'??
>>
>> *!!! ERROR - InvalidGroup.NotFound: The security group '@sc-babel2' does
>> not exist in default VPC 'none'*
>>
>> ======== FULL ERROR MESSAGE / CALL STACK ===========
>> C:\Windows\system32>starcluster start babel2
>> StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
>> Software Tools for Academics and Researchers (STAR)
>> Please submit bug reports to starcluster at mit.edu
>>
>> >>> Using default cluster template: main
>> >>> Validating cluster template settings...
>> >>> Cluster template settings are valid
>> >>> Starting cluster...
>> >>> Launching a 20-node cluster...
>> >>> Creating security group @sc-babel2...
>> >>> Waiting for security group @sc-babel2...
>> !!! ERROR - InvalidGroup.NotFound: The security group '@sc-babel2' does
>> not exist in default VPC 'none'
>> Traceback (most recent call last):
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cli.py",
>> line 274, in main
>>     sc.execute(args)
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\commands\start.py",
>> line 244, in execute
>>     validate_running=validate_running)
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 1628, in start
>>     return self._start(create=create, create_only=create_only)
>>   File "<string>", line 2, in _start
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\utils.py",
>> line 112, in wrap_f
>>     res = func(*arg, **kargs)
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 1643, in _start
>>     self.create_cluster()
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 1163, in create_cluster
>>     self._create_flat_rate_cluster()
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 1185, in _create_flat_rate_cluster
>>     force_flat=True)[0]
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 926, in create_nodes
>>     cluster_sg = self.cluster_group.name
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>> line 655, in cluster_group
>>     vpc_id=vpc_id)
>>   File
>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>> line 314, in create_group
>>     to_port=65535)
>>   File "C:\Python27\lib\site-packages\boto\ec2\securitygroup.py", line
>> 203, in authorize
>>     dry_run=dry_run)
>>   File "C:\Python27\lib\site-packages\boto\ec2\connection.py", line 3191,
>> in *authorize_security_group*
>>     params, verb='POST')
>>   File "C:\Python27\lib\site-packages\boto\connection.py", line 1226, in
>> get_status
>>     raise self.ResponseError(response.status, response.reason, body)
>> EC2ResponseError: EC2ResponseError: *400 Bad Request*
>> <?xml version="1.0" encoding="UTF-8"?>
>> <Response><Errors><Error><Code>InvalidGroup.NotFound</Code><Message>*The
>> security group '@sc-babel2' does not exist in default VPC 'none'*
>> </Message></Error></Errors><RequestID>94012bc0-dba2-4c0b-b9fe-00ac38a45ed0</RequestID></Response>
>>
>> On Tue, Feb 24, 2015 at 4:55 PM, Avner May <avnermay at cs.columbia.edu>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm getting the errors below when I try to start a cluster or run
>>> "listclusters".  This all started after I terminated a cluster.  I got an
>>> error during termination, and it told me to use the "-f" flag to force
>>> termination.  I did that, but it was taking a very long time to delete the
>>> security group, so I interrupted the "terminate -f" command, and I've been
>>> having issues ever since.  Basically, when I try to start a cluster, it
>>> takes forever at the step where it says "Waiting for security group
>>> @sc-cluster-name" (I've waited 20+ minutes for a cluster to start), and
>>> then it generally fails with an error like the one below.  The
>>> "listclusters" command fails as well.  At the heart of this there seem to
>>> be failures in the "*get_all_security_groups*" and
>>> "*create_security_group*" functions in
>>> "C:\Python27\lib\site-packages\boto\ec2\connection.py".  Any idea what
>>> might be going on?  Any help would be much appreciated, as this is
>>> completely blocking my work.
>>>
>>> Thanks a lot,
>>> Avner
>>>
>>> *C:\Windows\system32>starcluster start babel*
>>> *StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)*
>>> *Software Tools for Academics and Researchers (STAR)*
>>> *Please submit bug reports to starcluster at mit.edu*
>>>
>>> *>>> Using default cluster template: main*
>>> *>>> Validating cluster template settings...*
>>> *>>> Cluster template settings are valid*
>>> *>>> Starting cluster...*
>>> *>>> Launching a 20-node cluster...*
>>> *>>> Creating security group @sc-babel...*
>>> *>>> Waiting for security group @sc-babel...*
>>> *!!! ERROR - InternalError: An internal error has occurred*
>>> *Traceback (most recent call last):*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cli.py",
>>> line 274, in main*
>>> *    sc.execute(args)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\commands\start.py",
>>> line 244, in execute*
>>> *    validate_running=validate_running)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 1628, in start*
>>> *    return self._start(create=create, create_only=create_only)*
>>> *  File "<string>", line 2, in _start*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\utils.py",
>>> line 112, in wrap_f*
>>> *    res = func(*arg, **kargs)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 1643, in _start*
>>> *    self.create_cluster()*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 1163, in create_cluster*
>>> *    self._create_flat_rate_cluster()*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 1185, in _create_flat_rate_cluster*
>>> *    force_flat=True)[0]*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 926, in create_nodes*
>>> *    cluster_sg = self.cluster_group.name*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 655, in cluster_group*
>>> *    vpc_id=vpc_id)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>>> line 300, in create_group*
>>> *    while not self.get_group_or_none(name):*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>>> line 333, in get_group_or_none*
>>> *    return self.get_security_group(name)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>>> line 357, in get_security_group*
>>> *    filters={'group-name': groupname})[0]*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>>> line 369, in get_security_groups*
>>> *    return self.conn.get_all_security_groups(filters=filters)*
>>> *  File "C:\Python27\lib\site-packages\boto\ec2\connection.py", line
>>> 2968, in get_all_security_groups*
>>> *    [('item', SecurityGroup)], verb='POST')*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1169,
>>> in get_list*
>>> *    response = self.make_request(action, params, path, verb)*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1115,
>>> in make_request*
>>> *    return self._mexe(http_request)*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1027,
>>> in _mexe*
>>> *    raise BotoServerError(response.status, response.reason, body)*
>>> *BotoServerError: BotoServerError: 500 Internal Server Error*
>>> *<?xml version="1.0" encoding="UTF-8"?>*
>>> *<Response><Errors><Error><Code>InternalError</Code><Message>An internal
>>> error has occurred</Message></Error></Errors>
>>> <RequestID>808ce646-9203-412f-8fa9-0d994e74e418</RequestID></Response>*
>>>
>>> I am also seeing the following error
>>> *C:\Windows\system32>starcluster listclusters*
>>> *StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)*
>>> *Software Tools for Academics and Researchers (STAR)*
>>> *Please submit bug reports to starcluster at mit.edu*
>>>
>>> *!!! ERROR - InternalError: An internal error has occurred*
>>> *Traceback (most recent call last):*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cli.py",
>>> line 274, in main*
>>> *    sc.execute(args)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\commands\listclusters.py",
>>> line 36, in execute*
>>> *    show_ssh_status=self.opts.show_ssh_status)*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 280, in list_clusters*
>>> *    cluster_groups = self.get_cluster_security_groups()*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\cluster.py",
>>> line 253, in get_cluster_security_groups*
>>> *    sgs = self.ec2.get_security_groups(filters={'group-name': glob})*
>>> *  File
>>> "C:\Python27\lib\site-packages\starcluster-0.95.6-py2.7.egg\starcluster\awsutils.py",
>>> line 369, in get_security_groups*
>>> *    return self.conn.get_all_security_groups(filters=filters)*
>>> *  File "C:\Python27\lib\site-packages\boto\ec2\connection.py", line
>>> 2968, in get_all_security_groups*
>>> *    [('item', SecurityGroup)], verb='POST')*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1169,
>>> in get_list*
>>> *    response = self.make_request(action, params, path, verb)*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1115,
>>> in make_request*
>>> *    return self._mexe(http_request)*
>>> *  File "C:\Python27\lib\site-packages\boto\connection.py", line 1027,
>>> in _mexe*
>>> *    raise BotoServerError(response.status, response.reason, body)*
>>> *BotoServerError: BotoServerError: 500 Internal Server Error*
>>> *<?xml version="1.0" encoding="UTF-8"?>*
>>> *<Response><Errors><Error><Code>InternalError</Code><Message>An internal
>>> error has occurred</Message></Error></Errors>
>>> <RequestID>c18d1a11-a6a6-4a8d-a742-f3f69b593189</RequestID></Response>*
>>>
>>> I also got this error recently:
>>> *C:\Windows\system32>starcluster start babel2*
>>> *StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)*
>>> *Software Tools for Academics and Researchers (STAR)*
>>> *Please submit bug reports to starcluster at mit.edu*
>>>
>>>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>