[StarCluster] StarCluster Development VPC-Starclusters - possible bug relating to "Tag Value exceeds 255 characters"....
Jennifer Staab
jstaab at cs.unc.edu
Thu Dec 12 12:51:34 EST 2013
Thanks for getting back to me. I added the email below as a comment on the
github track: https://github.com/jtriley/StarCluster/issues/348
----------------------------------
I think I figured out the problem, and I have a solution that works for
creating a StarCluster cluster under the VPC, BUT you can't change its
attributes after creation other than doing a forced termination.
StarCluster works in EC2-Classic because the Tag Values there aren't
limited to a certain size, but with EC2-VPC Tag Values are limited to 255
characters. I think this was discussed before: if you follow
https://github.com/jtriley/StarCluster/issues/21 and search for the words
"3) StarCluster + VPC limitation:" you can read more about this issue.
Basically, the @sc-core Tag's value can be greater than 255 characters in
length; when it is, the StarCluster software fails to create a cluster
under the VPC. This is why for one of my AMIs (its @sc-core Tag value was
less than 255 characters) I could successfully create a cluster under the
VPC, but for all the other AMIs I wasn't able to.
Temporary Solution:
1) Locate the .../starcluster/cluster.py file and open it in your favorite
editor.
2) Go to line 681; this should be the first line of the method
"_add_tags_to_sg" and it should read "def _add_tags_to_sg(self, sg):".
3) 'static.CORE_TAG' is the Tag name '@sc-core', and its value is
'core_settings'. The issue occurs when this Tag is added on line 698,
'sg.add_tag(static.CORE_TAG, core_settings)'. If the value of
'core_settings' is longer than 255 characters, this is where the program
fails. Add the following 3 lines of code (indicated by (+)) between line
697, 'if not static.CORE_TAG in sg.tags:', and line 698,
'sg.add_tag(static.CORE_TAG, core_settings)':
-------------------------------- Code Change ----------------------------------
697        if not static.CORE_TAG in sg.tags:
698(+)         if len(core_settings) > 255:
699(+)             print "\nWarning: For", static.CORE_TAG, "truncating core_settings from", len(core_settings), "to length 255."
700(+)             core_settings = core_settings[:255]
701            sg.add_tag(static.CORE_TAG, core_settings)
-------------------------------- Code Change ----------------------------------
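For clarity, here is the same guard pulled out as a plain standalone function
(print statement dropped so it runs under Python 2 or 3); 'truncate_tag_value'
is just an illustrative name, not part of StarCluster:

```python
def truncate_tag_value(value, limit=255):
    # EC2-VPC rejects tag values longer than 255 characters, so shorten
    # any value that exceeds the limit (the patch above also warns first).
    if len(value) > limit:
        return value[:limit]
    return value

# A 300-character value is cut down to exactly 255 characters;
# short values pass through unchanged.
assert len(truncate_tag_value("a" * 300)) == 255
assert truncate_tag_value("short") == "short"
```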
WARNING - there are implications to making this code change. If the
core_settings value was truncated (a warning appears during creation), you
will not be able to change the cluster afterwards other than by a forced
termination. It seems that the @sc-core Tag holds a cluster's settings,
serialized and then compressed, and the Tag value is that compressed blob.
Truncating this value allows you to create a cluster, BUT you can't change
its attributes once it is created because the software needs the
non-truncated value to link back to the cluster (I think?).
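I don't know StarCluster's exact encoding, but as a rough sketch of why the
value can blow past 255 characters, assume something like JSON serialization,
zlib compression, and base64 encoding (all names and settings below are made
up for illustration):

```python
import base64
import json
import zlib

# Hypothetical cluster settings -- NOT StarCluster's actual format.
core_settings_dict = {
    "cluster_size": 1,
    "node_image_id": "ami-xxxxxxxx",
    "node_instance_type": "m1.small",
    "availability_zone": "us-east-1b",
    "keyname": "Starcluster_VPC",
    "plugins": ["starcluster.plugins.sge.SGEPlugin"],
}

# Serialize, compress, then base64-encode so the blob is tag-safe text.
raw = json.dumps(core_settings_dict).encode("utf-8")
blob = base64.b64encode(zlib.compress(raw)).decode("ascii")

# The round trip recovers the settings; the blob's length is what
# EC2-VPC checks against its 255-character tag-value limit.
restored = json.loads(zlib.decompress(base64.b64decode(blob)).decode("utf-8"))
assert restored == core_settings_dict
print("blob length:", len(blob))
```

The more options a template sets (volumes, plugins, spot settings, ...), the
bigger the dict and hence the blob, which is presumably why only some AMIs/
templates cross the 255-character line.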
Once up and running, the cluster runs fine, but ultimately, for StarCluster
to work consistently with EC2-VPC, the code would need to be changed to
limit the @sc-core value to 255 characters or fewer. Maybe eliminating some
of the options in the core tag, a better compression algorithm, or some
similar change might be an easy fix that avoids a big rewrite of the code.
My temporary solution works for a 'static' cluster under the VPC.
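One such low-effort change (hypothetical; not something StarCluster does
today) would be to split the blob across several numbered tags instead of
truncating it, e.g. @sc-core-0, @sc-core-1, ..., each 255 characters or
fewer, and reassemble them when reading:

```python
MAX_TAG_LEN = 255  # EC2's limit on a single tag value

def split_tag_value(name, value, limit=MAX_TAG_LEN):
    """Split a long tag value into numbered chunk tags within the limit."""
    chunks = [value[i:i + limit] for i in range(0, len(value), limit)] or [""]
    return {"%s-%d" % (name, i): chunk for i, chunk in enumerate(chunks)}

def join_tag_value(name, tags):
    """Reassemble the original value from numbered chunk tags."""
    parts = []
    i = 0
    while "%s-%d" % (name, i) in tags:
        parts.append(tags["%s-%d" % (name, i)])
        i += 1
    return "".join(parts)

# A 600-character settings blob becomes three tags, none over the limit,
# and joins back to the original value losslessly.
blob = "x" * 600
tags = split_tag_value("@sc-core", blob)
assert len(tags) == 3
assert all(len(v) <= MAX_TAG_LEN for v in tags.values())
assert join_tag_value("@sc-core", tags) == blob
```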
Thanks,
-Jennifer
On Thu, Dec 12, 2013 at 12:09 PM, Justin Riley <jtriley at mit.edu> wrote:
> Hi Jennifer,
>
> Sorry you're having issues and thanks for reporting. I've created an
> issue on github to track this:
>
> https://github.com/jtriley/StarCluster/issues/348
>
> Would you mind commenting on that issue with a copy of your config so
> that I can take a look? Please remove all sensitive parts of your config
> first.
>
> Thanks!!
>
> ~Justin
>
> On Tue, Dec 10, 2013 at 11:23:19AM -0500, Jennifer Staab wrote:
> > I have had limited success getting StarCluster to successfully launch a
> > cluster with EC2-VPC nodes under the development version (0.9999). Using
> > a certain AMI I can easily launch a StarCluster cluster with EC2-VPC
> > nodes, but using a different AMI it fails to launch. I do set the config
> > variables "VPC_ID" and "SUBNET_ID", and the only difference between the
> > two cluster templates is the AMI that is used.
> > Both AMIs successfully launch a StarCluster cluster with EC2-Classic
> > nodes. The only noted difference between the AMIs is that the one that
> > successfully launches a StarCluster cluster with EC2-VPC nodes is a
> > private AMI that is "shared" with the account that I am running my VPC
> > within. The AMI that doesn't work with StarCluster-VPC is a private AMI
> > "owned" by the account I am running my VPC within.
> > I believe the error I am getting has something to do with the Tags,
> > specifically the "@sc-core" tag's value being beyond 255 characters, but
> > I could be wrong. Below I have included an example of the successful
> > launch, the failed launch (including the error message), and the listed
> > clusters after both commands.
> > Any suggestions on how to address this issue would be greatly
> > appreciated.
> > Thanks in advance for the help,
> > -Jennifer
> >
> -------------------------------------------------------------------------------------------------
> > ------ Below is what it looks like when I have a successful launch ---
> >
> -------------------------------------------------------------------------------------------------
> > (starcluster)root at xxxxxxxxxxx:~# starcluster start -c testvpcA vpcA
> > StarCluster - ([1]http://star.mit.edu/cluster) (v. 0.9999)
> > Software Tools for Academics and Researchers (STAR)
> > Please submit bug reports to [2]starcluster at mit.edu
> > >>> Validating cluster template settings...
> > >>> Cluster template settings are valid
> > >>> Starting cluster...
> > >>> Launching a 1-node cluster...
> > >>> Creating security group @sc-vpcA...
> > Reservation:r-2843fa4e
> > >>> Waiting for cluster to come up... (updating every 30s)
> > >>> Waiting for all nodes to be in a 'running' state...
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Waiting for SSH to come up on all nodes...
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Waiting for cluster to come up took 1.574 mins
> > >>> The master node is
> > >>> Configuring cluster...
> > >>> Running plugin starcluster.clustersetup.DefaultClusterSetup
> > >>> Configuring hostnames...
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Creating cluster user: sgeadmin (uid: 1007, gid: 1000)
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Configuring scratch space for user(s): sgeadmin
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Configuring /etc/hosts on each node
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Starting NFS server on master
> > >>> Setting up NFS took 0.113 mins
> > >>> Configuring passwordless ssh for root
> > >>> Configuring passwordless ssh for sgeadmin
> > >>> Running plugin starcluster.plugins.sge.SGEPlugin
> > >>> Configuring SGE...
> > >>> Setting up NFS took 0.000 mins
> > >>> Removing previous SGE installation...
> > >>> Installing Sun Grid Engine...
> > >>> Creating SGE parallel environment 'orte'
> > 1/1
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100%
> > >>> Adding parallel environment 'orte' to queue 'all.q'
> > >>> Configuring cluster took 0.679 mins
> > >>> Starting cluster took 2.307 mins
> > The cluster is now ready to use. To login to the master node
> > as root, run:
> > $ starcluster sshmaster vpcA
> > If you're having issues with the cluster you can reboot the
> > instances and completely reconfigure the cluster from
> > scratch using:
> > $ starcluster restart vpcA
> > When you're finished using the cluster and wish to terminate
> > it and stop paying for service:
> > $ starcluster terminate vpcA
> > Alternatively, if the cluster uses EBS instances, you can
> > use the 'stop' command to shutdown all nodes and put them
> > into a 'stopped' state preserving the EBS volumes backing
> > the nodes:
> > $ starcluster stop vpcA
> > WARNING: Any data stored in ephemeral storage (usually /mnt)
> > will be lost!
> > You can activate a 'stopped' cluster by passing the -x
> > option to the 'start' command:
> > $ starcluster start -x vpcA
> > This will start all 'stopped' nodes and reconfigure the
> > cluster.
> >
> -------------------------------------------------------------------------------------------------
> > ------ Below is what it looks like when I have a FAILED launch ---
> >
> -------------------------------------------------------------------------------------------------
> > (starcluster)root at xxxxxxxxxxx:~# starcluster start -c testvpcB vpcB
> > StarCluster - ([3]http://star.mit.edu/cluster) (v. 0.9999)
> > Software Tools for Academics and Researchers (STAR)
> > Please submit bug reports to [4]starcluster at mit.edu
> > >>> Validating cluster template settings...
> > >>> Cluster template settings are valid
> > >>> Starting cluster...
> > >>> Launching a 1-node cluster...
> > >>> Creating security group @sc-vpcB...
> > !!! ERROR - InvalidParameterValue: Tag value exceeds the maximum length
> > of 255 characters
> > Traceback (most recent call last):
> > File
> "/root/.virtualenvs/starcluster/starcluster/starcluster/cli.py",
> > line 274, in main
> > sc.execute(args)
> > File
> >
> "/root/.virtualenvs/starcluster/starcluster/starcluster/commands/start.py",
> > line 220, in execute
> > validate_running=validate_running)
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 1537, in start
> > return self._start(create=create, create_only=create_only)
> > File "<string>", line 2, in _start
> > File
> "/root/.virtualenvs/starcluster/starcluster/starcluster/utils.py",
> > line 111, in wrap_f
> > res = func(*arg, **kargs)
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 1552, in _start
> > self.create_cluster()
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 1066, in create_cluster
> > self._create_flat_rate_cluster()
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 1091, in _create_flat_rate_cluster
> > force_flat=True)[0]
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 859, in create_nodes
> > cluster_sg = self.cluster_group.name
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 657, in cluster_group
> > self._add_tags_to_sg(sg)
> > File
> > "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py",
> line
> > 698, in _add_tags_to_sg
> > sg.add_tag(static.CORE_TAG, core_settings)
> > File
> >
> "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/ec2/ec2object.py",
> > line 82, in add_tag
> > dry_run=dry_run
> > File
> >
> "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/ec2/connection.py",
> > line 4026, in create_tags
> > return self.get_status('CreateTags', params, verb='POST')
> > File
> >
> "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/connection.py",
> > line 1158, in get_status
> > raise self.ResponseError(response.status, response.reason, body)
> > EC2ResponseError: EC2ResponseError: 400 Bad Request
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> <Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Tag
> > value exceeds the maximum length of 255
> >
> characters</Message></Error></Errors><RequestID>1f589605-8f30-472d-8989-22ea120aea14</RequestID></Response>
> >
> -----------------------------------------------------------------------------------------------------------------
> > ------ When it FAILS, it creates only a security group; see
> > "listclusters" below ---
> >
> -----------------------------------------------------------------------------------------------------------------
> > (starcluster)root at xxxxxxxxxxx:~# starcluster listclusters
> > StarCluster - ([6]http://star.mit.edu/cluster) (v. 0.9999)
> > Software Tools for Academics and Researchers (STAR)
> > Please submit bug reports to [7]starcluster at mit.edu
> > -------------------------------
> > vpcB (security group: @sc-vpcB)
> > -------------------------------
> > Launch time: N/A
> > Uptime: N/A
> > Zone: N/A
> > Keypair: N/A
> > EBS volumes: N/A
> > Cluster nodes: N/A
> > -------------------------------
> > vpcA (security group: @sc-vpcA)
> > -------------------------------
> > Launch time: 2013-12-10 14:39:36
> > Uptime: 0 days, 00:04:23
> > Zone: us-east-1b
> > Keypair: Starcluster_VPC
> > EBS volumes: N/A
> > Cluster nodes:
> > master running i-1d745b65 10.0.0.138
> > Total nodes: 1
> > (starcluster)root at xxxxxxxxxxx:~#
> >
> > References
> >
> > Visible links
> > 1. http://star.mit.edu/cluster
> > 2. mailto:starcluster at mit.edu
> > 3. http://star.mit.edu/cluster
> > 4. mailto:starcluster at mit.edu
> > 6. http://star.mit.edu/cluster
> > 7. mailto:starcluster at mit.edu
>
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster at mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
>
>