<div dir="ltr">Hi Daniel and StarClusterers,<div><br></div><div>Qubole uses a fork of StarCluster, and our service has been widely affected by this problem.</div><div><br></div><div>It looks like AWS broke the API that returns the user data for nodes launched by StarCluster: I believe we can now get the user data only for the first instance in the launch list. As a result, calls to get a node's alias (which read that node's user data) fail.</div>
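<div><br></div><div>To make the failure mode concrete, here is a minimal, hypothetical Python sketch of alias lookup. It is not StarCluster's actual code: it assumes that the aliases for one launch request are stored one per line in the user data shared by every instance in that request, and that each node picks its alias by its position in the launch list. The names Node, launch, NoAliasError and user_data_only_for_first are illustrative only. If AWS returns the user data only for the first instance, every other node ends up with no alias, and is_master() fails the same way as in Daniel's traceback.</div>

```python
# Hypothetical, simplified model of alias lookup from EC2 user data.
# Assumption (not StarCluster's real format): the launcher writes one alias
# per line into the user data shared by all instances in a launch request,
# and each node picks its own alias by its index in that request.


class NoAliasError(Exception):
    """Raised when a node cannot find its alias in its user data."""


class Node:
    def __init__(self, instance_id, launch_index, user_data):
        self.id = instance_id
        self.launch_index = launch_index
        self.user_data = user_data  # str; empty if AWS returned nothing

    @property
    def alias(self):
        # Parse the (assumed) newline-separated alias list.
        aliases = [a for a in self.user_data.splitlines() if a.strip()]
        if self.launch_index >= len(aliases):
            raise NoAliasError("instance %s has no alias" % self.id)
        return aliases[self.launch_index]

    def is_master(self):
        # Same check as node.py's is_master in the traceback; it reads
        # self.alias, so a missing alias propagates as an exception here.
        return self.alias == "master" or self.alias.endswith("-master")


def launch(user_data, instance_ids, user_data_only_for_first=False):
    """Build Node objects for one launch request.

    When user_data_only_for_first is True, simulate the reported AWS
    behaviour: only the first instance gets its user data back.
    """
    nodes = []
    for index, instance_id in enumerate(instance_ids):
        if index == 0 or not user_data_only_for_first:
            data = user_data
        else:
            data = ""
        nodes.append(Node(instance_id, index, data))
    return nodes


if __name__ == "__main__":
    aliases = "master\nnode001\nnode002\n"
    ids = ["i-1", "i-2", "i-3"]  # hypothetical instance IDs
    for node in launch(aliases, ids, user_data_only_for_first=True):
        try:
            print(node.id, node.alias, node.is_master())
        except NoAliasError as exc:
            print("ERROR -", exc)
```

<div>Run as a script, the first node resolves to "master" while the other two report "has no alias", which matches the symptom of launches with more than one worker node failing.</div>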
<div><br></div><div>We have filed a case with AWS, but they have been unusually sloppy about fixing it; they haven't even acknowledged the problem on their status page. Meanwhile, we have been busy coding workaround hacks.</div>
<div><br></div><div>If members of the StarCluster community could independently complain to AWS, that might help. The workarounds aren't pleasant.</div><div><br></div><div>- Joydeep</div><div><br></div><div><br></div><div>
<br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Nov 26, 2013 at 12:44 AM, Daniel Polhamus <span dir="ltr">&lt;<a href="mailto:danp@metrumrg.com" target="_blank">danp@metrumrg.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi all,<div>Since today we're seeing problems with clusters of multiple nodes. Launching a cluster with 3 or more nodes fails with an error saying that aliases can't be assigned to nodes other than "master". The same problem occurs with addnode: adding one node works, but adding more than one triggers the alias error again. Terminating these clusters also fails because of the missing aliases, so you have to use the ec2toolkit to shut down the unnamed nodes.</div>
<div><br></div><div>I'm on the latest development version, and since today I've noticed a lot of gibberish in the node user data (as viewed through the web console).</div><div><br></div><div>Debug output is at the end. Thanks for the help.</div>
<div><br></div><div>Dan</div><div><br></div><div><br></div><div>&gt; starcluster -d start -c testing brokenCluster -s 3</div><div><br></div><div>... </div><div>...</div><div><br></div><div><div>&gt;&gt;&gt; Waiting for all nodes to be in a 'running' state...</div>
<div>2013-11-25 14:13:06,323 cluster.py:734 - DEBUG - existing nodes: {}</div><div>2013-11-25 14:13:06,323 cluster.py:742 - DEBUG - adding node i-323f504f to self._nodes list</div><div>2013-11-25 14:13:06,839 cluster.py:742 - DEBUG - adding node i-2c3f5051 to self._nodes list</div>
<div>2013-11-25 14:13:07,001 node.py:147 - DEBUG - invalid aliases file in user_data:</div><div><br></div><div>3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%</div><div>!!! ERROR - instance i-2c3f5051 has no alias</div>
<div>2013-11-25 14:13:07,003 cli.py:301 - DEBUG - instance i-2c3f5051 has no alias</div><div>Traceback (most recent call last):</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 274, in main</div>
<div> sc.execute(args)</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/commands/start.py", line 220, in execute</div><div> validate_running=validate_running)</div>
<div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 1534, in start</div><div> return self._start(create=create, create_only=create_only)</div><div> File "&lt;string&gt;", line 2, in _start</div>
<div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/utils.py", line 111, in wrap_f</div><div> res = func(*arg, **kargs)</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 1557, in _start</div>
<div> self.setup_cluster()</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 1565, in setup_cluster</div><div> self.wait_for_cluster()</div><div>
File "&lt;string&gt;", line 2, in wait_for_cluster</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/utils.py", line 111, in wrap_f</div><div> res = func(*arg, **kargs)</div>
<div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 1350, in wait_for_cluster</div><div> self.wait_for_running_instances()</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 1305, in wait_for_running_instances</div>
<div> nodes = nodes or self.get_nodes_or_raise()</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 754, in get_nodes_or_raise</div><div> nodes = self.nodes</div>
<div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 744, in nodes</div><div> if n.is_master():</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/node.py", line 898, in is_master</div>
<div> return self.alias == 'master' or self.alias.endswith("-master")</div><div> File "/Library/Python/2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/node.py", line 150, in alias</div>
<div> "instance %s has no alias" % self.id)</div><div>BaseException: instance i-2c3f5051 has no alias</div></div><span class="HOEnZb"><font color="#888888"><div><div>
<br></div>-- <br>Daniel G Polhamus, PhD<div>Metrum Research Group, LLC</div>
<div><br></div>
</div></font></span></div>
<br>_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
<br></blockquote></div><br></div>