[StarCluster] Amazon EC2 bug?

François-Michel L'Heureux fmlheureux at datacratic.com
Mon Dec 15 17:08:37 EST 2014


I want to share this with you.

Looking at a running StarCluster install, I saw a weird error fly by my
console.

ERROR - InvalidInstanceID.NotFound: The instance ID 'i-fc67821c' does not
exist.

I looked up the instance ID and it was a valid instance, but it belonged to
another StarCluster installation. (We run more than one cluster.)

I then grepped the logs of the last 5 days and found a few more of those
errors. It's always the same thing: a cluster adds some nodes, and for a
short time window other clusters get those nodes via the "def nodes"
property of cluster.py
<https://github.com/jtriley/StarCluster/blob/50894f517837eb6b9a68f3e45ac7649e9c78c467/starcluster/cluster.py#L751>.

My guess is that the security group filter is not always honoured for some
unknown reason. Amazon EC2 issue? Boto version issue? The former is more
likely since the error is transient.

Has anyone else running multiple clusters seen this error?

In the meantime, I'll see if I can develop a patch to filter the results
and remove the nodes that shouldn't have been returned.
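
A minimal sketch of what such a filter could look like, not an actual
StarCluster patch. It assumes each returned instance exposes a "groups" list
whose items have a "name" attribute (as boto instance objects do), and that
each cluster has its own security group name. The Group and Instance tuples,
the function name filter_by_group, and the group names "@sc-cluster1" and
"@sc-cluster2" are hypothetical stand-ins for illustration only.

```python
from collections import namedtuple

# Hypothetical stand-ins for the EC2 objects; only the attributes
# the filter reads are modelled here.
Group = namedtuple("Group", ["name"])
Instance = namedtuple("Instance", ["id", "groups"])


def filter_by_group(instances, group_name):
    """Keep only the instances that really belong to group_name.

    This re-checks on the client side what the security group filter
    was supposed to guarantee on the server side.
    """
    kept = []
    for inst in instances:
        names = [g.name for g in inst.groups]
        if group_name in names:
            kept.append(inst)
    return kept


# Simulated query result for cluster1 that wrongly includes a node
# from another cluster.
returned = [
    Instance("i-aaaa1111", [Group("@sc-cluster1")]),
    Instance("i-fc67821c", [Group("@sc-cluster2")]),
]

own = filter_by_group(returned, "@sc-cluster1")
print([inst.id for inst in own])  # prints ['i-aaaa1111']
```

Filtering on the client side only hides the symptom; it does not explain why
the stray nodes were returned in the first place.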

Cheers
Mich