I've added an extension to the starcluster config syntax in which NODE_INSTANCE_TYPE can now also be a list, allowing the user to specify a more structured cluster with multiple different instance types, each with its own instance count.<br>
<br>The syntax I currently use in my extension is to overload NODE_INSTANCE_TYPE so that it can represent a "compound type": a comma-separated list of types, each followed by the number of instances of that type desired. E.g.:<br>
<br> NODE_INSTANCE_TYPE = m1.xlarge 3, m1.large 2<br><br>means that 3 m1.xlarge and 2 m1.large node instances -- in addition to the master -- are desired. <br><br>I'd like to poll the following questions:<br><br>1) Is the basic syntax reasonable?<br>
<br>2) I allow a simplification, namely that you don't have to specify the instance count for the last type listed, e.g. <br><br> NODE_INSTANCE_TYPE = m1.xlarge 3, m1.large<br><br>is a valid config. If left unspecified, the instance count of the last type is determined during config.load() simply by setting<br>
<br> count of last instance type = CLUSTER_SIZE - [total number of instances specified for the non-last instance types listed] - 1<br><br>where the -1 accounts for the master. <br><br>
My questions are:<br> a) Is this simplification desirable? One point in its favor is that it is completely compatible with current configuration syntax -- the special case when only one node_instance_type is desired remains unchanged, and easy. <br>
<br> b) Should I add a further simplification, in which _other_ (that is, non-last) instance types can have unspecified numbers of instances, defaulting to 1? E.g. having the following be valid:<br><br> NODE_INSTANCE_TYPE = m1.xlarge, c1.large, m1.large<br>
<br>which would mean that 1 m1.xlarge, 1 c1.large and CLUSTER_SIZE - 3 m1.large instances are desired. Though it might make things easier, I'm uneasy about this syntax because leaving the last instance type count unspecified would have a different meaning than leaving the other instance type counts unspecified. [Unless <i>all</i> unspecified instance counts defaulted to 1, which is even worse since it is not compatible with current syntax and makes what should be the easier case more complicated.] <br>
<br>3) If you don't specify master_instance_type, starcluster currently defaults this to the node_instance_type. Now that node_instance_type is compound, I've set it to default to the <i>first</i> listed instance type, e.g. m1.xlarge in the above example. Is this the correct behavior? Should it instead default to the "smallest" (i.e. cheapest) instance type listed, e.g. 'm1.large' in the above example? Or to the <i>last</i> listed type? <br>
<br>Even when there are multiple instance types, I'm still having all those types use the single specified NODE_IMAGE_ID AMI. In the future it might also be possible to allow listing multiple AMIs along with the instance types. <br>
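<br>To make the rules in 2) and 3) concrete, here is a rough Python sketch of how a compound NODE_INSTANCE_TYPE value could be resolved. It is illustrative only and is not the extension's actual code: the function names parse_node_instance_type and default_master_type are hypothetical, and it assumes the variant where only the <i>last</i> type may omit its count.<br>

```python
def parse_node_instance_type(value, cluster_size):
    """Resolve e.g. 'm1.xlarge 3, m1.large' into [(type, count), ...].

    Hypothetical helper: only the last entry may omit its count, in which
    case it receives CLUSTER_SIZE - (counts already specified) - 1 (master).
    """
    entries = [part.split() for part in value.split(",")]
    if any(len(entry) not in (1, 2) for entry in entries):
        raise ValueError("expected 'TYPE [COUNT]' entries, got %r" % value)
    if any(len(entry) != 2 for entry in entries[:-1]):
        raise ValueError("only the last instance type may omit its count")

    # Counts that were written out explicitly, in listing order.
    counts = [int(entry[1]) for entry in entries if len(entry) == 2]

    if len(entries[-1]) == 1:
        # The last type absorbs whatever is left after the master.
        remaining = cluster_size - sum(counts) - 1
        if remaining < 0:
            raise ValueError("CLUSTER_SIZE is too small for the listed counts")
        counts.append(remaining)
    # Note: when every count is explicit, this sketch does not check the
    # total against CLUSTER_SIZE.
    return [(entry[0], count) for entry, count in zip(entries, counts)]


def default_master_type(node_types):
    """Default master type under option 3): the first listed type."""
    return node_types[0][0]


if __name__ == "__main__":
    nodes = parse_node_instance_type("m1.xlarge 3, m1.large", cluster_size=6)
    print(nodes, default_master_type(nodes))
```

<br>With CLUSTER_SIZE = 6, the value "m1.xlarge 3, m1.large" resolves to 3 m1.xlarge and 6 - 3 - 1 = 2 m1.large nodes, and a plain "m1.small" with CLUSTER_SIZE = 4 resolves to 3 m1.small nodes, matching the current single-type behavior.<br>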
<br>Any thoughts on these issues before I submit a pull request would be great. And sorry for the long email,<br>Dan<br><br><br>