Sorry for the spam, but here's another follow-up. <div><br></div><div>I found that this only happens when I use a non-HVM-EBS AMI for the master but an HVM-EBS AMI for the nodes.</div><div><br></div><div>This is probably because StarCluster copies the SGE installation from the master to the nodes, and this doesn't play nicely when the nodes are CentOS-based but the master is Ubuntu-based.</div>
<div><br></div><div>Any ideas for a work-around?<br><div><br><div class="gmail_quote">On Mon, Aug 27, 2012 at 2:07 PM, Jesse Lu <span dir="ltr"><<a href="mailto:jesselu@stanford.edu" target="_blank">jesselu@stanford.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Follow-up:<div><br></div><div>Here are the contents of the Grid Engine execd installation log file:</div><div><br></div>
<div><div>cat /opt/sge6/default/common/install_logs/execd_install_node001_2012-08-27_14:04:29.log</div>
<div><br></div><div><br></div><div>Your $SGE_ROOT directory: /opt/sge6</div><div><br></div><div><br></div><div>Using cell: >default<</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>
Using local execd spool directory [/opt/sge6/default/spool/exec_spool_local]</div><div><br></div><div>Creating local configuration for host >node001<</div><div>sgeadmin@node001 modified "node001" in configuration list</div>
<div>Local configuration for host >node001< created.</div><div><br></div><div>Host >master< already in submit host list!</div><div>Host >node001< already in submit host list!</div><div><br></div><div><br>
</div><div> starting sge_execd</div><div><br></div><div><br></div><div>No modification because "node001" already exists in "hostlist" of "hostgroup"</div><div>root@node001 modified "@allhosts" in host group list</div>
<div>root@node001 modified "all.q" in cluster queue list</div><div><br></div><div>got select error: Connection refused</div><div>got select error: closing "node001/execd/1"</div><div>Execd on host node001 is not started!</div>
<div><div class="h5">
<div><br></div><br><div class="gmail_quote">On Mon, Aug 27, 2012 at 1:37 PM, Jesse Lu <span dir="ltr"><<a href="mailto:jesselu@stanford.edu" target="_blank">jesselu@stanford.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
ami-12b6477b produces the following error on cluster startup:<br><br>!!! ERROR - command 'cd /opt/sge6 && TERM=rxvt ./inst_sge -x -noremote -auto ./ec2_sge.conf' failed with status 1
<div><br></div><div>I'm guessing the sge6 installation is faulty? Can anyone help? Thanks!</div><span><font color="#888888"><div><br></div><div>Jesse</div>
</font></span></blockquote></div><br></div></div></div>
</blockquote></div><br></div></div>