<div dir="ltr">Hi All,<div><br></div><div style>Okay, I'm in the Twilight Zone now. On the 23rd I started a small cluster and did some minimal reconfiguration: qmod -d to disable the master's queue so it no longer takes jobs, and qconf -mq all.q to change some slot counts. All of that worked fine, but coming back to it days later I find an unusable SGE configuration:</div>
<div style><br></div><div style><div>root@AWS-VTMXmaster-w2b ~</div><div># qstat -f</div><div>error: sge_gethostbyname failed</div><div><br></div><div style>/etc/hosts has correct entries for all of the (internal) host addresses:</div><div style>
<br></div><div style><div>root@AWS-VTMXmaster-w2b ~</div><div># cat /etc/hosts</div><div>127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4</div><div>::1 localhost localhost.localdomain localhost6 localhost6.localdomain6</div>
<div>10.250.65.204 master<br></div><div>10.251.30.12 node001</div><div><br></div><div><br></div><div style>The gethostbyname utility works correctly (so does gethostbyaddr):</div><div style><br></div><div><div>root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs</div>
<div># /opt/sge6/utilbin/linux-x64/gethostbyname master</div><div>Hostname: master</div><div>Aliases: </div><div>Host Address(es): 10.250.65.204 </div><div><br></div><div>root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs</div>
<div># /opt/sge6/utilbin/linux-x64/gethostbyname node001</div><div>Hostname: node001</div><div>Aliases: </div><div>Host Address(es): 10.251.30.12 </div><div><br></div><div>root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs</div>
<div># qstat -f</div><div>error: sge_gethostbyname failed</div></div><div><br></div><div><br></div><div style>I even edited the hostname in /etc/sysconfig/network to "master" and "node001" on the respective nodes. Same error.</div>
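<div style><br></div><div style>For anyone who wants to repeat these checks in one pass, here is a rough Python sketch (my own, not an SGE tool; parse_hosts, check and main are just names I made up). It assumes SGE_ROOT=/opt/sge6 and SGE_CELL=default, matching the install_logs path above, and that SGE records the qmaster's name in /opt/sge6/default/common/act_qmaster. Besides the names in /etc/hosts, it also tries the machine's own hostname and the act_qmaster name, since a client such as qstat may look those up too and my manual checks did not cover them.</div>
<pre>
```python
import socket
from pathlib import Path

# Assumption: SGE_ROOT=/opt/sge6 and SGE_CELL=default, matching the paths above.
HOSTS_FILE = Path("/etc/hosts")
ACT_QMASTER = Path("/opt/sge6/default/common/act_qmaster")


def parse_hosts(text):
    """Map every name and alias in hosts-file text to its address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            entries.setdefault(name, addr)  # keep the first address listed for a name
    return entries


def check(name):
    """Forward-resolve a name, then reverse-resolve the address it maps to."""
    try:
        addr = socket.gethostbyname(name)  # IPv4 only
    except socket.gaierror as exc:
        return f"{name}: forward lookup FAILED ({exc})"
    try:
        rname, aliases, _ = socket.gethostbyaddr(addr)
    except (socket.herror, socket.gaierror) as exc:
        return f"{name}: {addr}, reverse lookup FAILED ({exc})"
    return f"{name}: {addr} -> {rname} {aliases}"


def main():
    names = [socket.gethostname()]  # the name this machine reports for itself
    if ACT_QMASTER.exists():
        names.append(ACT_QMASTER.read_text().strip())  # qmaster name recorded by SGE
    hosts = parse_hosts(HOSTS_FILE.read_text())
    # Skip IPv6 entries (gethostbyname is IPv4-only) and the localhost aliases.
    names += [n for n, a in hosts.items() if ":" not in a and not n.startswith("localhost")]
    for name in dict.fromkeys(names):  # de-duplicate while keeping order
        print(check(name))


if __name__ == "__main__":
    main()
```
</pre>
<div style>Run it with python3 on each node; any line reporting FAILED names a host that does not resolve. Because socket.gethostbyname is IPv4-only, IPv6 entries such as ::1 are skipped.</div>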
<div style><br></div><div style>I have been all over the 'net looking for solutions, but have found nothing with a clear resolution. <a href="http://gridengine.sunsource.net">gridengine.sunsource.net</a> is gone. Its successor archive at <a href="http://gridengine.org/pipermail/users/">http://gridengine.org/pipermail/users/</a> doesn't seem to be searchable except by clicking through it month by month, which is onerous and hasn't turned up anything useful so far.</div>
<div style><br></div><div style>Short of a starcluster restart, I'd appreciate anyone's suggestions on what to try next.</div><div style><br></div><div style>Thanks much,</div><div style>Lyn</div></div><div><br>
</div></div></div>