<div dir="ltr">Yep, it works again with those changes.<div><br></div><div style>So, how should I stop the regression in a non-kludgy way?</div><div style><br></div><div style>Thanks again,</div><div style>Lyn</div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">On Fri, Dec 27, 2013 at 2:43 PM, Rayson Ho <span dir="ltr"><<a href="mailto:raysonlogin@gmail.com" target="_blank">raysonlogin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
/etc/sysconfig/network is read during reboot, and maybe after DHCP as well...<br>
<br>
To see if it is the issue, set HOSTNAME back to master, and also run<br>
"hostname master" as root.<br>
<div class="im HOEnZb"><br>
Rayson<br>
<br>
==================================================<br>
Open Grid Scheduler - The Official Open Source Grid Engine<br>
<a href="http://gridscheduler.sourceforge.net/" target="_blank">http://gridscheduler.sourceforge.net/</a><br>
<a href="http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html" target="_blank">http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html</a><br>
<br>
<br>
</div><div class="HOEnZb"><div class="h5">On Fri, Dec 27, 2013 at 7:40 PM, Lyn Gerner <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>> wrote:<br>
> Thanks for digging, Rayson.<br>
><br>
> So, /etc/sysconfig/network had HOSTNAME=centos-ami when the problem first<br>
> occurred. I tried resetting it to "master" and then retried the SGE<br>
> commands (qstat, qsub, etc.). They still failed with the same error at that<br>
> point, so I switched them back, not knowing for sure if they'd been set to<br>
> master and node001 to begin with.<br>
><br>
> Thanks,<br>
> Lyn<br>
><br>
><br>
> On Fri, Dec 27, 2013 at 2:35 PM, Rayson Ho <<a href="mailto:raysonlogin@gmail.com">raysonlogin@gmail.com</a>> wrote:<br>
>><br>
>> (Updating the list...)<br>
>><br>
>> The hostname on the master gets reset to centos-ami, which is not<br>
>> resolvable. Thus Grid Engine complains about the hostname issue.<br>
>><br>
>> Lyn: what is the value of the HOSTNAME key in "/etc/sysconfig/network"<br>
>> on your master instance??<br>
>><br>
>> Justin & other devs: set_hostname() in node.py works on Ubuntu because<br>
>> Ubuntu uses /etc/hostname, but RHEL (and RHEL-based distros like<br>
>> CentOS, Oracle Linux, Scientific Linux) uses /etc/sysconfig/network,<br>
>> and yet SuSE uses /etc/HOSTNAME!<br>
>><br>
>> Rayson<br>
>><br>
>><br>
>><br>
>> On Fri, Dec 27, 2013 at 6:39 PM, Lyn Gerner <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>><br>
>> wrote:<br>
>> > I used the Scientific Linux AMI (been a long time, but I found it from the<br>
>> > SC site), and 0.94.3 is my SC version.<br>
>> ><br>
>> ><br>
>> > On Fri, Dec 27, 2013 at 1:36 PM, Rayson Ho <<a href="mailto:raysonlogin@gmail.com">raysonlogin@gmail.com</a>><br>
>> > wrote:<br>
>> >><br>
>> >> Hmm, which AMI did you use, and what's the version of SC?<br>
>> >><br>
>> >> Rayson<br>
>> >><br>
>> >><br>
>> >><br>
>> >> On Fri, Dec 27, 2013 at 6:33 PM, Lyn Gerner <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>><br>
>> >> wrote:<br>
>> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> > # /opt/sge6/utilbin/linux-x64/gethostname -name<br>
>> >> > error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)<br>
>> >> ><br>
>> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> > # hostname<br>
>> >> > centos-ami<br>
>> >> ><br>
>> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> > # hostname -f<br>
>> >> > hostname: Unknown host<br>
>> >> ><br>
>> >> > What's weird is that I have never mucked with any of this under<br>
>> >> > StarCluster, and have only recently started having problems. Can't<br>
>> >> > pinpoint any specific event or thing that changed--except that I<br>
>> >> > started leaving the config up for days instead of hours at a stretch.<br>
>> >> ><br>
>> >> > Thanks,<br>
>> >> > Lyn<br>
>> >> ><br>
>> >> ><br>
>> >> > On Fri, Dec 27, 2013 at 1:30 PM, Rayson Ho <<a href="mailto:raysonlogin@gmail.com">raysonlogin@gmail.com</a>><br>
>> >> > wrote:<br>
>> >> >><br>
>> >> >> No problem, and I think that's why it is failing. Can you also send me<br>
>> >> >> the output of:<br>
>> >> >><br>
>> >> >> 1) gethostname -name<br>
>> >> >><br>
>> >> >> 2) hostname<br>
>> >> >><br>
>> >> >> 3) hostname -f<br>
>> >> >><br>
>> >> >> Rayson<br>
>> >> >><br>
>> >> >><br>
>> >> >><br>
>> >> >> On Fri, Dec 27, 2013 at 6:27 PM, Lyn Gerner<br>
>> >> >> <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>><br>
>> >> >> wrote:<br>
>> >> >> > My bad:<br>
>> >> >> ><br>
>> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> >> > # /opt/sge6/utilbin/linux-x64/gethostname -all<br>
>> >> >> > error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)<br>
>> >> >> ><br>
>> >> >> > Thanks for any insights,<br>
>> >> >> > Lyn<br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > On Fri, Dec 27, 2013 at 1:25 PM, Rayson Ho <<a href="mailto:raysonlogin@gmail.com">raysonlogin@gmail.com</a>><br>
>> >> >> > wrote:<br>
>> >> >> >><br>
>> >> >> >> But I need the output of "gethostname", not "gethostbyname"...<br>
>> >> >> >> :-P<br>
>> >> >> >><br>
>> >> >> >> Rayson<br>
>> >> >> >><br>
>> >> >> >><br>
>> >> >> >><br>
>> >> >> >> On Fri, Dec 27, 2013 at 6:11 PM, Lyn Gerner<br>
>> >> >> >> <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>><br>
>> >> >> >> wrote:<br>
>> >> >> >> > Thanks for the quick response, Rayson. Output from gethostbyname is in<br>
>> >> >> >> > between the ****s below:<br>
>> >> >> >> ><br>
>> >> >> >> > On Fri, Dec 27, 2013 at 1:04 PM, Rayson Ho<br>
>> >> >> >> > <<a href="mailto:raysonlogin@gmail.com">raysonlogin@gmail.com</a>><br>
>> >> >> >> > wrote:<br>
>> >> >> >> >><br>
>> >> >> >> >> What is the output of "gethostname"? (gethostname is shipped with SGE<br>
>> >> >> >> >> in the util dir.)<br>
>> >> >> >> >><br>
>> >> >> >> >> Rayson<br>
>> >> >> >> >><br>
>> >> >> >> >><br>
>> >> >> >> >><br>
>> >> >> >> >> On Fri, Dec 27, 2013 at 5:34 PM, Lyn Gerner<br>
>> >> >> >> >> <<a href="mailto:schedulerqueen@gmail.com">schedulerqueen@gmail.com</a>><br>
>> >> >> >> >> wrote:<br>
>> >> >> >> >> > Hi All,<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Okay, I'm in the Twilight Zone now. After starting a small cluster on<br>
>> >> >> >> >> > the 23rd, and doing minimal reconfig (qmod -d) to disable the sge_execd<br>
>> >> >> >> >> > on the master and qconf -mq all.q to change some slot counts -- all of<br>
>> >> >> >> >> > which worked fine -- I come back days later to find an unusable SGE<br>
>> >> >> >> >> > config:<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > root@AWS-VTMXmaster-w2b ~<br>
>> >> >> >> >> > # qstat -f<br>
>> >> >> >> >> > error: sge_gethostbyname failed<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > /etc/hosts is correct for all its (internal) host addrs:<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > root@AWS-VTMXmaster-w2b ~<br>
>> >> >> >> >> > # cat /etc/hosts<br>
>> >> >> >> >> > 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4<br>
>> >> >> >> >> > ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6<br>
>> >> >> >> >> > 10.250.65.204 master<br>
>> >> >> >> >> > 10.251.30.12 node001<br>
>> >> >> >> >> ><br>
>> >> >> >> >> *****<br>
>> >> >> >> >><br>
>> >> >> >> >> > The gethostbyname utility works correctly (so does<br>
>> >> >> >> >> > gethostbyaddr):<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> >> >> >> > # /opt/sge6/utilbin/linux-x64/gethostbyname master<br>
>> >> >> >> >> > Hostname: master<br>
>> >> >> >> >> > Aliases:<br>
>> >> >> >> >> > Host Address(es): 10.250.65.204<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> >> >> >> > # /opt/sge6/utilbin/linux-x64/gethostbyname node001<br>
>> >> >> >> >> > Hostname: node001<br>
>> >> >> >> >> > Aliases:<br>
>> >> >> >> >> > Host Address(es): 10.251.30.12<br>
>> >> >> >> ><br>
>> >> >> >> ><br>
>> >> >> >> > ******<br>
>> >> >> >> >><br>
>> >> >> >> >> ><br>
>> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs<br>
>> >> >> >> >> > # qstat -f<br>
>> >> >> >> >> > error: sge_gethostbyname failed<br>
>> >> >> >> >> ><br>
>> >> >> >> >> ><br>
>> >> >> >> >> > I went so far as to edit the hostname in /etc/sysconfig/network to<br>
>> >> >> >> >> > contain "master" and "node001" on the two nodes. Same error.<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > I have been all over the 'net looking for solutions, but have found<br>
>> >> >> >> >> > nothing with a clear resolution. <a href="http://gridengine.sunsource.net" target="_blank">gridengine.sunsource.net</a> is gone.<br>
>> >> >> >> >> > The follow-on at <a href="http://gridengine.org/pipermail/users/" target="_blank">http://gridengine.org/pipermail/users/</a> doesn't seem to be<br>
>> >> >> >> >> > searchable, except on an onerous, month-by-month click-thru basis (which<br>
>> >> >> >> >> > hasn't yielded anything useful as I slog thru it).<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Short of starcluster restart'ing, I'll appreciate anyone's inputs on<br>
>> >> >> >> >> > what to try next.<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Thanks much,<br>
>> >> >> >> >> > Lyn<br>
>> >> >> >> >> ><br>
>> >> >> >> >> ><br>
>> >> >> >> >> > _______________________________________________<br>
>> >> >> >> >> > StarCluster mailing list<br>
>> >> >> >> >> > <a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
>> >> >> >> >> > <a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
>> >> >> >> >> ><br>
>> >> >> >> ><br>
>> >> >> >> ><br>
>> >> >> ><br>
>> >> >> ><br>
>> >> ><br>
>> >> ><br>
>> ><br>
>> ><br>
><br>
><br>
</div></div></blockquote></div><br></div>
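<div>The diagnosis in this thread (the running hostname falls back to centos-ami, which is not in /etc/hosts, because the RHEL-style config file was never updated) can be sketched in Python roughly as below. This is only an illustration, not StarCluster's actual set_hostname(): the helper names and the HOSTNAME_FILES table are hypothetical, the sample file contents in the usage note are made up, and only the three file paths and the HOSTNAME key come from the messages above. Both helpers work on text, so nothing on the system is touched.</div>

```python
import re

# Where each distro family keeps its persistent hostname, as listed in the
# thread. The dictionary keys are hypothetical labels chosen for this sketch.
HOSTNAME_FILES = {
    "ubuntu": "/etc/hostname",
    "rhel": "/etc/sysconfig/network",
    "suse": "/etc/HOSTNAME",
}


def names_in_hosts(hosts_text):
    """Return every host name and alias listed in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the rest are names and aliases.
        names.update(fields[1:])
    return names


def set_sysconfig_hostname(network_text, hostname):
    """Return /etc/sysconfig/network-style text with HOSTNAME set to hostname."""
    new_line = "HOSTNAME=" + hostname
    pattern = re.compile(r"^HOSTNAME=.*$", re.MULTILINE)
    if pattern.search(network_text):
        # Replace the existing key; the lambda keeps the value literal.
        return pattern.sub(lambda match: new_line, network_text)
    # No HOSTNAME key yet: append one on its own line.
    if network_text and not network_text.endswith("\n"):
        network_text += "\n"
    return network_text + new_line + "\n"
```

<div>With a hosts file like the one posted above, "centos-ami" is not in names_in_hosts(...) while "master" is, which matches the HOST_NOT_FOUND error. Passing text such as "HOSTNAME=centos-ami" through set_sysconfig_hostname(text, "master") gives the persistent half of the fix; the running hostname would still have to be changed separately, as with the "hostname master" command suggested earlier.</div>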