[StarCluster] Twilight Zone: sge_gethostbyname failed

Rayson Ho raysonlogin at gmail.com
Fri Dec 27 18:04:38 EST 2013


What is the output of "gethostname"? (gethostname is shipped with SGE
in the utilbin dir, next to gethostbyname.)
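
A rough sketch of that check, in case it helps. It assumes SGE_ROOT is
/opt/sge6 and the arch directory is linux-x64, both taken from the paths
in the quoted message below. The name_in_hosts helper is hypothetical
(it is not part of SGE); it only reports whether the name the system
returns has an entry in /etc/hosts.

```shell
#!/bin/sh
# Sketch only: SGE_ROOT and the linux-x64 arch directory are assumptions
# taken from the paths quoted in the original message.
SGE_ROOT="${SGE_ROOT:-/opt/sge6}"
UTILBIN="$SGE_ROOT/utilbin/linux-x64"

# Hypothetical helper: return 0 if NAME appears as a hostname or alias
# in the hosts file FILE. Text after '#' and the address column are ignored.
name_in_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Name as the operating system reports it.
sys_name=$(hostname 2>/dev/null || uname -n)
echo "hostname: $sys_name"

# Name as the SGE utility reports it, if the utility exists on this machine.
if [ -x "$UTILBIN/gethostname" ]; then
    "$UTILBIN/gethostname"
else
    echo "gethostname not found under $UTILBIN" >&2
fi

# Check whether the system name has an entry in /etc/hosts.
if name_in_hosts "$sys_name" /etc/hosts; then
    echo "$sys_name is listed in /etc/hosts"
else
    echo "$sys_name is NOT listed in /etc/hosts"
fi
```

If the name printed by gethostname differs from the names in /etc/hosts,
that difference would be worth including in a follow-up.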

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Fri, Dec 27, 2013 at 5:34 PM, Lyn Gerner <schedulerqueen at gmail.com> wrote:
> Hi All,
>
> Okay, I'm in the Twilight Zone now.  After starting a small cluster on the
> 23rd, and doing minimal reconfig (qmod -d) to disable the sge_execd on the
> master and qconf -mq all.q to change some slot counts -- all of which worked
> fine -- I come back a few days later to find an unusable SGE config:
>
> root@AWS-VTMXmaster-w2b ~
> # qstat -f
> error: sge_gethostbyname failed
>
> /etc/hosts is correct for all its (internal) host addrs:
>
> root@AWS-VTMXmaster-w2b ~
> # cat /etc/hosts
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> 10.250.65.204 master
> 10.251.30.12 node001
>
>
> The gethostbyname utility works correctly (so does gethostbyaddr):
>
> root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> # /opt/sge6/utilbin/linux-x64/gethostbyname master
> Hostname: master
> Aliases:
> Host Address(es): 10.250.65.204
>
> root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> # /opt/sge6/utilbin/linux-x64/gethostbyname node001
> Hostname: node001
> Aliases:
> Host Address(es): 10.251.30.12
>
> root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> # qstat -f
> error: sge_gethostbyname failed
>
>
> I went so far as to edit the hostname in /etc/sysconfig/network to
> "master" and "node001" on the respective nodes.  Same error.
>
> I have been all over the 'net looking for solutions, but have found nothing
> with a clear resolution.  gridengine.sunsource.net is gone.  The follow-on
> at http://gridengine.org/pipermail/users/ doesn't seem to be searchable,
> except on an onerous, month-by-month click-thru basis (which hasn't yielded
> anything useful as I slog thru it).
>
> Short of starcluster restart'ing, I'll appreciate anyone's inputs on what to
> try next.
>
> Thanks much,
> Lyn

