[StarCluster] Twilight Zone: sge_gethostbyname failed
Lyn Gerner
schedulerqueen at gmail.com
Fri Dec 27 19:47:27 EST 2013
Yep, it works again with those changes.
So, what's a non-kludgy way to keep the hostname from reverting again?
Thanks again,
Lyn
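A minimal sketch of persisting the hostname per distro family. It follows the file locations Rayson lists in the quoted thread: /etc/hostname on Ubuntu, the HOSTNAME key in /etc/sysconfig/network on RHEL-based distros, and /etc/HOSTNAME on SuSE. The names HOSTNAME_FILES, render_hostname_config() and persist_hostname() are hypothetical. They are not StarCluster's actual set_hostname() in node.py. The sketch assumes the distro family is already known. It does not change the running hostname, which still needs "hostname master" as root. It also does not address whatever reset the hostname in the first place.

```python
import os

# Hypothetical mapping, based on the distro notes in this thread.
HOSTNAME_FILES = {
    "ubuntu": "/etc/hostname",
    "rhel": "/etc/sysconfig/network",  # also CentOS, Scientific Linux, etc.
    "suse": "/etc/HOSTNAME",
}


def render_hostname_config(family, hostname, existing=""):
    """Return (path, new_contents) that persist `hostname` for `family`.

    For the RHEL-style file only the HOSTNAME= line is replaced, so other
    keys in `existing` (e.g. NETWORKING=yes) are kept as they were.
    """
    path = HOSTNAME_FILES[family]
    if family == "rhel":
        kept = [line for line in existing.splitlines()
                if not line.startswith("HOSTNAME=")]
        kept.append("HOSTNAME=" + hostname)
        return path, "\n".join(kept) + "\n"
    # Ubuntu and SuSE files hold just the name.
    return path, hostname + "\n"


def persist_hostname(family, hostname):
    """Rewrite the persistent hostname file in place (must run as root)."""
    path = HOSTNAME_FILES[family]
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    path, contents = render_hostname_config(family, hostname, existing)
    with open(path, "w") as f:
        f.write(contents)
    return path
```

For example, on the Scientific Linux master in this thread, persist_hostname("rhel", "master") would rewrite only the HOSTNAME= line of /etc/sysconfig/network. Splitting rendering from writing keeps the file-format logic testable without root.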
On Fri, Dec 27, 2013 at 2:43 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
> /etc/sysconfig/network is read during reboot, and may also be read after DHCP...
>
> To see if it is the issue, set HOSTNAME back to master, and also run
> "hostname master" as root.
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
>
> On Fri, Dec 27, 2013 at 7:40 PM, Lyn Gerner <schedulerqueen at gmail.com>
> wrote:
> > Thanks for digging, Rayson.
> >
> > So, /etc/sysconfig/network had HOSTNAME=centos-ami when the problem first
> > occurred. I tried resetting it to "master" and then retried the SGE
> > commands (qstat, qsub, etc.). They still failed with the same error at that
> > point, so I switched them back, not knowing for sure whether they had been
> > set to master and node001 to begin with.
> >
> > Thanks,
> > Lyn
> >
> >
> > On Fri, Dec 27, 2013 at 2:35 PM, Rayson Ho <raysonlogin at gmail.com>
> > wrote:
> >>
> >> (Updating the list...)
> >>
> >> The hostname on the master gets reset to centos-ami, which is not
> >> resolvable, so Grid Engine's hostname lookup (sge_gethostbyname) fails.
> >>
> >> Lyn: what is the value of the HOSTNAME key in "/etc/sysconfig/network"
> >> on your master instance?
> >>
> >> Justin & other devs: set_hostname() in node.py works on Ubuntu because
> >> Ubuntu uses /etc/hostname, but RHEL (and RHEL-based distros like
> >> CentOS, Oracle Linux, Scientific Linux) uses /etc/sysconfig/network,
> >> and SuSE uses yet another file, /etc/HOSTNAME!
> >>
> >> Rayson
> >>
> >> On Fri, Dec 27, 2013 at 6:39 PM, Lyn Gerner <schedulerqueen at gmail.com>
> >> wrote:
> >> > I used the Scientific Linux AMI (been a long time, but I found it from
> >> > the SC site), and 0.94.3 is my SC version.
> >> >
> >> >
> >> > On Fri, Dec 27, 2013 at 1:36 PM, Rayson Ho <raysonlogin at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hmm, which AMI did you use, and what's the version of SC?
> >> >>
> >> >> Rayson
> >> >>
> >> >> On Fri, Dec 27, 2013 at 6:33 PM, Lyn Gerner <schedulerqueen at gmail.com>
> >> >> wrote:
> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> > # /opt/sge6/utilbin/linux-x64/gethostname -name
> >> >> > error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)
> >> >> >
> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> > # hostname
> >> >> > centos-ami
> >> >> >
> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> > # hostname -f
> >> >> > hostname: Unknown host
> >> >> >
> >> >> > What's weird is that I have never mucked with any of this under
> >> >> > StarCluster, and have only recently started having problems. I can't
> >> >> > pinpoint any specific event or change, except that I started leaving
> >> >> > the config up for days at a stretch instead of hours.
> >> >> >
> >> >> > Thanks,
> >> >> > Lyn
> >> >> >
> >> >> >
> >> >> > On Fri, Dec 27, 2013 at 1:30 PM, Rayson Ho <raysonlogin at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> No problem, and I think that's why it is failing. Can you also send
> >> >> >> me the output of:
> >> >> >>
> >> >> >> 1) gethostname -name
> >> >> >>
> >> >> >> 2) hostname
> >> >> >>
> >> >> >> 3) hostname -f
> >> >> >>
> >> >> >> Rayson
> >> >> >>
> >> >> >> On Fri, Dec 27, 2013 at 6:27 PM, Lyn Gerner
> >> >> >> <schedulerqueen at gmail.com>
> >> >> >> wrote:
> >> >> >> > My bad:
> >> >> >> >
> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> >> > # /opt/sge6/utilbin/linux-x64/gethostname -all
> >> >> >> > error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)
> >> >> >> >
> >> >> >> > Thanks for any insights,
> >> >> >> > Lyn
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Dec 27, 2013 at 1:25 PM, Rayson Ho <raysonlogin at gmail.com>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> But I need the output of "gethostname", not "gethostbyname"... :-P
> >> >> >> >>
> >> >> >> >> Rayson
> >> >> >> >>
> >> >> >> >> On Fri, Dec 27, 2013 at 6:11 PM, Lyn Gerner
> >> >> >> >> <schedulerqueen at gmail.com>
> >> >> >> >> wrote:
> >> >> >> >> > Thanks for the quick response, Rayson. Output from gethostbyname
> >> >> >> >> > is in between the ****s below:
> >> >> >> >> >
> >> >> >> >> > On Fri, Dec 27, 2013 at 1:04 PM, Rayson Ho
> >> >> >> >> > <raysonlogin at gmail.com>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> What is the output of "gethostname"? (gethostname is shipped
> >> >> >> >> >> with SGE in the utilbin dir.)
> >> >> >> >> >>
> >> >> >> >> >> Rayson
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Dec 27, 2013 at 5:34 PM, Lyn Gerner
> >> >> >> >> >> <schedulerqueen at gmail.com>
> >> >> >> >> >> wrote:
> >> >> >> >> >> > Hi All,
> >> >> >> >> >> >
> >> >> >> >> >> > Okay, I'm in the Twilight Zone now. After starting a small cluster
> >> >> >> >> >> > on the 23rd and doing minimal reconfiguration (qmod -d to disable
> >> >> >> >> >> > job execution on the master, and qconf -mq all.q to change some
> >> >> >> >> >> > slot counts), all of which worked fine, I came back a few days
> >> >> >> >> >> > later to find an unusable SGE configuration:
> >> >> >> >> >> >
> >> >> >> >> >> > root@AWS-VTMXmaster-w2b ~
> >> >> >> >> >> > # qstat -f
> >> >> >> >> >> > error: sge_gethostbyname failed
> >> >> >> >> >> >
> >> >> >> >> >> > /etc/hosts has the correct (internal) addresses for all hosts:
> >> >> >> >> >> >
> >> >> >> >> >> > root@AWS-VTMXmaster-w2b ~
> >> >> >> >> >> > # cat /etc/hosts
> >> >> >> >> >> > 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
> >> >> >> >> >> > ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> >> >> >> >> >> > 10.250.65.204 master
> >> >> >> >> >> > 10.251.30.12 node001
> >> >> >> >> >> >
> >> >> >> >> >> *****
> >> >> >> >> >>
> >> >> >> >> >> > The gethostbyname utility works correctly (so does gethostbyaddr):
> >> >> >> >> >> >
> >> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> >> >> >> > # /opt/sge6/utilbin/linux-x64/gethostbyname master
> >> >> >> >> >> > Hostname: master
> >> >> >> >> >> > Aliases:
> >> >> >> >> >> > Host Address(es): 10.250.65.204
> >> >> >> >> >> >
> >> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> >> >> >> > # /opt/sge6/utilbin/linux-x64/gethostbyname node001
> >> >> >> >> >> > Hostname: node001
> >> >> >> >> >> > Aliases:
> >> >> >> >> >> > Host Address(es): 10.251.30.12
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > ******
> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> > root@AWS-VTMXmaster-w2b /opt/sge6/default/common/install_logs
> >> >> >> >> >> > # qstat -f
> >> >> >> >> >> > error: sge_gethostbyname failed
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > I went so far as to edit the hostname in /etc/sysconfig/network to
> >> >> >> >> >> > contain "master" and "node001" on the two nodes. Same error.
> >> >> >> >> >> >
> >> >> >> >> >> > I have been all over the 'net looking for solutions, but have found
> >> >> >> >> >> > nothing with a clear resolution. gridengine.sunsource.net is gone,
> >> >> >> >> >> > and the follow-on archive at http://gridengine.org/pipermail/users/
> >> >> >> >> >> > doesn't seem to be searchable except on an onerous, month-by-month
> >> >> >> >> >> > click-through basis (which hasn't yielded anything useful as I slog
> >> >> >> >> >> > through it).
> >> >> >> >> >> >
> >> >> >> >> >> > Short of a starcluster restart, I'd appreciate anyone's input on
> >> >> >> >> >> > what to try next.
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks much,
> >> >> >> >> >> > Lyn
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > _______________________________________________
> >> >> >> >> >> > StarCluster mailing list
> >> >> >> >> >> > StarCluster at mit.edu
> >> >> >> >> >> > http://mailman.mit.edu/mailman/listinfo/starcluster
> >> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>
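The diagnosis in this thread, running `hostname` and SGE's `gethostname -name`, comes down to one question: does the name the host reports for itself resolve? The sketch below is a rough Python analogue that uses only the standard socket module. check_local_hostname() is a hypothetical helper, not part of SGE or StarCluster. It goes through the system resolver rather than SGE's own host-resolution code, so its answer may not always match SGE's.

```python
import socket


def check_local_hostname(name=None):
    """Return (hostname, address or None) for `name`, or for this host.

    If the name the host reports for itself (e.g. centos-ami) has no entry
    in /etc/hosts or DNS, the address is None. That matches the state of
    the master in this thread when qstat failed.
    """
    if name is None:
        name = socket.gethostname()  # what `hostname` prints
    try:
        return name, socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return name, None


if __name__ == "__main__":
    host, addr = check_local_hostname()
    if addr is None:
        print(f"{host!r} does not resolve; fix the hostname or /etc/hosts")
    else:
        print(f"{host!r} resolves to {addr}")
```

On the master as described above, the script would be expected to report that centos-ami does not resolve. After "hostname master" it should resolve through the /etc/hosts entry shown in the thread.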