[StarCluster] SGE issue with hostnames

"Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." laotsao at gmail.com
Tue Jul 26 08:37:12 EDT 2011


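In $SGE_ROOT/$SGE_CELL/common, create a file named host_aliases with one
line per host that lists both of its names, so that SGE treats them as
the same host:

<hostname>.privatenet <hostname>.pubnet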
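
For the master node in the quoted message, the entry could be written with something like the sketch below. It is only an illustration under a few assumptions: the helper name add_host_alias is made up for this example, SGE_ROOT and SGE_CELL are assumed to be set in the environment on the qmaster host, and the first name on the line is assumed to be the one SGE should use, with the EC2 internal name as its alias. The SGE daemons may need a restart before the change is picked up.

```shell
#!/bin/sh
# Sketch only: append one alias line to an SGE host_aliases file.
# add_host_alias is a hypothetical helper, not part of SGE or StarCluster.
add_host_alias() {
    # $1 = directory holding host_aliases (normally $SGE_ROOT/$SGE_CELL/common)
    # $2 = name SGE should use for the host
    # $3 = other name that resolves to the same host
    aliases_file="$1/host_aliases"
    line="$2 $3"
    # Do nothing if the identical line is already present.
    if [ -f "$aliases_file" ] && grep -qxF -- "$line" "$aliases_file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$aliases_file"
}

# Example call for this thread (run as the SGE admin user on the qmaster host):
# add_host_alias "$SGE_ROOT/$SGE_CELL/common" master domU-12-31-39-09-80-C1.compute-1.internal
```

The function only appends, so an existing host_aliases file with entries for other nodes is left intact, and calling it twice with the same names does not duplicate the line.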


On 7/25/2011 11:25 PM, Robert Tomkiewicz wrote:
> Hi there,
>
> I started a 4-node EC2 cluster using 0.92rc2, and ami-a5c42dcc, 
> standard starcluster 9.04 x64 ami.
>
> I ran into the following issue while doing some basic sge setup.  At 
> first qconf worked fine, then a few minutes later...
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qconf -sq all.q
> error: commlib error: access denied (client IP resolved to host name 
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to 
> clients host name "master").
>
> after issuing
>
>  root@master: ~# hostname master
>
> I was able to proceed normally, and launch my sge jobs.  They were 
> running normally, confirmed by the output of qstat.
>
> However, some minutes later, when checking on them with another qstat, 
> I got the same thing again.
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name 
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to 
> clients host name "master").
>
> Resetting the hostname was to no avail.
>
> root@master: ~ # hostname
> master
> root@master: ~ # hostname master
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name 
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to 
> clients host name "master").
>
> So I tried this
>
>  root@master: ~# hostname domU-12-31-39-09-80-C1.compute-1.internal
>
> which yielded, vice versa...
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name 
> "master". This is not identical to clients host name 
> "domU-12-31-39-09-80-C1.compute-1.internal")
>
> Setting the hostname back to master ("hostname master") at this point 
> yields correct operation for a few minutes.
>
>
> It seems clear the problem has to do with doubled hostnames, but where 
> are they set?  Has anyone else had a similar problem?
>
> Thank you,
>
> Robert Tomkiewicz
>
>
>
> /etc/hostname is simply
>
> master
>
>
> /etc/hosts is below:
>
> 127.0.0.1 localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 10.210.135.47 master
> 10.66.83.219 node001
> 10.193.155.175 node002
> 10.206.70.15 node003
>