[StarCluster] Possible NFS setup error when adding new nodes to a cluster?

Justin Riley jtriley at MIT.EDU
Wed Jan 18 17:17:33 EST 2012


Hi Paul,

No problem at all and thanks for the kind words. From my limited
testing I believe this is fixed in the latest github code, which will
be included in tomorrow's patch release:

http://tinyurl.com/8axmckc

If you could test the latest github code and report back whether or
not it fixes the issue for you, that'd be very helpful.
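
In case it's useful, here's a rough sketch of one way to install the
latest github code (assuming the repository at
https://github.com/jtriley/StarCluster; adjust paths as needed):

    $ git clone https://github.com/jtriley/StarCluster.git
    $ cd StarCluster
    $ sudo python setup.py install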

~Justin

On 01/18/2012 03:44 PM, Paul Koerbitz wrote:
> Hi Justin,
> 
> Thanks for the fast response and the great work. I thought about
> taking a crack at a fix myself, but I'm not familiar with the
> codebase and have very little time right now.
> 
> thanks Paul
> 
> On Wed, Jan 18, 2012 at 21:33, Justin Riley <jtriley at mit.edu> wrote:
> 
> Hi Paul,
> 
> I just tested for myself and I can confirm that /etc/exports is
> indeed being clobbered when running the 'addnode' command. I'm
> working on a patch release to fix this and other minor things.
> Should be out tomorrow.
> 
> Thanks for reporting!
> 
> ~Justin
> 
> On 01/18/2012 02:08 PM, Paul Koerbitz wrote:
>> Dear StarCluster team,
> 
>> I tripped over what might be an error with the NFS setup when 
>> adding new nodes to a cluster.
> 
>> I set up my cluster with only the root node initially, then
>> added one node and subsequently four more. I noticed that my
>> EBS volume wasn't getting mounted correctly on the nodes:
>> calling 'df' reported 'stale file handle' for /home, /opt/sge6,
>> and /data.
> 
>> My impression is that as nodes get added, the /etc/exports
>> file, which controls NFS access, gets overwritten, so only the
>> most recently added node can access the shared file systems.
> 
>> Here is how I resolved the issue. First I unmounted all the
>> volumes:
> 
>> root at node001:~# umount -f /data
> 
>> At this point remounting doesn't work:
> 
>> root at node001:~# mount -t nfs master:/data /data
> 
>> mount.nfs: access denied by server while mounting master:/data
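> 
>> A quick way to see what the server is exporting, and to which
>> clients, is showmount; at this point it should have shown only
>> the last-added node:
> 
>> root at node001:~# showmount -e master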
> 
> 
>> I then edited /etc/exports on the master node. Only the last
>> node was listed:
> 
>> /home node005(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> /data node005(async,no_root_squash,no_subtree_check,rw)
> 
>> I changed this to:
> 
>> /home node001(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
>> /data node001(async,no_root_squash,no_subtree_check,rw)
>> /home node002(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
>> /data node002(async,no_root_squash,no_subtree_check,rw)
>> /home node003(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
>> /data node003(async,no_root_squash,no_subtree_check,rw)
>> /home node004(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
>> /data node004(async,no_root_squash,no_subtree_check,rw)
>> /home node005(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> /data node005(async,no_root_squash,no_subtree_check,rw)
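> 
>> As an aside, /etc/exports also accepts shell-style wildcards in
>> hostnames, so the same access could presumably be granted more
>> compactly with something like:
> 
>> /home node00?(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node00?(async,no_root_squash,no_subtree_check,rw)
>> /data node00?(async,no_root_squash,no_subtree_check,rw)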
> 
>> Then I restarted the NFS server:
> 
>> $ /etc/init.d/nfs-kernel-server restart
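> 
>> Alternatively, 'exportfs -ra' should re-export everything in
>> /etc/exports without a full restart. Either way, the mount that
>> was denied earlier now succeeds:
> 
>> root at node001:~# mount -t nfs master:/data /data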
> 
>> After that, running 'df' on each node showed NFS working
>> correctly.
> 
>> Kind regards, Paul
> 
