[StarCluster] Possible NFS setup error when adding new nodes to a cluster?
Justin Riley
jtriley at MIT.EDU
Wed Jan 18 17:17:33 EST 2012
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Paul,
No problem at all and thanks for the kind words. From my limited
testing I believe this is fixed in the latest github code which will
be included in tomorrow's patch release:
http://tinyurl.com/8axmckc
If you could test the latest github code and report back whether it
fixes the issue for you or not that'd be very helpful.
~Justin
On 01/18/2012 03:44 PM, Paul Koerbitz wrote:
> Hi Justin,
>
> thanks for the fast response and the great work. I thought about
> taking a crack at a fix myself, but I'm not familiar with the
> codebase and have very little time right now.
>
> thanks Paul
>
> On Wed, Jan 18, 2012 at 21:33, Justin Riley <jtriley at mit.edu
> <mailto:jtriley at mit.edu>> wrote:
>
> Hi Paul,
>
> I just tested for myself and I can confirm that /etc/exports is
> indeed being clobbered when running the 'addnode' command. I'm
> working on a patch release to fix this and other minor things.
> Should be out tomorrow.
>
> Thanks for reporting!
>
> ~Justin
>
> On 01/18/2012 02:08 PM, Paul Koerbitz wrote:
>> Dear starcluster team,
>
>> I tripped over what might be an error with the NFS setup when
>> adding new nodes to a cluster.
>
>> I set up my cluster with initially one root node only, then
>> added one node and subsequently 4 more nodes. I noticed that my
>> EBS volume wasn't getting mounted correctly on the nodes: calling
>> 'df' reported 'stale filehandle' for /home, /opt/sge6 and /data.
>
>> My impression is that as nodes get added, the /etc/exports file
>> which is responsible for allowing NFS access gets overwritten.
>> Therefore only the last added node can access the shared file
>> systems.
>
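The suspected failure mode can be illustrated with a small sketch. This is not StarCluster's code: the helper names `overwrite_exports` and `append_exports` are hypothetical, and the paths and export options are simply copied from the /etc/exports lines quoted in this message. It only manipulates strings, so nothing on a real system is touched.

```python
# Illustrative only: hypothetical helpers, not StarCluster's implementation.
OPTIONS = "async,no_root_squash,no_subtree_check,rw"
PATHS = ["/home", "/opt/sge6", "/data"]


def export_lines(node):
    """Return one /etc/exports line per shared path for a single node."""
    return [f"{path} {node}({OPTIONS})" for path in PATHS]


def overwrite_exports(existing, node):
    """Clobbering behaviour: drop existing entries, keep only the new node."""
    return "\n".join(export_lines(node)) + "\n"


def append_exports(existing, node):
    """Keep existing entries and add the new node's lines if they are missing."""
    lines = existing.splitlines()
    for line in export_lines(node):
        if line not in lines:
            lines.append(line)
    return "\n".join(lines) + "\n"


clobbered = ""
merged = ""
for name in ["node001", "node002"]:
    clobbered = overwrite_exports(clobbered, name)
    merged = append_exports(merged, name)

print(clobbered.count("node001"))  # 0: node001 no longer has an entry
print(merged.count("node001"))     # 3: one entry per shared path
```

With the overwriting variant only the most recently added node keeps its entries, which matches the "access denied" symptom on the earlier nodes.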
>> Here is how I resolved the issue. First I unmounted all the
>> volumes:
>
>> root@node001:~# umount -f /data
>
>> At this point remounting doesn't work:
>
>> root@node001:~# mount -t nfs master:/data /data
>
>> mount.nfs: access denied by server while mounting master:/data
>
>
>> I then edited /etc/exports on the master node. Here only the
>> last node was listed:
>
>> /home node005(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> /data node005(async,no_root_squash,no_subtree_check,rw)
>
>> I changed this to:
>
>> /home node001(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
>> /data node001(async,no_root_squash,no_subtree_check,rw)
>> /home node002(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
>> /data node002(async,no_root_squash,no_subtree_check,rw)
>> /home node003(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
>> /data node003(async,no_root_squash,no_subtree_check,rw)
>> /home node004(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
>> /data node004(async,no_root_squash,no_subtree_check,rw)
>> /home node005(async,no_root_squash,no_subtree_check,rw)
>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> /data node005(async,no_root_squash,no_subtree_check,rw)
>
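Typing fifteen entries by hand is error-prone, so the full list could be generated instead. A minimal sketch, assuming node names node001 through node005 and the same three shared paths and options as in the entries above; `build_exports` is a hypothetical helper, not part of StarCluster.

```python
def build_exports(paths, nodes, options):
    """Return /etc/exports content with one line per (node, path) pair."""
    lines = [f"{path} {node}({options})" for node in nodes for path in paths]
    return "\n".join(lines) + "\n"


# Assumed naming scheme: node001 ... node005
nodes = [f"node{i:03d}" for i in range(1, 6)]
exports = build_exports(
    ["/home", "/opt/sge6", "/data"],
    nodes,
    "async,no_root_squash,no_subtree_check,rw",
)
print(exports, end="")
```

The output would still need to be written to /etc/exports on the master by hand, and the NFS server must then re-read the file before the nodes can mount again.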
>> I then restarted the NFS server:
>
>> $ /etc/init.d/nfs-kernel-server restart
>
>> After that, running 'df' on each node showed NFS working
>> correctly.
>
>> kind regards Paul
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk8XRPwACgkQ4llAkMfDcrlJWACgjNwy6KVMywbiP6aVggOgQVqm
OD8AnA/1fwt04oGIhEtA7i3kq8KLMr0y
=9mnL
-----END PGP SIGNATURE-----