[StarCluster] Possible NFS setup error when adding new nodes to a cluster?
Justin Riley
jtriley at MIT.EDU
Thu Jan 19 10:34:13 EST 2012
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Paul,
Awesome, thanks for testing! I'm cleaning up the code for a patch
release. Should be out this afternoon. I'll send out an announcement
later today. Stay tuned.
~Justin
On 01/19/2012 04:33 AM, Paul Koerbitz wrote:
> Hello Justin,
>
> I checked out the develop branch from the link you sent me and can
> confirm this is fixed. I started a cluster with one master node,
> then added one node and another node. /etc/exports is not clobbered
> anymore and everything NFS-wise seems to work.
>
> cheers Paul
>
> Here is what the /etc/exports file looked like:
>
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> # to NFS clients. See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
> #
>
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> # to NFS clients. See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
> #
> /home node001(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
> /data node001(async,no_root_squash,no_subtree_check,rw)
>
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> # to NFS clients. See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
> #
> /home node001(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
> /data node001(async,no_root_squash,no_subtree_check,rw)
> /home node002(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
> /data node002(async,no_root_squash,no_subtree_check,rw)
>
> On Wed, Jan 18, 2012 at 23:45, Paul Koerbitz
> <paul.koerbitz at gmail.com> wrote:
>
> Hi Justin,
>
> ok great. I have something running right now that I don't want to
> interrupt, but I might be able to take a stab at it tomorrow and
> will report back then.
>
> cheers Paul
>
> On Wed, Jan 18, 2012 at 23:17, Justin Riley <jtriley at mit.edu>
> wrote:
>
> Hi Paul,
>
> No problem at all and thanks for the kind words. From my limited
> testing I believe this is fixed in the latest github code which
> will be included in tomorrow's patch release:
>
> http://tinyurl.com/8axmckc
>
> If you could test the latest github code and report back whether
> it fixes the issue for you or not that'd be very helpful.
>
> ~Justin
>
> On 01/18/2012 03:44 PM, Paul Koerbitz wrote:
>> Hi Justin,
>
>> thanks for the fast response and the great work. I thought about
>> taking a crack at a fix myself, but I'm not familiar with the
>> codebase and have very little time right now.
>
>> thanks Paul
>
>> On Wed, Jan 18, 2012 at 21:33, Justin Riley <jtriley at mit.edu>
>> wrote:
>
>> Hi Paul,
>
>> I just tested for myself and I can confirm that /etc/exports is
>> indeed being clobbered when running the 'addnode' command. I'm
>> working on a patch release to fix this and other minor things.
>> Should be out tomorrow.
>
>> Thanks for reporting!
>
>> ~Justin
>
>> On 01/18/2012 02:08 PM, Paul Koerbitz wrote:
>>> Dear starcluster team,
>
>>> I tripped over what might be an error with the NFS setup when
>>> adding new nodes to a cluster.
>
>>> I set up my cluster with initially one root node only and then
>>> first added one node and subsequently 4 more nodes. I noticed
>>> that my EBS volume wasn't getting mounted correctly on the
>>> nodes; calling 'df' reported 'stale filehandle' for /home,
>>> /opt/sge6 and /data.
>
>>> My impression is that as nodes get added, the /etc/exports
>>> file which is responsible for allowing NFS access gets
>>> overwritten. Therefore only the last added node can access the
>>> shared file systems.
>
>>> Here is how I resolved the issue. First I unmounted all the
>>> volumes:
>
>>> root@node001:~# umount -f /data
>
>>> At this point remounting doesn't work:
>
>>> root@node001:~# mount -t nfs master:/data /data
>
>>> mount.nfs: access denied by server while mounting master:/data
>
>
>>> I then edited /etc/exports on the master node. Here only the
>>> last node was listed:
>
>>> /home node005(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>>> /data node005(async,no_root_squash,no_subtree_check,rw)
>
>>> I changed this to:
>
>>> /home node001(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
>>> /data node001(async,no_root_squash,no_subtree_check,rw)
>>> /home node002(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
>>> /data node002(async,no_root_squash,no_subtree_check,rw)
>>> /home node003(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
>>> /data node003(async,no_root_squash,no_subtree_check,rw)
>>> /home node004(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
>>> /data node004(async,no_root_squash,no_subtree_check,rw)
>>> /home node005(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>>> /data node005(async,no_root_squash,no_subtree_check,rw)
>
>>> Then I restarted the NFS server:
>
>>> $ /etc/init.d/nfs-kernel-server restart
>
>>> After that, running 'df' on each node showed NFS working
>>> correctly.
>
>>> kind regards Paul
>
>
>
>
>
>
> _______________________________________________ StarCluster mailing
> list StarCluster at mit.edu <mailto:StarCluster at mit.edu>
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk8YN/MACgkQ4llAkMfDcrmhzQCeLUJ2brw3x/9UN7G08mPsmDI4
ku0An0RYgyCPmlXECVa2bneP5a502a42
=bwFL
-----END PGP SIGNATURE-----
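
The failure described in this thread (each added node overwriting /etc/exports, so that only the last node keeps access) can be illustrated with a small Python sketch. This is not StarCluster's actual code; the function names `export_lines` and `merge_exports` are hypothetical, and the option string is simply the one shown in the listings above. The idea is to append only the missing entries and leave existing comments and other nodes' entries alone.

```python
def export_lines(paths, node,
                 options="async,no_root_squash,no_subtree_check,rw"):
    """Build one /etc/exports line per shared path for the given node.

    Hypothetical helper; the default options mirror the thread's listings.
    """
    return ["%s %s(%s)" % (path, node, options) for path in paths]


def merge_exports(existing_text, new_lines):
    """Return the exports text with any missing lines appended.

    Existing content (comments and other nodes' entries) is preserved
    instead of being overwritten, and duplicates are not added.
    """
    lines = existing_text.splitlines()
    present = set(line.strip() for line in lines)
    for line in new_lines:
        if line.strip() not in present:
            lines.append(line)
            present.add(line.strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    paths = ["/home", "/opt/sge6", "/data"]
    exports = "# /etc/exports: the access control list for filesystems\n"
    # Adding nodes one at a time keeps the earlier nodes' entries.
    for node in ["node001", "node002"]:
        exports = merge_exports(exports, export_lines(paths, node))
    print(exports, end="")
```

After the file changes, the NFS server still has to pick up the new entries, for example with the restart shown in the original report or with `exportfs -ra`.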