[StarCluster] Possible NFS setup error when adding new nodes to a cluster?

Justin Riley jtriley at MIT.EDU
Thu Jan 19 10:34:13 EST 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Paul,

Awesome, thanks for testing! I'm cleaning up the code for a patch
release. Should be out this afternoon. I'll send out an announcement
later today. Stay tuned.

~Justin

On 01/19/2012 04:33 AM, Paul Koerbitz wrote:
> Hello Justin,
> 
> I checked out the develop branch from the link you sent me and can 
> confirm this is fixed. I started a cluster with one master node,
> then added one node and another node. /etc/exports is not clobbered
> anymore and everything NFS-wise seems to work.
> 
> cheers Paul
> 
> Here is what the /etc/exports file looked like:
> 
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> #               to NFS clients.  See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
> #
> 
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> #               to NFS clients.  See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
> #
> /home node001(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
> /data node001(async,no_root_squash,no_subtree_check,rw)
> 
> root@master:/data# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be exported
> #               to NFS clients.  See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
> #
> /home node001(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
> /data node001(async,no_root_squash,no_subtree_check,rw)
> /home node002(async,no_root_squash,no_subtree_check,rw)
> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
> /data node002(async,no_root_squash,no_subtree_check,rw)
> 
> On Wed, Jan 18, 2012 at 23:45, Paul Koerbitz <paul.koerbitz at gmail.com> wrote:
> 
> Hi Justin,
> 
> ok great. I have something running right now that I don't want to 
> interrupt, but I might be able to take a stab at it tomorrow and 
> will report back then.
> 
> cheers Paul
> 
> On Wed, Jan 18, 2012 at 23:17, Justin Riley <jtriley at mit.edu> wrote:
> 
> Hi Paul,
> 
> No problem at all and thanks for the kind words. From my limited 
> testing I believe this is fixed in the latest github code which
> will be included in tomorrow's patch release:
> 
> http://tinyurl.com/8axmckc
> 
> If you could test the latest github code and report back whether
> it fixes the issue for you or not that'd be very helpful.
> 
> ~Justin
> 
> On 01/18/2012 03:44 PM, Paul Koerbitz wrote:
>> Hi Justin,
> 
>> thanks for the fast response and the great work. I thought about 
>> taking a crack at a fix myself, but I'm not familiar with the 
>> codebase and have very little time right now.
> 
>> thanks Paul
> 
>> On Wed, Jan 18, 2012 at 21:33, Justin Riley <jtriley at mit.edu> wrote:
> 
>> Hi Paul,
> 
>> I just tested for myself and I can confirm that /etc/exports is 
>> indeed being clobbered when running the 'addnode' command. I'm 
>> working on a patch release to fix this and other minor things. 
>> Should be out tomorrow.
> 
>> Thanks for reporting!
> 
>> ~Justin
> 
>> On 01/18/2012 02:08 PM, Paul Koerbitz wrote:
>>> Dear starcluster team,
> 
>>> I tripped over what might be an error with the NFS setup when 
>>> adding new nodes to a cluster.
> 
>>> I set up my cluster with initially one root node only and then 
>>> first added one node and subsequently 4 more nodes. I noticed 
>>> that my EBS volume wasn't getting mounted correctly on the
>>> nodes: calling 'df' reported 'stale filehandle' for /home,
>>> /opt/sge6 and /data.
> 
>>> My impression is that as nodes get added, the /etc/exports
>>> file which is responsible for allowing NFS access gets
>>> overwritten. Therefore only the last added node can access the
>>> shared file systems.
> 
>>> Here is how I resolved the issue. First I unmounted all the 
>>> volumes:
> 
>>> root@node001:~# umount -f /data
> 
>>> At this point remounting doesn't work:
> 
>>> root@node001:~# mount -t nfs master:/data /data
> 
>>> mount.nfs: access denied by server while mounting master:/data
> 
> 
>>> I then edited /etc/exports on the master node. Here only the 
>>> last node was listed:
> 
>>> /home node005(async,no_root_squash,no_subtree_check,rw) 
>>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw) 
>>> /data node005(async,no_root_squash,no_subtree_check,rw)
> 
>>> I changed this to:
> 
>>> /home node001(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
>>> /data node001(async,no_root_squash,no_subtree_check,rw)
>>> /home node002(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
>>> /data node002(async,no_root_squash,no_subtree_check,rw)
>>> /home node003(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
>>> /data node003(async,no_root_squash,no_subtree_check,rw)
>>> /home node004(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
>>> /data node004(async,no_root_squash,no_subtree_check,rw)
>>> /home node005(async,no_root_squash,no_subtree_check,rw)
>>> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>>> /data node005(async,no_root_squash,no_subtree_check,rw)
> 
>>> Then I restarted the NFS server:
> 
>>> $ /etc/init.d/nfs-kernel-server restart
> 
>>> After that, running 'df' on each node showed NFS working
>>> correctly.
> 
>>> kind regards Paul
> 
> 
> 
> 
> 
> 
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8YN/MACgkQ4llAkMfDcrmhzQCeLUJ2brw3x/9UN7G08mPsmDI4
ku0An0RYgyCPmlXECVa2bneP5a502a42
=bwFL
-----END PGP SIGNATURE-----
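
For readers following this thread, the difference between clobbering /etc/exports and extending it can be sketched as below. This is only an illustration, not StarCluster's actual code or the patch mentioned above: the function names export_line and add_node_exports are hypothetical, and the paths and export options are simply the ones quoted in this thread. The idea is to keep every existing line and append only the entries that are missing for the new node.

```python
DEFAULT_OPTIONS = "async,no_root_squash,no_subtree_check,rw"
DEFAULT_PATHS = ("/home", "/opt/sge6", "/data")


def export_line(path, host, options=DEFAULT_OPTIONS):
    """Format one /etc/exports entry in the style quoted in this thread."""
    return "%s %s(%s)" % (path, host, options)


def add_node_exports(exports_text, host, paths=DEFAULT_PATHS):
    """Return exports_text with entries for host appended.

    Existing lines (comments and other nodes' entries) are kept as they
    are, so adding a node never removes access for earlier nodes.
    """
    lines = exports_text.splitlines()
    existing = set(line.strip() for line in lines)
    for path in paths:
        entry = export_line(path, host)
        if entry not in existing:  # skip entries that are already present
            lines.append(entry)
            existing.add(entry)
    return "\n".join(lines) + "\n"


# Start from a comment-only file, as on a fresh master, then add two nodes.
current = "# /etc/exports: the access control list for filesystems\n"
for node in ("node001", "node002"):
    current = add_node_exports(current, node)
print(current)
```

Calling add_node_exports again for a node that is already listed leaves the text unchanged, so the step is safe to repeat. The result would still have to be written back to /etc/exports on the master and the NFS server restarted, as described in the original report.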

