[StarCluster] Possible NFS setup error when adding new nodes to a cluster?

Paul Koerbitz paul.koerbitz at gmail.com
Wed Jan 18 14:08:04 EST 2012


Dear StarCluster team,

I tripped over what might be an error with the NFS setup when adding new
nodes to a cluster.

I set up my cluster with only the master node at first, then added one node,
and subsequently 4 more. I noticed that my EBS volume wasn't getting mounted
correctly on the nodes: calling 'df' reported 'stale filehandle' for /home,
/opt/sge6 and /data.

My impression is that as nodes get added, the /etc/exports file which is
responsible for allowing NFS access gets overwritten. Therefore only the
last added node can access the shared file systems.
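One way to check this impression is to list which clients each export
currently allows. The following is only a rough sketch: list_export_clients
is a hypothetical helper name, not part of StarCluster, and it assumes the
simple "path host(options)" layout used in this cluster's /etc/exports.

```shell
# Hypothetical helper: print each export path with the client hosts allowed to mount it.
# Usage: list_export_clients [exports-file]   (defaults to /etc/exports)
list_export_clients() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++) {
                host = $i
                sub(/\(.*/, "", host)            # strip the "(options)" suffix
                clients[$1] = clients[$1] " " host
            }
        }
        END { for (path in clients) print path ":" clients[path] }
    ' "${1:-/etc/exports}"
}
```

Run on the master after each node is added: if every path lists only the most
recently added node, the file is being overwritten rather than appended to.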

Here is how I resolved the issue. First I unmounted all the volumes:

root@node001:~# umount -f /data

At this point remounting doesn't work:

root@node001:~# mount -t nfs master:/data /data

mount.nfs: access denied by server while mounting master:/data


I then edited /etc/exports on the master node. Here only the last node was
listed:

/home node005(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
/data node005(async,no_root_squash,no_subtree_check,rw)

I changed this to list all five nodes:
/home node001(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
/data node001(async,no_root_squash,no_subtree_check,rw)
/home node002(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
/data node002(async,no_root_squash,no_subtree_check,rw)
/home node003(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
/data node003(async,no_root_squash,no_subtree_check,rw)
/home node004(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
/data node004(async,no_root_squash,no_subtree_check,rw)
/home node005(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
/data node005(async,no_root_squash,no_subtree_check,rw)
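Typing these entries by hand gets tedious as the cluster grows, so here is a
small sketch that prints the same lines for any list of nodes. gen_exports is
a hypothetical name of my own; the paths and options are copied from the file
above.

```shell
# Sketch: print one /etc/exports line per (node, shared path) pair.
# Usage: gen_exports NODE...
gen_exports() {
    opts="async,no_root_squash,no_subtree_check,rw"
    for node in "$@"; do
        for path in /home /opt/sge6 /data; do
            printf '%s %s(%s)\n' "$path" "$node" "$opts"
        done
    done
}

# Example: review the output before writing it to /etc/exports on the master.
gen_exports node001 node002 node003 node004 node005
```

The function only prints to standard output, so nothing on the master changes
until the result is deliberately redirected into /etc/exports.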

Then I restarted the NFS server:

$ /etc/init.d/nfs-kernel-server restart

After that, running 'df' on each node showed the NFS mounts working correctly.
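The per-node 'df' check could be scripted roughly as follows. This is a
sketch under a few assumptions: passwordless ssh from the master to each
node, and a 'df' that exits non-zero when it cannot stat a mount point (as
GNU df does for a stale handle). check_mounts and the REMOTE_SHELL override
are hypothetical names introduced here for illustration.

```shell
# Sketch: run df for the shared paths on each node and report success or failure.
# Usage: check_mounts NODE...
# REMOTE_SHELL is a hypothetical override for the remote command (defaults to ssh).
check_mounts() {
    for node in "$@"; do
        if ${REMOTE_SHELL:-ssh} "$node" df /home /opt/sge6 /data </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
        fi
    done
}
```

For example, 'check_mounts node001 node002 node003 node004 node005' prints one
status line per node. A FAILED line only says that df or the connection
failed, so the node still needs a manual look.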

kind regards
Paul