[StarCluster] Possible NFS setup error when adding new nodes to a cluster?
Paul Koerbitz
paul.koerbitz at gmail.com
Wed Jan 18 14:08:04 EST 2012
Dear starcluster team,
I tripped over what might be an error with the NFS setup when adding new
nodes to a cluster.
I initially set up my cluster with only the master node, then added one
node, and subsequently four more nodes.
I noticed that my EBS volume wasn't getting mounted correctly on the nodes:
calling 'df' reported 'stale filehandle' for /home, /opt/sge6 and /data.
My impression is that as nodes get added, the /etc/exports file which is
responsible for allowing NFS access gets overwritten. Therefore only the
last added node can access the shared file systems.
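One way to check this impression is to look for nodes that have no entry in
the exports file. The following is only a minimal sketch: missing_nodes is a
hypothetical helper (not part of StarCluster), and it assumes the nodes are
named node001, node002, ... and that each entry has the form
"<path> <node>(<options>)" as in the listings in this message.

```shell
#!/bin/sh
# Hypothetical helper: print every worker node that has no entry for a
# given exported path in an exports file.
#   $1 = exports file, $2 = exported path, $3 = number of worker nodes
missing_nodes() {
    i=1
    while [ "$i" -le "$3" ]; do
        node=$(printf 'node%03d' "$i")
        # Look for a line such as "/data node003(...)"; print the node
        # name only when no such line exists.
        grep -q "^$2[[:space:]][[:space:]]*$node(" "$1" || echo "$node"
        i=$((i + 1))
    done
}

# Example (run on the master): missing_nodes /etc/exports /data 5
```

If the file really is overwritten each time a node is added, this should list
every node except the most recently added one.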
Here is how I resolved the issue. First I unmounted all the volumes:
root@node001:~# umount -f /data
At this point remounting doesn't work:
root@node001:~# mount -t nfs master:/data /data
mount.nfs: access denied by server while mounting master:/data
I then edited /etc/exports on the master node. Here only the last node was
listed:
/home node005(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
/data node005(async,no_root_squash,no_subtree_check,rw)
I changed this to
/home node001(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
/data node001(async,no_root_squash,no_subtree_check,rw)
/home node002(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
/data node002(async,no_root_squash,no_subtree_check,rw)
/home node003(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
/data node003(async,no_root_squash,no_subtree_check,rw)
/home node004(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
/data node004(async,no_root_squash,no_subtree_check,rw)
/home node005(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
/data node005(async,no_root_squash,no_subtree_check,rw)
and then restarted the NFS server:
$ /etc/init.d/nfs-kernel-server restart
After that, running 'df' on each node showed NFS working correctly.
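Typing the entries by hand gets tedious as the cluster grows. As a rough
sketch, the same list can be generated with a small loop. gen_exports is a
hypothetical helper, the export options are copied from the entries shown in
this message, and the node names are assumed to follow the node001, node002,
... pattern.

```shell
#!/bin/sh
# Export options copied from the existing /etc/exports entries.
OPTS="async,no_root_squash,no_subtree_check,rw"

# Hypothetical helper: print one exports entry per shared path and node.
#   $1 = number of worker nodes (node001 .. nodeNNN)
gen_exports() {
    i=1
    while [ "$i" -le "$1" ]; do
        node=$(printf 'node%03d' "$i")
        for path in /home /opt/sge6 /data; do
            printf '%s %s(%s)\n' "$path" "$node" "$OPTS"
        done
        i=$((i + 1))
    done
}

# Print the entries for five worker nodes to standard output.
gen_exports 5
```

The output is only printed, so it can be reviewed before it is copied into
/etc/exports on the master. The NFS server still has to be restarted
afterwards as described above.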
Kind regards,
Paul