[Macpartners] Filesystem virtualization for AFP

Fri Sep 29 13:02:50 EDT 2006

mark pearrow <mpearrow at csail.mit.edu> wrote:

> I think Douglas implied the problem in his last message: it's painful  
> to grow AFP volumes or to transparently migrate a volume to a new  
> location. AFP inherently requires the end user to know the physical  
> host that a volume is attached to (as does CIFS, although as Douglas  
> mentioned DFS tries to overcome this problem)

Yes, precisely so.  Plus all the other issues that you mentioned.

I replied to Albert's query for more details with a long email saying
similar things to what you wrote, but I sent it from an address other
than the one I'm subscribed to macpartners with.  Consequently, the
message seems to have been quietly dropped on the floor.  I'll reinclude
it here, for those who care about such thoughts:

   Albert Willis <awillis at MIT.EDU> wrote:

   > 
   > On Sep 28, 2006, at 2:53 PM, Douglas Alan wrote:
   > 
   >     Does anyone know if there is a filesystem virtualization solution for
   >     Apple's AFP filesharing protocol?
   > 
   > What problem are you trying to solve? That would help us when trying to
   > suggest solutions.

   Flexible, expandable, shared storage for a growing research lab.

   Here's the complete primer on my concerns, in the unlikely event that
   you really want to know all the details:

	I also have some questions about these imminent fileshares, and how to
	best organize them.  I'm a Unix expert, but I'm a Mac expert only to
	the extent that OS X has Unix hidden underneath its skin, so I'm
	looking for some guidance on how to best handle things in this
	not-quite-traditional-Unix world.

	One of the biggest worries I have with shared filesystems is the issue
	of having inflexible volumes.  What always eventually happens is that
	a volume fills up, or a server gets overloaded, and then you need to
	add more disks or servers, and relocate directories from one volume to
	another, or maybe even to a new server.  If the paths to directories
	change during this relocation, it typically breaks scripts and other
	software that has already been configured to look for things in
	specific places.  The users also have to be notified and retrained to
	look for the stuff in the new location.  And even if you end up not
	having to relocate directories, you often have to resort to putting
	new directories on a new volume, while old directories remain on an
	old volume, and then people have to look in multiple places to find
	things.  It's much better if all of this unpleasantness can be avoided.

	One solution to this sort of problem has lately been referred to as
	"filesystem virtualization".  AFS invented the idea long, long ago by
	presenting all the AFS files in the entire world as one huge AFS
	volume.  With AFS sometimes the sysadmin has to move stuff around
	between disks or servers, and then change some entries in some config
	files somewhere, but all of this happens under the covers.  To the
	end-user the rearrangement of the data is all invisible.

	I hear that Microsoft has something similar to AFS called DFS that is
	layered on top of CIFS, but I know next to nothing about either CIFS
	or DFS.  (I believe that DFS usually spans a department or enterprise,
	rather than the entire world, but the idea is similar.) I also hear
	that there are now NFS filesystem virtualization servers that
	implement something similar for NFS too, but I haven't ever used such
	a server.

	Long before DFS or such NFS virtualization servers existed, though, I
	came up with a similar idea (inspired by AFS), and implemented it for
	NFS using the NFS automounter and symbolic links.

	I'll describe here how I have implemented filesystem virtualization
	using these tools, just in case you are interested:

	     For instance, at the MIT Media Lab, where I was a sysadmin for a
	     number of years, I set up a filesystem called "/mas" (it stood
	     for "Media Arts & Science").  Under /mas, there was /mas/disks.
	     Every disk drive on every NFS server was automounted under
	     /mas/disks with names like /mas/disks/mc0, /mas/disks/mc1,
	     /mas/disks/vlw0, /mas/disks/vlw1, etc.  "vlw1", for instance,
	     referred to a disk that was owned by the Visible Language
	     Workshop, which was a research group in the Media Lab.
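
	     As a rough illustration only, the automounter side of this can
	     be pictured as an indirect map with one entry per disk, in the
	     usual "<key> <server>:<path>" form.  The sketch below just
	     generates such entries.  The server names and export paths are
	     hypothetical placeholders; only the disk names come from the
	     description above.

```python
def indirect_map_lines(disk_to_export):
    """Format one indirect-map entry per disk as '<key>\t<server>:<path>'."""
    return [
        f"{key}\t{server}:{path}"
        for key, (server, path) in sorted(disk_to_export.items())
    ]


# Hypothetical servers and export paths, invented for this example.
DISKS = {
    "mc0": ("server-a", "/export/mc0"),
    "mc1": ("server-a", "/export/mc1"),
    "vlw0": ("server-b", "/export/vlw0"),
    "vlw1": ("server-b", "/export/vlw1"),
}

if __name__ == "__main__":
    # Each printed line is one entry for the map that serves /mas/disks.
    print("\n".join(indirect_map_lines(DISKS)))
```

	     Moving a disk to another server then means changing one map
	     entry, while every /mas/disks/<name> path stays the same.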

	     People were encouraged, however, not to use these disk-oriented
	     paths.  Instead, they were encouraged to use paths like

		/mas/man
		/mas/doc
		/mas/bin
		/mas/bin/arch.sun4
		/mas/vlw/doc
		/mas/vlw/bin
		/mas/vlw/man

	     etc.

	     These paths were typically implemented using symbolic links into
	     the automounted disk-oriented namespace.  /mas/vlw, for instance,
	     was a symlink to /mas/disks/vlw0, and /mas/vlw/doc (a.k.a.
	     /mas/disks/vlw0/doc), which perhaps wouldn't fit on the "vlw0"
	     disk, might then be a symlink to /mas/disks/vlw1/doc.
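
	     The two-level indirection can be tried out in a scratch
	     directory.  This is a minimal sketch, not the original setup:
	     plain directories stand in for the automounted disks, and the
	     links are relative so that the toy tree works wherever it is
	     created.  The function names are made up for the example.

```python
import os
import tempfile


def build_demo_tree(root):
    """Create a toy copy of the /mas layout under `root` (a scratch directory)."""
    disks = os.path.join(root, "disks")
    # Plain directories stand in for the automounted NFS disks.
    os.makedirs(os.path.join(disks, "vlw0", "bin"))
    os.makedirs(os.path.join(disks, "vlw1", "doc"))
    # <root>/vlw -> disks/vlw0
    os.symlink(os.path.join("disks", "vlw0"), os.path.join(root, "vlw"))
    # <root>/disks/vlw0/doc -> ../vlw1/doc (doc did not fit on vlw0)
    os.symlink(os.path.join("..", "vlw1", "doc"),
               os.path.join(disks, "vlw0", "doc"))


def resolve(root, logical_path):
    """Return the disk-oriented location behind a logical path, relative to root."""
    real_root = os.path.realpath(root)
    real = os.path.realpath(os.path.join(root, logical_path))
    return os.path.relpath(real, real_root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        print(resolve(root, "vlw/doc"))  # the path that ends up on vlw1
        print(resolve(root, "vlw/bin"))  # the path that stays on vlw0
```

	     Users only ever type the logical path; which disk it lands on
	     is decided by the links.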

	     We also had a symlink farm in /mas/u ("u" was for "user", like OS
	     X's "/Users" or Sun's "/home") that linked to everyone's homedir.
	     So, for instance, I could always get to my home via /mas/u/nessus
	     from any computer.  ("nessus" was my username.)

	     Things at times could be a bit confusing with an excess of
	     symlinks.  (So I can see a use for NFS virtualization servers,
	     which would basically hide the symlinks from the end-user.) On
	     the other hand, we never ended up with a situation where we would
	     have to tell people, "Hey, we're sorry, but /vlw is full, so
	     we're going to have to move /vlw/bin to /vlw2/bin.  Please change
	     all your files and programs accordingly."
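
	     A relocation under this scheme might look roughly like the
	     sketch below: copy the data to the new disk, swap the symlink,
	     then remove the old copy.  This is an illustration under
	     assumptions, not the procedure we actually used.  The function
	     name "relocate" and the scratch paths are hypothetical, and
	     writes made while the copy is in progress are not handled.

```python
import os
import shutil
import tempfile


def relocate(logical_link, new_physical_dir):
    """Copy the data behind a symlink to a new directory, then repoint the link."""
    if not os.path.islink(logical_link):
        raise ValueError(f"{logical_link} is not a symlink")
    old_physical_dir = os.path.realpath(logical_link)
    # Copy first, so the logical path keeps working during the transfer.
    shutil.copytree(old_physical_dir, new_physical_dir, symlinks=True)
    # Build the new link under a temporary name, then rename it over the
    # old link; the rename replaces the link itself rather than following it.
    tmp_link = logical_link + ".new"
    os.symlink(new_physical_dir, tmp_link)
    os.replace(tmp_link, logical_link)
    shutil.rmtree(old_physical_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        root = os.path.realpath(scratch)
        old_disk = os.path.join(root, "disks", "vlw0")
        new_disk = os.path.join(root, "disks", "vlw1")
        os.makedirs(os.path.join(old_disk, "bin"))
        logical = os.path.join(root, "vlw")
        os.symlink(old_disk, logical)
        relocate(logical, new_disk)
        # The logical path is unchanged, but now lands on the new disk.
        print(os.path.realpath(logical) == new_disk)
        print(os.path.isdir(os.path.join(logical, "bin")))
```

	     Scripts that refer to the logical path need no changes after
	     the move, which is the whole point of the exercise.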

	The question I have is: Is there a way of achieving filesystem
	virtualization with AFP?  I.e., is there an AFP equivalent to M$'s
	DFS?  Or is it possible to automount AFP volumes and combine them with
	symlinks as I described above?  Or perhaps there are AFP
	virtualization servers?

	Ideally, I think we should have a single large AFP volume called
	"IIC".  (I'm guessing that OS X will want to mount this under
	/Volumes/IIC).  We could then have

	   /Volumes/IIC/Users

	with a directory under there for each user.  We could also have

	   /Volumes/IIC/IIC Admin

	as a shared folder for the IIC administrative folks,

	   /Volumes/IIC/Applications

	for programs that can be run from an AFP share,

	   /Volumes/IIC/Astromed

	as a shared folder for the IIC Astromed project, etc.

	My worry, as described above, however, is that eventually we'll run
	out of space in the "IIC" volume, that this volume won't be
	expandable, and then we'll need another volume, "IIC2" or whatever,
	and then eventually "IIC3", etc.  And then we'll end up with some
	things under "IIC", some under "IIC2", and some under "IIC3".  This
	will make it difficult for the end-user to figure out and remember
	which volume contains what.  Consequently, if at all possible, I'd
	like to figure out how to implement filesystem virtualization for our
	AFP fileserver.

	If filesystem virtualization is not possible with AFP, then perhaps we
	should take a much more fine-grained approach with our AFP volumes and
	make a separate volume for each project and user.  With such a
	"mini-volume" approach, if an AFP volume outgrows its space, it can be
	more easily relocated to another RAID volume or server.
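
	With mini-volumes, something has to record which server currently
	hosts each volume.  One possibility, sketched here purely as an
	assumption, is a small lookup table that turns a volume name into an
	afp:// URL.  The host names are placeholders, and the table and
	function names are invented for the example.

```python
from urllib.parse import quote

# Hypothetical volume-to-host table; the host names are placeholders.
VOLUME_HOSTS = {
    "Astromed": "fs1.example.org",
    "IIC Admin": "fs1.example.org",
    "Applications": "fs2.example.org",
}


def afp_url(volume, hosts=VOLUME_HOSTS):
    """Return an afp:// URL for a volume, percent-encoding the volume name."""
    host = hosts[volume]
    return f"afp://{host}/{quote(volume)}"


if __name__ == "__main__":
    for name in sorted(VOLUME_HOSTS):
        print(afp_url(name))
```

	Relocating a mini-volume then means updating one table entry, though
	anything that has already saved the old URL would still need fixing,
	which is exactly the problem that real virtualization avoids.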

	But then there's the question, especially if we have lots of
	mini-volumes, of how users can find and mount the AFP mini-volumes.
	If the fileserver were on our LAN, the mini-volumes would be browsable
	in the Finder via the "Network" icon.  Can such browsability be made
	to work through the router?

	Also, I'm unclear as to how authentication works with AFP.  With NFS
	(i.e., unkerberized and prior to v4), there isn't any real
	authentication, so everything works smoothly, if a bit insecurely.
	With the more robust security that AFP presumably offers (does it?),
	do we have to worry about Kerberos tickets expiring and the like?
	Will people have to manually mount all volumes via the Finder after
	they log in?  Or can they somehow specify a list of volumes to be
	automatically mounted?  Will people have to log out and log back in
	periodically to remain authenticated?  Or will a dialog box pop up
	periodically reprompting them for their password?  If so, how does one
	typically handle the issue of long-running programs that need to
	proceed unattended, or jobs that run from cron?

	One final concern of mine is how do fileserver-hosted home directories
	typically work?  We won't want fileserver-hosted homedirs right away,
	but we probably also want to plan ahead for them.  How and where do
	they usually get mounted?  Do they just appear under "/Users"?  Or do
	they appear somewhere else?  What about if you need to access someone
	else's fileserver-hosted homedir?  How do you get to it and where does
	that appear in the filesystem?

	I don't think we want to conflate the per-user fileserver-hosted
	folders that we want to set up now with any future fileserver-hosted
	homedirs, so whatever naming scheme we decide to use, we should come
	up with something that is consistent with everyone ultimately having
	at least one of each.

|>oug