[Dspace-general] [Dspace-tech] To Handle or not?

Fri Dec 15 16:19:17 EST 2006

Hi Sean,

I'm interested in identifier systems because it's something people do
every day without thought.  For example, if I had the title or ISBN (URN)
of a book and went to a library, I would search the stacks using a
binary-type search and the book's catalog number (URL).  I find the
book, make a note of where I found it and leave.  Now if I go back a
year later, would I expect the library to store the book in the exact
same location?  If that book isn't in the same location, what would I
do?  I'd run another search and go through the process again.  If the
book has moved collections, or been dropped entirely, I would cast a
wider search net.

Now, in the digital world, why wouldn't we handle this in the same
manner?  We just seem to be overcomplicating a situation that exists in
both worlds.  Maybe it can be solved more easily in the digital world,
but I don't see that happening...  At least not yet.

More comments in line below:

On 12/15/2006 02:12 PM, Sean Reilly wrote:
[snip]

>> - The idea of Handle is to use a URN, however, the URN RFC (2141) has
>> not gained traction since its creation in May 1997 and browsers don't
>> support URN for the most part.
> 
> 
> I think RFC 3986 (URI re-specification) attempts to define any URI  that 
> is used as a name (as opposed to a location) as a URN.  I can't  say if 
> the URN crowd agrees with that, but that would mean that hdl:  URI 
> schemes would be classified as a URN even if it wasn't under the  urn: 
> namespace.  I agree that URN hasn't really caught on.  I only  know of 
> one native URN resolver and it has not been publicly released  to my 
> knowledge.

I haven't really looked at RFC 3986, but will give it a look now.

[snip]

>> - The Handle server itself acts as a more complicated DNS server.  Why
>> add an extra layer over a system that works well.  When will we add a
>> system on top of Handle?
> 
> 
> The Handle service is a separate parallel system to DNS that was  
> designed with different intentions, restrictions, and capabilities.   It 
> was initially designed to offer a (relatively) flat namespace,  more 
> flexible data types, extreme scalability, security, and the  ability to 
> administer handles on an individual basis (as opposed to a  sysadmin 
> updating a zone file and restarting the server).
> 
> The handle system is designed to identify fine-grained digital  objects 
> and has a modern architecture appropriate to that usage.

What about communities, collections, etc?  I know that Handle is
supposed to match a handle to a digital object/item.  However, can that
item be a DSpace community?  Can a Handle server take this request?
I've run a few queries and haven't been able to get at anything other
than items.  Everything else comes back with a 404 Handle error.

Based on the current functionality of other resolver systems, I would
expect that stripping a '/identifier' would take me up one level or to
the top, yet Handle quietly fails...Handle gets the error, not the
institution.
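
To make the "up one level" behaviour concrete, here is a rough Python
sketch of what I'd expect a resolver to try.  The function name
parent_urls is made up for illustration; this is not something the
Handle proxy does, just the fallback order I have in mind.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URLs obtained by stripping one trailing path segment
    at a time, nearest parent first, ending at the site root."""
    parts = urlsplit(url)
    # Ignore empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()  # drop the last '/identifier'
        path = "/" + "/".join(segments)
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    for candidate in parent_urls("http://hdl.handle.net/1721.1/34898"):
        print(candidate)
```

A resolver working this way would try each candidate in turn and stop
at the first one that resolves, rather than failing outright.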

>> - DNS maps easy to remember names with hard to remember numbers.   Handle
>> uses numbers to identify unique institutions.  If people have a hard
>> time remembering numbers, why would I choose something like
>> http://hdl.handle.net/1721.1/34898 for my system?  Or when will Handle
>> have a DNSish syntax like http://hdl.handle.net/mit.dspace/34898 or
>> something similar?  If it already exists why not just use
>> http://dspace.mit.edu/34898?
> 
> 
> Part of the purpose of using numbers is to avoid embedding semantics  in 
> the identifier itself, such as the owner of an object or name of a  
> collection.  This is because owners and administration change (and  
> change names).  It's not likely that MIT will change their name  anytime 
> soon, but why would you put the name of the repository  software 
> (mit.dspace/...) in every document identifier?  If that  digital object 
> were moved to another repository system or to another  hosting 
> organization the mit.dspace part would be a bit misleading.
> 
> My argument for using numbers instead of more readable names is that  
> people don't need to remember them - computers do.  You are free to  use 
> readable names in the local part (after the slash) of handle  
> identifiers, but issuing readable handle prefixes produces more  
> problems (trademark, squatting, etc) than it solves.

I agree that numeric identifiers are better, however, people need to
remember URIs just as much as computers do.  While I do have portable
computing devices, I don't carry them everywhere.  If I'm somewhere and
need a source that isn't in my PDA, laptop or written down, I can
usually remember it because of the URI.  I think that using numbers,
while good for machines, is going to hurt Handle's real
mission...linking users to the digital objects they need.

BTW, I noticed many of CNRI's documents have easy to remember handles:
http://hdl.handle.net/cnri.dlib/tn95-01

Couldn't help myself... ;P

>> - When you go to a handle URL that doesn't exist (possibly moved or
>> removed), your system doesn't know.  You get Handle's 404 page, not  the
>> institution that hosts the data, so how are you informed of these  
>> requests?
> 
> 
> We have a new mechanism (not yet fully documented/publicized) that  
> allows namespace information to be associated with a handle prefix.   
> This info includes contact email address and other bits that can  direct 
> users of the handle proxy (http://hdl.handle.net) to the  person 
> responsible for the namespace of the identifier that failed to  
> resolve.  For an example, try <http://hdl.handle.net/200/ 
> nonexistenthandle> and check the "contact us" address which has been  
> changed for the 200 prefix.

This requires user interaction.  Most users don't submit emails like
this.  Failed requests should be redirected to the institution so it can
use its existing Apache/Tomcat logs to find these errors.  We should not
have to rely on a user telling us, separately from their request, that
there is a bad link out there.  In many instances, I've written 404
error pages to give a best guess for the object they were looking for,
or sent them to a search page to find it themselves.
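
As a rough illustration of that kind of 404 handling, here is a minimal
Python sketch.  Everything in it is assumed for the example: the
SEARCH_URL host and query parameter, the fallback_for name, and the
dictionary standing in for the institution's own handle lookup.

```python
import logging
from urllib.parse import quote_plus

log = logging.getLogger("handle404")

# Hypothetical search page on the institution's own server.
SEARCH_URL = "http://dspace.example.edu/simple-search?query="


def fallback_for(handle, known_handles):
    """Return (status, location) for a requested handle.

    known_handles is a stand-in mapping of handle -> item URL.  A miss
    is logged on the institution's side and the user is redirected to a
    search seeded with the local part of the handle.
    """
    if handle in known_handles:
        return 302, known_handles[handle]
    # The miss lands in our own logs; no email from the user is needed.
    log.warning("unresolved handle: %s", handle)
    local_part = handle.split("/", 1)[-1]
    return 302, SEARCH_URL + quote_plus(local_part)
```

The point is only that the host sees the failed request and gets a
chance to respond to it.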

The Handle method leaves a disconnect between the user looking for the 
item and the host who may have the item.

[snip]

Thanks and hope to hear more.
-Brad

-- 
Brad Teale                            Web Application Developer
Digital Library Development Lab       University of Minnesota Libraries
teale003 at umn.edu                      612-625-0473