[Dspace-general] Consistency - slight authority control?

MacKenzie Smith kenzie at MIT.EDU
Sun Mar 26 17:28:00 EST 2006


Hi Suzanne,

I hope the lack of response to your question is due to people's busy-ness 
and not just lack of interest. Variations of this question have come up so 
often that I think you're in good, if quiet, company.

>I am interested in finding out how DSpace sites are working on consistency
>of entry for things like names and keywords/subjects. Without a full blown
>authority control module, we are trying to figure out a way to have
>depositors record names consistently. Has anyone figured out a way or
>workflow that would help?

As you said, there's a bit of support for controlled vocabularies in DSpace 
now (see http://dspace.org/technology/system-docs/submission.html), which is 
OK when the number of terms is reasonably small and fits in a drop-down box 
on the Web submission screen. That feature is normally used for subjects and 
other fields with small vocabularies, but it doesn't scale to large 
vocabularies like LCSH or AAT.

>Currently the standard input form accepts the last name in one box and
>first name and any other parts in another box. This will make sure we have
>the forms of the name in the right order. Our next goal would be to avoid
>this:
>Doe, J.A.
>Doe, Jane A.
>Doe, J Ann
>Doe, Jane Ann
>
>- especially when Jane Ann Doe is the person submitting the material to
>DSpace!

I know this isn't your particular problem, but I'd like to point out that 
there may be two competing needs, which are ably demonstrated in Google 
Scholar since it combines metadata from multiple scholarly sources, like 
journals, that use different conventions for personal names in their 
publications.

-- consistency of name representation is good to help cluster or co-locate 
a lot of items by the same person, but
-- using a form of the name that is different from the one used in the 
published work can make it harder for a user to find an item, if that's the 
only form of the name they know.

In practice, the form of the author's name that is supplied to DSpace is 
usually the one that was used in the publication, in the convention of the 
particular publisher. That's very convenient for people searching for a 
copy of that article from a citation they've found somewhere. It's less 
good for finding everything by that author in the repository.

In this situation what we *really* need is the ability to have multiple 
representations of the author's name, including a standardized one for 
clustering and all the variants that have appeared in publications... which 
is pretty complicated to implement of course.
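
To make the "multiple representations" idea concrete, here is a minimal 
sketch of one way it could be modeled: one record per author, holding a 
standardized form for clustering plus the variants seen in publications, 
with an index so a search on any variant reaches the same record. The names 
AuthorRecord, normalize and build_index are hypothetical and are not part 
of DSpace; the matching rule is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One author: a standardized name plus every variant seen in print."""
    preferred: str                               # form used for clustering
    variants: set = field(default_factory=set)   # forms used by publishers

    def all_forms(self):
        return {self.preferred} | self.variants


def normalize(name):
    """Case- and punctuation-insensitive key, used only for matching."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_index(records):
    """Map every normalized form of every author to that author's record."""
    index = {}
    for record in records:
        for form in record.all_forms():
            index.setdefault(normalize(form), []).append(record)
    return index


jane = AuthorRecord("Doe, Jane Ann",
                    {"Doe, J.A.", "Doe, Jane A.", "Doe, J Ann"})
index = build_index([jane])

# A citation that says "doe, j.a." still leads to the clustered record.
matches = index.get(normalize("doe, j.a."), [])
print([m.preferred for m in matches])
```

The item keeps whatever form the publisher used; only the index decides 
which record the form belongs to, so both needs are served.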

That said, there are many situations where DSpace isn't dealing with formal 
publications and it's more desirable to standardize the form of the name so 
search results appear together. The two approaches that have been discussed 
before are

-- change the submission workflow code to check for the author's name in 
DSpace's e-person database table, and force the submitter to select a 
registered name. That works as long as all the potential authors are 
pre-registered and can be differentiated (i.e. there might still be 
multiple John Smiths so the e-person records for each of them have to 
contain some value that clearly differentiates them, like a middle name, 
birth date, etc.)

-- change the submission workflow code to check for the name in a national 
authority file, e.g. using OCLC's name authority Web Service. We tested 
that at MIT and it worked great as long as the author was in a national 
authority file... very often not the case, e.g. a student who has only 
published a thesis. A combination approach of checking the national 
authority file followed by a local, institutional authority file (say in 
LDAP) could be devised and would cover most cases.
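
The combination approach can be sketched as an ordered chain of lookups 
that stops at the first source returning a match. This is only an outline 
under stated assumptions: the two search functions below are stand-ins 
backed by in-memory dictionaries, not calls to OCLC's Web Service or to an 
LDAP directory, and the sample entries are made up.

```python
def lookup_in_order(name, sources):
    """Return (source_label, matches) from the first source with a match."""
    for label, search in sources:
        matches = search(name)
        if matches:
            return label, matches
    return None, []


# Stand-in data (hypothetical). A real deployment would query the national
# authority service and the institutional directory here instead.
NATIONAL = {"smith, john": ["Smith, John, 1950-"]}
LOCAL = {"doe, jane ann": ["Doe, Jane Ann (jdoe)"]}


def search_national(name):
    return NATIONAL.get(name.strip().lower(), [])


def search_local(name):
    return LOCAL.get(name.strip().lower(), [])


# Order matters: national authority file first, local file as the fallback.
SOURCES = [("national", search_national), ("local", search_local)]

print(lookup_in_order("Doe, Jane Ann", SOURCES))
```

A student who has only published a thesis falls through the national check 
and is picked up by the local one; a name found in neither comes back with 
no source and an empty list, which the submission screen has to handle.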

The first approach would be much, much easier of course, so that's probably 
the place to start if you can afford to register all your institution's 
authors in DSpace.
The programming changes to check for the author would be pretty minor, 
except for thinking through how to handle cases where there's no match, 
more than one match, etc.
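
As a rough illustration of those cases, the check against registered 
people could return one of three outcomes that the submission screen then 
acts on. The EPerson record below is a hypothetical stand-in, not the 
actual DSpace e-person table, and the "distinguisher" field is an assumed 
extra value (middle name, birth date, etc.) used to tell namesakes apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EPerson:
    """Hypothetical registered person; not the real DSpace schema."""
    last_name: str
    first_name: str
    distinguisher: str  # e.g. middle name or birth date


def match_author(last_name, first_name, registry):
    """Return ("none" | "one" | "many", candidates) for a submitted name."""
    last = last_name.strip().lower()
    first = first_name.strip().lower()
    candidates = [
        p for p in registry
        if p.last_name.lower() == last and p.first_name.lower() == first
    ]
    if not candidates:
        return "none", []          # reject, or offer to register the author
    if len(candidates) == 1:
        return "one", candidates   # accept the registered form
    return "many", candidates      # show distinguishers, let submitter pick


registry = [
    EPerson("Smith", "John", "b. 1950"),
    EPerson("Smith", "John", "b. 1972"),
    EPerson("Doe", "Jane", "Ann"),
]

print(match_author("Doe", "jane", registry)[0])
print(match_author("Smith", "John", registry)[0])
print(match_author("Roe", "Richard", registry)[0])
```

What to do in the "none" case is the real policy question: blocking the 
submission keeps names consistent, while allowing free text keeps 
unregistered authors from being turned away.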

MacKenzie


MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building E25-131d
77 Massachusetts Avenue
Cambridge, MA  02139
(617)253-8184
kenzie at mit.edu  



