[Dspace-general] Consistency - slight authority control?
MacKenzie Smith
kenzie at MIT.EDU
Sun Mar 26 17:28:00 EST 2006
Hi Suzanne,
I hope the lack of response to your question is due to people's busy-ness
and not just lack of interest...
Variations of this question have come up so often that I think you're in
good, if quiet, company.
>I am interested in finding out how DSpace sites are working on consistency
>of entry for things like names and keywords/subjects. Without a full blown
>authority control module, we are trying to figure out a way to have
>depositors record names consistently. Has anyone figured out a way or
>workflow that would help?
As you said, there's a bit of support for controlled vocabularies in DSpace
now (see http://dspace.org/technology/system-docs/submission.html), which is
OK when the number of terms is reasonably small and fits in a drop-down box
on the Web submission screen. That feature is normally used for subjects and
other fields with small vocabularies, but it doesn't scale to large
vocabularies like LCSH or AAT.
>Currently the standard input form accepts the last name in one box and
>first name and any other parts in another box. This will make sure we have
>the forms of the name in the right order. Our next goal would be to avoid
>this:
>Doe, J.A.
>Doe, Jane A.
>Doe, J Ann
>Doe, Jane Ann
>
>- especially when Jane Ann Doe is the person submitting the material to
>DSpace!
I know this isn't your particular problem, but I'd like to point out that
there may be two competing needs, which are ably demonstrated in Google
Scholar since it combines metadata from multiple scholarly sources, like
journals, that use different conventions for personal names in their
publications.
-- consistency of name representation is good to help cluster or co-locate
a lot of items by the same person, but
-- using a form of the name that is different from the one used in the
published work can make it harder for a user to find an item, if that's the
only form of the name they know.
In practice, the form of the author's name that is supplied to DSpace is
usually the one that was used in the publication, in the convention of the
particular publisher. That's very convenient for people searching for a
copy of that article from a citation they've found somewhere. It's less
good for finding everything by that author in the repository.
In this situation what we *really* need is the ability to have multiple
representations of the author's name, including a standardized one for
clustering and all the variants that have appeared in publications... which
is pretty complicated to implement of course.
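
To make the idea concrete, here is a minimal sketch of such a record: one
standardized form for clustering plus the variants seen in publications, with
a lookup that accepts any of them. This is an illustration only, not existing
DSpace code; the AuthorRecord and AuthorIndex names are hypothetical, and the
example names are the ones from Suzanne's message.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One person: a standardized name plus every form seen in publications."""
    standard_name: str
    variants: set = field(default_factory=set)

    def all_forms(self):
        # The standardized form is itself a valid form to search by.
        return {self.standard_name} | self.variants


class AuthorIndex:
    """Hypothetical index mapping any known form of a name to its record."""

    def __init__(self):
        self._by_form = {}

    def add(self, record):
        # Note: if two people share a form, the later record overwrites the
        # earlier one here. A real system would need to keep both.
        for form in record.all_forms():
            self._by_form[form.casefold()] = record

    def find(self, name):
        # Case-insensitive exact match on any registered form.
        return self._by_form.get(name.casefold())


index = AuthorIndex()
index.add(AuthorRecord("Doe, Jane Ann",
                       {"Doe, J.A.", "Doe, Jane A.", "Doe, J Ann"}))

record = index.find("Doe, J.A.")
print(record.standard_name if record else "not found")
```

A search on the published form still finds the item, while browsing or
clustering can use standard_name.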
That said, there are many situations where DSpace isn't dealing with formal
publications and it's more desirable to standardize the form of the name so
search results appear together. The two approaches that have been discussed
before are
-- change the submission workflow code to check for the author's name in
DSpace's e-person database table, and force the submitter to select a
registered name. That works as long as all the potential authors are
pre-registered and can be differentiated (i.e. there might still be
multiple John Smiths so the e-person records for each of them have to
contain some value that clearly differentiates them, like a middle name,
birth date, etc.)
-- change the submission workflow code to check for the name in a national
authority file, e.g. using OCLC's name authority Web Service. We tested
that at MIT and it worked great as long as the author was in a national
authority file... very often not the case, e.g. a student who has only
published a thesis. A combination approach of checking the national
authority file followed by a local, institutional authority file (say in
LDAP) could be devised and would cover most cases.
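
The combination approach could be sketched roughly as follows: try each
authority source in order and stop at the first one that returns anything.
The lookup functions and the sample data are stand-ins I made up for
illustration; a real version would call the national name authority service
and the local directory (LDAP or the e-person table) instead of reading
from dictionaries.

```python
def lookup_author(name, sources):
    """Try each authority source in order.

    `sources` is a list of (label, lookup_function) pairs, where each
    lookup function takes a name and returns a list of matching headings.
    Returns (label, matches) for the first source with a match, or
    (None, []) if no source knows the name.
    """
    for label, lookup in sources:
        matches = lookup(name)
        if matches:
            return label, matches
    return None, []


# Stand-in data; both dictionaries are hypothetical.
NATIONAL = {"smith, john": ["Smith, John, 1950-", "Smith, John, 1972-"]}
LOCAL = {"doe, jane ann": ["Doe, Jane Ann"]}


def national_lookup(name):
    # Placeholder for a call to a national name authority service.
    return NATIONAL.get(name.casefold(), [])


def local_lookup(name):
    # Placeholder for a local, institutional lookup (say, LDAP).
    return LOCAL.get(name.casefold(), [])


# National file first, local file as the fallback.
SOURCES = [("national", national_lookup), ("local", local_lookup)]

print(lookup_author("Doe, Jane Ann", SOURCES))
```

Keeping the sources in an ordered list makes it easy to change the order or
add another source later without touching the lookup logic.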
The first approach would be much, much easier of course, so that's probably
the place to start if you can afford to register all your institution's
authors in DSpace.
The programming changes to check for the author would be pretty minor
except for thinking through how to handle cases where there's no match,
more than one match, etc.
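
As a rough illustration of those cases, the check could return a status
along with the candidate names, so the submission screen can decide what to
do next (ask the submitter to register, accept the name, or show a pick
list). The matching rule here, same surname and same first initial, is an
assumption chosen to keep the sketch short, and the function names and the
registered list are hypothetical.

```python
def split_name(name):
    """Split 'Last, Given' into lowercased (last, given) parts."""
    last, _, given = name.partition(",")
    return last.strip().casefold(), given.strip().casefold()


def resolve_submitted_name(submitted, registered_names):
    """Return (status, candidates) for a submitted 'Last, Given' name.

    status is 'none', 'unique', or 'ambiguous'. A registered name is a
    candidate when the surname matches and its given name starts with the
    same initial (or the submitter gave no given name at all).
    """
    last, given = split_name(submitted)
    candidates = []
    for registered in registered_names:
        reg_last, reg_given = split_name(registered)
        if reg_last == last and reg_given.startswith(given[:1]):
            candidates.append(registered)

    if not candidates:
        return "none", []
    if len(candidates) == 1:
        return "unique", candidates
    return "ambiguous", candidates


# Hypothetical list of pre-registered names.
REGISTERED = ["Doe, Jane Ann", "Smith, John A.", "Smith, John B."]

for name in ("Doe, J.A.", "Smith, John", "Lee, Kim"):
    print(name, "->", resolve_submitted_name(name, REGISTERED))
```

In the ambiguous case the submitter still has to choose, which is why the
registered records need something that tells the John Smiths apart.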
MacKenzie
MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building E25-131d
77 Massachusetts Avenue
Cambridge, MA 02139
(617)253-8184
kenzie at mit.edu