[Dspace-general] Google Scholar and OAI
MacKenzie Smith
kenzie at MIT.EDU
Wed Feb 2 19:33:03 EST 2005
Hello DSpace community,
I have been talking to Anurag at Google lately and pestering him about the
Google Scholar issues/problems we have discussed.
One thing I learned is that they *do* intend to include DSpace item-level
metadata along with the full text of the documents, but they're having a
hard time because of all those lovely customizations you've all made to the
UI which make it impossible for them to predict where to find the metadata
they want when they crawl the site (they only want certain fields like the
author and title, not everything). And the fact that not everyone uses the
same persistent identifier scheme is also difficult for them... it's hard
for them to identify which string is the persistent id they should grab.
SO, they are interested in evaluating using OAI for this purpose (hooray!)
but alas, many of you have changed the default OAI baseurl so they can't
find your OAI server. I know that's true, because for the pilot project we
did I found 4 different baseurl patterns for 17 DSpace sites... I suggested
using a registry like the DSpace wiki or OCLC's for this, but they claim
this will not scale to the level of the gazillions of repositories that
they hope will exist in the future. They want an approach like robots.txt
-- predictable place, same for every repository. I think that sounds
reasonable... don't you?
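To illustrate why OAI suits this purpose, here is a minimal sketch of pulling only the wanted fields (title, author, identifier) out of a record in the standard oai_dc format. The sample record, its handle, and the function name wanted_fields are made up for illustration; only the two XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH (oai_dc container) and Dublin Core.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record; the title, names and handle are invented.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Technical Report</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:identifier>http://hdl.handle.net/0000/1</dc:identifier>
  <dc:description>A field a harvester may choose to ignore.</dc:description>
</oai_dc:dc>"""


def wanted_fields(oai_dc_xml):
    """Return only title, creator and identifier values from an oai_dc record."""
    root = ET.fromstring(oai_dc_xml)

    def texts(tag):
        # Dublin Core elements are repeatable, so always return a list.
        return [el.text.strip() for el in root.findall(tag, NS) if el.text]

    return {
        "title": texts("dc:title"),
        "creator": texts("dc:creator"),
        "identifier": texts("dc:identifier"),
    }


if __name__ == "__main__":
    print(wanted_fields(SAMPLE_RECORD))
```

The point of the sketch is that the field names sit in the same place in every repository's output, whatever has been done to the UI.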
So what do you say? Can we make sure our OAI servers are up and running
correctly, and at a canonical location (like maybe the one DSpace uses by
default)? Is getting indexed by Google Scholar worth it to agree on such a
convention?
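For anyone who wants to check their own server, the following is a rough sketch of what a robots.txt-style lookup could look like: build an Identify request at one agreed path and see whether a valid OAI-PMH response comes back. The path in CANONICAL_PATH is only a placeholder assumption (it is not a statement of what the DSpace default is), and the function names and example host are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# ASSUMPTION: placeholder for whatever path the community agrees on.
CANONICAL_PATH = "/oai/request"


def identify_url(site_root, path=CANONICAL_PATH):
    """Build the Identify request URL for a repository's site root."""
    query = urllib.parse.urlencode({"verb": "Identify"})
    return site_root.rstrip("/") + path + "?" + query


def parse_identify(xml_text):
    """Return repository name and baseURL, or None if this is not an Identify response."""
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        return None
    return {
        "repositoryName": ident.findtext("oai:repositoryName", default="", namespaces=OAI_NS),
        "baseURL": ident.findtext("oai:baseURL", default="", namespaces=OAI_NS),
    }


def probe(site_root, timeout=10):
    """Fetch the Identify response over HTTP; raises on network or XML errors."""
    with urllib.request.urlopen(identify_url(site_root), timeout=timeout) as resp:
        return parse_identify(resp.read())
```

Calling probe("http://repository.example.edu") (a made-up host) would request http://repository.example.edu/oai/request?verb=Identify. A harvester that knows the one path needs no registry, which is the property Google is asking for.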
Thanks, and I look forward to your thoughts and reactions.
MacKenzie
PS -- they want bitstream level Handles too (or at least predictable,
stable URLs) so that they can start to create citeseer-like functionality
across documents.
They would do the citation match ups using the URIs to the bitstreams, so
they need to be as stable over time as a Handle or DOI... That's the first
good reason I've ever heard for assigning a persistent ID to a bitstream,
but it's a pretty good one.
MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building E25-131d
77 Massachusetts Avenue
Cambridge, MA 02139
(617)253-8184
kenzie at mit.edu