[Dspace-general] Google Scholar and OAI

MacKenzie Smith kenzie at MIT.EDU
Wed Feb 2 19:33:03 EST 2005


Hello DSpace community,

I have been talking to Anurag at Google lately and pestering him about the 
Google Scholar issues/problems we have discussed.
One thing I learned is that they *do* intend to include DSpace item-level 
metadata along with the full text of the documents, but they're having a 
hard time because of all those lovely customizations you've all made to the 
UI which make it impossible for them to predict where to find the metadata 
they want when they crawl the site (they only want certain fields like the 
author and title, not everything). And the fact that not everyone uses the 
same persistent identifier scheme is also difficult for them... it's hard 
for them to identify which string is the persistent id they should grab.

SO, they are interested in evaluating using OAI for this purpose (hooray!) 
but alas, many of you have changed the default OAI baseurl so they can't 
find your OAI server. I know that's true, because for the pilot project we 
did I found 4 different baseurl patterns for 17 DSpace sites... I suggested 
using a registry like the DSpace wiki or OCLC's for this, but they claim 
this will not scale to the level of the gazillions of repositories that 
they hope will exist in the future. They want an approach like robots.txt 
-- predictable place, same for every repository. I think that sounds 
reasonable... don't you?
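
To make the "predictable place" idea concrete, here is a minimal sketch in Python of how a crawler might probe a site for its OAI server and confirm what it found. The candidate paths are my own illustrative guesses (including what I assume to be the stock DSpace path), and the helper names are hypothetical. Only the Identify verb, the baseURL element and the namespace come from the OAI-PMH 2.0 spec. With an agreed convention the list would shrink to a single entry.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# ASSUMPTION: illustrative candidate paths only; the first is what I take
# to be the stock DSpace location. Real sites may use something else.
CANDIDATE_PATHS = ["/dspace-oai/request", "/oai/request", "/oai"]


def identify_urls(site_root, paths=CANDIDATE_PATHS):
    """Build one OAI-PMH Identify request URL per candidate path."""
    query = urlencode({"verb": "Identify"})
    root = site_root.rstrip("/")
    return [f"{root}{path}?{query}" for path in paths]


def parse_identify(xml_text):
    """Return the baseURL from an Identify response, or None if the
    text is not a well-formed OAI-PMH Identify response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return None
    node = root.find(f"{{{OAI_NS}}}Identify/{{{OAI_NS}}}baseURL")
    if node is None or not node.text:
        return None
    return node.text.strip()
```

A crawler would fetch each URL from identify_urls() in turn and stop at the first response for which parse_identify() returns a value. The fetching itself is left out here so the sketch stays free of network assumptions.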

So what do you say? Can we make sure our OAI servers are up and running 
correctly, and at a canonical location (like maybe the one DSpace uses by 
default)? Is getting indexed by Google Scholar worth agreeing on such a 
convention?

Thanks, and I look forward to your thoughts and reactions.

MacKenzie

PS -- they want bitstream level Handles too (or at least predictable, 
stable URLs) so that they can start to create CiteSeer-like functionality 
across documents. They would do the citation matching using the URIs to the 
bitstreams, so 
they need to be as stable over time as a Handle or DOI... That's the first 
good reason I've ever heard for assigning a persistent ID to a bitstream, 
but it's a pretty good one.


MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building E25-131d
77 Massachusetts Avenue
Cambridge, MA  02139
(617)253-8184
kenzie at mit.edu 