[Dspace-general] RE: [Dspace-tech] Google Scholar and OAI

Tansley, Robert robert.tansley at hp.com
Thu Feb 3 13:13:18 EST 2005


A related question -- are Google assuming that every DSpace instance
contains scholarly literature?  This certainly isn't the case even now,
and is set to become less likely given the divergent uses of DSpace
we're seeing, e.g. Kansas state government's www.kspace.org.  

And will/does/should it differentiate between 'production' DSpace
instances and the numerous 'test' instances, which may or may not stay
around and contain 'real' content?

I assume right now, to identify a DSpace and what's in it, they're using
some sort of heuristic; but due to people's customisations, diverging
uses of DSpace and a rapidly-evolving platform, that approach doesn't
feel like it'll last long.  I think any mechanism we come up with, such
as those Andy Powell suggested, should also take into account the above
issues.
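
As a rough illustration of the robots.txt-style convention raised in the
quoted message below, here is a minimal Python sketch. It assumes a
crawler tries a short list of candidate OAI-PMH paths and checks whether
the Identify response parses. The candidate paths, the example host and
both helper functions are assumptions made for illustration, not an
agreed convention or anything Google has described.

```python
from urllib.parse import urljoin, urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical candidate paths (assumption): a stand-in for the several
# baseurl patterns seen in practice, not an authoritative list.
CANDIDATE_PATHS = ("/dspace-oai/request", "/oai/request", "/oai")


def identify_urls(site_root, paths=CANDIDATE_PATHS):
    """Build one Identify request URL per candidate path."""
    query = urlencode({"verb": "Identify"})
    return [urljoin(site_root, path) + "?" + query for path in paths]


def parse_identify(xml_text):
    """Return name and base URL from an Identify response, else None."""
    root = ET.fromstring(xml_text)
    ident = root.find(OAI_NS + "Identify")
    if ident is None:
        # For example an OAI-PMH <error> response.
        return None
    return {
        "repositoryName": ident.findtext(OAI_NS + "repositoryName"),
        "baseURL": ident.findtext(OAI_NS + "baseURL"),
    }
```

A crawler would fetch each URL from identify_urls("http://repo.example.org")
in turn and stop at the first response that parse_identify accepts. With a
single agreed location the list shrinks to one entry. Note that a plain
Identify response carries no standard flag for "test instance" or
"scholarly content", so the issues above would still need something
extra, such as an agreed description element.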

 Robert Tansley / Digital Media Systems Programme / HP Labs
  http://www.hpl.hp.com/personal/Robert_Tansley/


> -----Original Message-----
> From: dspace-tech-admin at lists.sourceforge.net 
> [mailto:dspace-tech-admin at lists.sourceforge.net] On Behalf Of 
> MacKenzie Smith
> Sent: 02 February 2005 19:33
> To: dspace-tech at lists.sourceforge.net; dspace-general at mit.edu
> Cc: Peter Brantley
> Subject: [Dspace-tech] Google Scholar and OAI
> 
> Hello DSpace community,
> 
> I have been talking to Anurag at Google lately and pestering him about
> the Google Scholar issues/problems we have discussed.
> One thing I learned is that they *do* intend to include DSpace
> item-level metadata along with the full text of the documents, but
> they're having a hard time because of all those lovely customizations
> you've all made to the UI which make it impossible for them to predict
> where to find the metadata they want when they crawl the site (they
> only want certain fields like the author and title, not everything).
> And the fact that not everyone uses the same persistent identifier
> scheme is also difficult for them... it's hard for them to identify
> which string is the persistent id they should grab.
> 
> SO, they are interested in evaluating using OAI for this purpose
> (hooray!) but alas, many of you have changed the default OAI baseurl so
> they can't find your OAI server. I know that's true, because for the
> pilot project we did I found 4 different baseurl patterns for 17 DSpace
> sites... I suggested using a registry like the DSpace wiki or OCLC's
> for this, but they claim this will not scale to the level of the
> gazillions of repositories that they hope will exist in the future.
> They want an approach like robots.txt -- predictable place, same for
> every repository. I think that sounds reasonable... don't you?
> 
> So what do you say? Can we make sure our OAI servers are up and running
> correctly, and at a canonical location (like maybe the one DSpace uses
> by default)? Is getting indexed by Google Scholar worth it to agree on
> such a convention?
> 
> Thanks, and I look forward to your thoughts and reactions.
> 
> MacKenzie
> 
> PS -- they want bitstream level Handles too (or at least predictable,
> stable URLs) so that they can start to create citeseer-like
> functionality across documents.
> They would do the citation match ups using the URIs to the bitstreams,
> so they need to be as stable over time as a Handle or DOI... That's the
> first good reason I've ever heard for assigning a persistent ID to a
> bitstream, but it's a pretty good one.
> 
> 
> MacKenzie Smith
> Associate Director for Technology
> MIT Libraries
> Building E25-131d
> 77 Massachusetts Avenue
> Cambridge, MA  02139
> (617)253-8184
> kenzie at mit.edu 


