[Dspace-general] Re: Google's Scholarly Search Service and Institutional OA Self-Archiving

Stevan Harnad harnad at ecs.soton.ac.uk
Mon Nov 29 08:11:49 EST 2004


Google Scholar http://scholar.google.com
currently has 3,630 items for Dspace and
16,100 for Eprints. I expect both are undercounts. Eprints
certainly is, otherwise the figure would be at the very least
55,000 (for the Eprints subset harvested by celestial)
http://celestial.eprints.org/cgi-bin/eprints.org/graph 
or at the very-very-very least 31,688:
http://www.eprints.org/

I hope Google Scholar will cover these two sets of scholarly 
full-text sites more fully. They are likely to provide the
richest source of scholarly full-texts. Regular Google, for
example, already carries 210,000 items from Eprints, so they
are in there! Just a matter of porting them to Google Scholar!

The fact that the item is in Eprints or Dspace should be a criterion in
scholar.google's identification rule. So should the fact that
it comes from an OAI-compliant site.
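
As a rough illustration of what "OAI-compliant" could mean in practice, here is
a minimal sketch, assuming a harvester probes a site with the OAI-PMH Identify
verb and inspects the XML that comes back. The base URL, the sample response,
and the function names are hypothetical; nothing here describes how Google
Scholar actually identifies items.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def identify_url(base_url: str) -> str:
    """Build the OAI-PMH Identify request URL for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def repository_name(response_xml: str):
    """Return the repositoryName from an Identify response, or None if the
    document does not look like an OAI-PMH Identify response."""
    root = ET.fromstring(response_xml)
    if root.tag != OAI_NS + "OAI-PMH":
        return None
    name = root.find(f"{OAI_NS}Identify/{OAI_NS}repositoryName")
    return name.text if name is not None else None


# Hypothetical, heavily trimmed Identify response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="Identify">http://repository.example.edu/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.edu/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    print(identify_url("http://repository.example.edu/oai"))
    print(repository_name(SAMPLE))
```

A real check would fetch identify_url(...) over HTTP and pass the body to
repository_name(); the sketch parses a canned string so that it runs offline.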

Stevan Harnad

 On Mon, 29 Nov 2004, Peter Suber wrote:

> [Forwarding from the DSpace-general list.  --Peter.]
> 
> 
> Hi all,
> 
> I wanted to mention that the new Google Scholar search 
> (http://scholar.google.com) is including items from
> DSpace repositories in the results, as long as they're open for harvesting 
> the full-text. I did notice that some
> institutions running DSpace that should be there aren't yet, so I've asked 
> Google why they're missing.
> 
> It can be a little tricky to figure out if your institution is getting 
> included or not -- search some known items
> from your repository and plow through all the results, and be sure to check 
> all the versions since your copy
> might not be one of the first listed. If you're there, great, and if you're 
> not (and want to be) then first make
> sure your repository's web server isn't blocking crawlers, and then write 
> to me or them directly
> (scholar-support at google.com) to make sure they crawl your site.
> 
> They also wanted me to mention that if you have limited access material 
> that you would like to get indexed
> by Google but not cached by them for display, they're very interested in 
> working with you. For example, at
> MIT we have some book titles from the MIT Press in our DSpace repository 
> which are only available for free
> to the MIT community. Google proposes to index them, but not cache them, so 
> that when a searcher finds
> one of them in a result set in google.com they're returned to DSpace to 
> view the item and can get to the
> Press's online ordering system from there. More traffic for the book, more 
> money for the Press. Let me
> know if you're interested in this and I'll put you in touch with the Google 
> folks. Remember: if your DSpace
> content is freely available to the public then Google and the other web 
> search engines should *already* be
> harvesting it so you don't need to do anything...
> 
> MacKenzie
> 
> 
> MacKenzie Smith
> Associate Director for Technology
> MIT Libraries
> Building E25-131d
> 77 Massachusetts Avenue
> Cambridge, MA  02139
> (617)253-8184
> kenzie at mit.edu
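
On the advice quoted above to make sure a repository's web server isn't
blocking crawlers: one place to look is the site's robots.txt. The following is
a minimal sketch using Python's standard urllib.robotparser, assuming the
crawler honours robots.txt. The "Googlebot" user-agent string, the example
rules, and the item URL are assumptions for illustration; the crawler that
feeds Google Scholar may identify itself differently.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts every crawler out of the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical robots.txt that only keeps crawlers out of an admin area.
BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"

# Hypothetical item page in a repository.
ITEM_URL = "http://repository.example.edu/handle/1234/5678"

if __name__ == "__main__":
    for label, rules in (("block all", BLOCK_ALL), ("block admin", BLOCK_ADMIN)):
        print(label, crawler_allowed(rules, "Googlebot", ITEM_URL))
```

To test a live site, the same parser can load the file itself via set_url()
and read() instead of parse(). Note that robots.txt is only one way a server
can turn crawlers away, so a passing check here does not guarantee inclusion.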



