[Dspace-general] FW: Google harvesting using OAI-PMH

Stuart Lewis sdl at aber.ac.uk
Mon Apr 28 02:10:55 EDT 2008


Hi Leonie,

> Would anyone like to comment from a DSpace perspective, (see below) regarding
> current practices. There seems to be a lot of conflicting information around.

Agreed! From what I've seen there has been a lot of hype about a service
used by only 200 people which Google is discontinuing.

> A few months ago there was some discussion regarding Google harvesting
> information from DigiTool - so here is a short update.
>  
> The official Google statement stated that Google supports harvesting using
> OAI-PMH and as DigiTool can serve as an OAI-PMH provider we assumed that
> harvesting by Google is possible. After a few attempts, and a few discussions
> with Google (we should thank CJH for doing the effort) we got a clear message
> that OAI-PMH harvesting is no longer supported and customers should use the
> standard map site required by Google (personally, I'm not sure if OAI-PMH was
> ever supported).

*As far as we know*, Google did not use OAI-PMH to harvest items as, for
example, OAIster does. They only used OAI-PMH as a way of discovering new web
pages in a web site, which they would then index in the traditional way. So
sites could provide their OAI-PMH feed to Google to help ensure Google had
complete coverage of the site.
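
To illustrate what "using OAI-PMH only for discovery" might look like, here
is a minimal Python sketch. It is not Google's actual method, which we don't
know. It assumes an oai_dc ListRecords response in which dc:identifier holds
the item's web address; the sample record, the handle and the function name
discover_urls are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Dublin Core elements inside an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Invented sample of an OAI-PMH ListRecords response (oai_dc format).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:dspace.example.com:123456789/1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example item</dc:title>
          <dc:identifier>http://dspace.example.com/dspace/handle/123456789/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def discover_urls(oai_xml):
    """Return the web URLs found in dc:identifier, ignoring all other metadata."""
    root = ET.fromstring(oai_xml)
    urls = []
    for ident in root.iterfind(".//dc:identifier", NS):
        text = (ident.text or "").strip()
        # Keep only identifiers that look like web addresses.
        if text.startswith(("http://", "https://")):
            urls.append(text)
    return urls


if __name__ == "__main__":
    for url in discover_urls(SAMPLE_RESPONSE):
        print(url)
```

Everything except the URL is thrown away, which is why the full metadata
record is more than a crawler of this kind needs.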

Because all Google really wants is the URLs, which they can then index,
OAI-PMH is somewhat heavyweight for this purpose, giving them a lot more
information than they really want. Their participation in the development
of the sitemaps protocol (http://sitemaps.org/), which is much more
lightweight, reflects this.
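
For comparison, a sitemap is little more than a list of URLs. The Python
sketch below builds a minimal one using the namespace published at
sitemaps.org. The function name build_sitemap and the example handles are
assumptions for illustration, and this is not how DSpace itself generates
its sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing the given URLs."""
    # Serialise the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(entry, "{%s}loc" % SITEMAP_NS)
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical item URLs.
    print(build_sitemap([
        "http://dspace.example.com/dspace/handle/123456789/1",
        "http://dspace.example.com/dspace/handle/123456789/2",
    ]))
```

Each entry needs only a loc element; the protocol's other per-URL elements
(such as lastmod) are optional.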

> We use the standard site map and exclude the Browse screens and Suggest a
> title, do most other sites do the same?

Yes - that is common good practice, as there is no point in Google spidering
your browse screens if they can get the same information (a list of all the
items / collections / communities) in a more efficient way.
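
One common way to keep crawlers off the browse screens is a robots.txt file.
The Python sketch below checks some URLs against a set of rules using the
standard library's robots.txt parser. The rules and paths are assumptions
for illustration only; the real paths depend on your installation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block browse and suggest pages for everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dspace/browse
Disallow: /dspace/suggest
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://dspace.example.com/dspace/browse-title",
        "http://dspace.example.com/dspace/handle/123456789/1",
    ):
        print(url, is_crawlable(url))
```

Item pages stay crawlable, so anything listed in the sitemap can still be
indexed.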

One of the less publicised features of DSpace version 1.5 is the inclusion
of support for sitemaps. [dspace]/bin/generate-sitemaps will generate your
sitemaps, which are then exposed at http://dspace.example.com/dspace/sitemap

You'll need to register with Google Webmaster Tools (
http://www.google.com/webmasters/tools/) in order to be able to inform
Google where the sitemap is located.

Thanks,


Stuart
_________________________________________________________________

Gwasanaethau Gwybodaeth                      Information Services
Prifysgol Aberystwyth                      Aberystwyth University

            E-bost / E-mail: Stuart.Lewis at aber.ac.uk
                 Ffon / Tel: (01970) 622860
_________________________________________________________________




