[Dspace-general] FW: Google harvesting using OAI-PMH
Stuart Lewis
sdl at aber.ac.uk
Mon Apr 28 02:10:55 EDT 2008
Hi Leonie,
> Would anyone like to comment from a DSpace perspective, (see below) regarding
> current practices. There seems to be a lot of conflicting information around.
Agreed! From what I've seen, there has been a lot of hype about Google
discontinuing a service that was used by only 200 people.
> A few months ago there was some discussion regarding Google harvesting
> information from DigiTool so here is a short update.
>
> The official Google statement stated that Google supports harvesting using
> OAI-PMH and as DigiTool can serve as an OAI-PMH provider we assumed that
> harvesting by Google is possible. After a few attempts, and a few discussions
> with Google (we should thank CJH for doing the effort) we got a clear message
> that OAI-PMH harvesting is no longer supported and customers should use the
> standard map site required by Google (personally, I'm not sure if OAI-PMH was
> ever supported).
*As far as we know*, Google did not use OAI-PMH to harvest items as, for
example, OAIster does. They only used OAI-PMH as a way of discovering new web
pages in a web site, which they then indexed in the traditional way. So
sites could provide their OAI-PMH feed to Google to help ensure Google had
complete coverage of a site.
Because all Google really wants is the URLs, which they can then index,
OAI-PMH is somewhat heavyweight for this purpose, giving them a lot more
information than they really want. This can be seen in their participation
in the development of sitemaps (http://sitemaps.org/), which are much more
lightweight.
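
To illustrate how little a sitemap needs to carry, here is a minimal sketch
in Python that builds one from a plain list of URLs. The item URLs are
hypothetical placeholders and build_sitemap is a name made up for this
example; only the namespace comes from the sitemaps.org protocol. This is
not how DSpace generates its own sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>: just the page address.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical item pages, for illustration only.
example_urls = [
    "http://dspace.example.com/dspace/handle/123456789/1",
    "http://dspace.example.com/dspace/handle/123456789/2",
]
sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

Each entry is just a URL, with none of the metadata records that an OAI-PMH
response would carry.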
> We use the standard site map and exclude the Browse screens and Suggest a
> title, do most other sites do the same?
Yes - that is common good practice, as there is no point in Google spidering
your browse screens if they can get the same information (a list of all the
items / collections / communities) in a more efficient way.
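
One way to keep crawlers off the browse screens is a robots.txt rule. The
sketch below uses Python's standard urllib.robotparser to check what such a
rule would allow. The /dspace/browse path and the example URLs are
assumptions for illustration; check the paths your own installation uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the browse path is an assumption.
robots_lines = [
    "User-agent: *",
    "Disallow: /dspace/browse",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Hypothetical URLs: one browse screen and one item page.
browse_url = "http://dspace.example.com/dspace/browse?type=title"
item_url = "http://dspace.example.com/dspace/handle/123456789/1"

browse_allowed = parser.can_fetch("Googlebot", browse_url)
item_allowed = parser.can_fetch("Googlebot", item_url)
print("browse allowed:", browse_allowed)
print("item allowed:", item_allowed)
```

With a rule like this, item pages remain crawlable while the browse screens
are skipped, and the sitemap supplies the full list of items instead.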
One of the less publicised features of DSpace version 1.5 is the inclusion
of support for sitemaps. [dspace]/bin/generate-sitemaps will generate your
sitemaps, which are then exposed at http://dspace.example.com/dspace/sitemap
You'll need to register with Google Webmaster Tools (
http://www.google.com/webmasters/tools/) in order to be able to inform
Google where the sitemap is located.
Thanks,
Stuart
_________________________________________________________________
Gwasanaethau Gwybodaeth Information Services
Prifysgol Aberystwyth Aberystwyth University
E-bost / E-mail: Stuart.Lewis at aber.ac.uk
Ffon / Tel: (01970) 622860
_________________________________________________________________