[Dspace-general] Google search results bypass metadata records

MacKenzie Smith kenzie at MIT.EDU
Fri Jun 22 17:12:43 EDT 2007


Hi Christophe,
> Would you support the idea that if the user "messes up" the end of the 
> bitstream URL, he/she is redirected to the metadata display ?
I'm not sure what you mean... if the user gets a valid bitstream URL 
from Google there's not much we can do to intervene. The user can 
already examine the bitstream URL to figure out the item record Handle, 
but it's hard to educate random users how to do that. Do you have a 
particular scenario in mind?
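
For illustration, here is a rough sketch of the "work out the item record Handle from the bitstream URL" step. It assumes bitstream URLs look like .../bitstream/<handle-prefix>/<handle-suffix>/<sequence>/<filename> (optionally with a literal "handle" segment after "bitstream") and that item records live at .../handle/<prefix>/<suffix>. The function name and the example host are hypothetical, and this is not code from DSpace itself.

```python
from urllib.parse import urlsplit


def item_url_from_bitstream_url(bitstream_url):
    """Guess the item-record URL for a bitstream URL, or return None.

    Assumed layout (not verified against any particular DSpace release):
      .../bitstream/<prefix>/<suffix>/<sequence>/<filename>
    """
    parts = urlsplit(bitstream_url)
    segments = parts.path.split("/")
    if "bitstream" not in segments:
        return None
    i = segments.index("bitstream")
    rest = segments[i + 1:]
    # Some layouts put a literal "handle" segment right after "bitstream".
    if rest and rest[0] == "handle":
        rest = rest[1:]
    # Need at least prefix, suffix, and something after them (the file part).
    if len(rest) < 3:
        return None
    prefix, suffix = rest[0], rest[1]
    base = "/".join(segments[:i])  # e.g. "/dspace", or "" at the web root
    return f"{parts.scheme}://{parts.netloc}{base}/handle/{prefix}/{suffix}"


print(item_url_from_bitstream_url(
    "http://host/dspace/bitstream/1721.1/12345/1/paper.pdf"))
```

A server-side handler could use the same mapping to send a request for a malformed bitstream URL to the item display, but that only helps when the URL is already broken; a valid bitstream URL from Google still goes straight to the file.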
> My experience with sitemaps (in another application than DSpace) is 
> very positive.
> I do not remember if something like http://host/dspace/sitemap is 
> returning a sitemap to Google? Does it deal with the limitation of 
> 50 (or so) thousand URLs / 10 megabytes per sitemap file?
Sitemaps will help with reducing the strain on DSpace sites when Google 
harvests them. Rob Tansley submitted a patch to support Google sitemaps 
a while ago that should be in the next release. The size of the DSpace 
repository should not be a problem at all... but I'm not sure how this 
would help with the general problem of navigating users from bitstreams 
back to item records...
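
As a rough illustration of how the per-file limits Christophe mentions can be handled, the sketch below splits a list of URLs into several sitemap documents, starting a new one whenever the next entry would exceed either the URL count or the byte size. The limits are the figures quoted above, the function name is hypothetical, and this is not Rob's patch, just a minimal sketch of the chunking idea.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000               # per-file URL limit quoted above
MAX_BYTES = 10 * 1024 * 1024    # per-file size limit quoted above

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = '</urlset>\n'


def split_into_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Return a list of sitemap XML strings, each within both limits."""
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    sitemaps, entries, size = [], [], overhead
    for url in urls:
        entry = f"  <url><loc>{escape(url)}</loc></url>\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current file if adding this entry would break a limit.
        if entries and (len(entries) >= max_urls
                        or size + entry_size > max_bytes):
            sitemaps.append(HEADER + "".join(entries) + FOOTER)
            entries, size = [], overhead
        entries.append(entry)
        size += entry_size
    if entries:
        sitemaps.append(HEADER + "".join(entries) + FOOTER)
    return sitemaps


# Tiny limit so the splitting is visible with a handful of URLs.
demo = split_into_sitemaps(
    [f"http://host/dspace/handle/123/{i}" for i in range(5)], max_urls=2)
print(len(demo))
```

Each resulting file would then be listed in a sitemap index, so the repository size only affects how many files there are.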

The only ways I can think of to do that are:
-- alter the bitstream to contain a back link (almost certainly 
unacceptable for a preservation archive)
-- prevent Google from harvesting the bitstreams at all (e.g. via the 
sitemap) which isn't going to make users very happy... most of the hits 
from Google were on keywords from the full-text content files.

But if you have a different problem in mind, or some idea you want to 
try, I'm all ears!

MacKenzie



