[Dspace-general] Google search results bypass metadata records

Robert Tansley roberttansley at google.com
Mon Jun 25 11:59:19 EDT 2007


> Would you support the idea that if the user "messes up" the end of the
> bitstream URL, he/she is redirected to the metadata display ?

This was the original intention behind having the Handle in the URL
actually -- if the bitstream is deleted/obsolete but the item
is still around we can redirect to that.  Whether the issue is a
deleted/obsolete bitstream or an incorrect URL, it would probably be
better to put up an explanatory page that offers a link to the item
metadata page (perhaps with redirect after n seconds) rather than just
immediately redirect.  Just another one of those things that didn't
make version 1.0, and somehow no one has got round to fixing yet...
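
A rough sketch of what such an explanatory page could look like, in
Python for brevity. The URL layout (.../bitstream/<prefix>/<suffix>/...),
the /handle/ item path, the function names and the 10-second default are
all assumptions for illustration, not existing DSpace code:

```python
import re
from html import escape

# Assumed layout: .../bitstream/<handle-prefix>/<handle-suffix>/<anything>
HANDLE_IN_PATH = re.compile(r"/bitstream/([^/]+)/([^/]+)(?:/|$)")


def item_url_for(bitstream_path, base="/handle"):
    """Recover the item (metadata) URL from a bitstream URL, or None."""
    match = HANDLE_IN_PATH.search(bitstream_path)
    if match is None:
        return None
    prefix, suffix = match.groups()
    return f"{base}/{prefix}/{suffix}"


def explanatory_page(bitstream_path, delay_seconds=10):
    """Build an HTML page that explains the problem, links to the item,
    and redirects there after delay_seconds. Returns None when no Handle
    can be found in the path (the caller would then send a plain 404)."""
    item_url = item_url_for(bitstream_path)
    if item_url is None:
        return None
    href = escape(item_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={href}">'
        "<title>File not available</title></head><body>"
        "<p>The file you asked for has been removed or the link is "
        "incorrect.</p>"
        f'<p>The item it belongs to is <a href="{href}">described here</a>; '
        f"you will be taken there in {delay_seconds} seconds.</p>"
        "</body></html>"
    )
```

The same page serves both cases (deleted bitstream or mangled URL),
since all it needs is the Handle part of the path to survive.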

On Google searches and metadata pages:  Scholar.google.com search
results should get users to the item metadata page -- let us know if
you find any that don't!  For the more general google.com searches, it
will confuse users to show snippets of full text in the search result,
but for the user to click on the result and be presented with a
different page; this would not be consistent with how google.com or
Web searches in general currently work.  Blocking bitstreams from
indexing will just mean they won't get indexed at all, which doesn't
suit anyone!

The only real solution is to have backlinks in the PDFs etc -- the PDF
you disseminate doesn't necessarily need to be a bit-perfect copy of
your archival copy.  It wouldn't be too hard to build a Media Filter
that copies the PDF and adds a link to the top of the copy; you could
allow this PDF to be indexed rather than the archival copy.
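
To make the copy-and-stamp idea concrete, here is a minimal sketch in
Python of the bookkeeping only. The names (Copy, make_dissemination_copy,
add_backlink) are hypothetical, and the actual PDF stamping is left to
whatever PDF library you prefer, passed in as a function; this is not the
DSpace Media Filter API:

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Copy:
    data: bytes
    indexable: bool  # whether crawlers should be allowed to fetch this copy

    @property
    def checksum(self) -> str:
        return hashlib.sha256(self.data).hexdigest()


def make_dissemination_copy(
    archival: bytes,
    item_url: str,
    add_backlink: Callable[[bytes, str], bytes],
) -> Tuple[Copy, Copy]:
    """Return (archival_copy, dissemination_copy).

    add_backlink is a hypothetical stamping function: it receives the PDF
    bytes and the item URL and returns new bytes with a link at the top.
    The archival bytes are never modified; only the stamped copy is
    marked as indexable.
    """
    stamped = add_backlink(bytes(archival), item_url)
    return Copy(archival, indexable=False), Copy(stamped, indexable=True)
```

The point of keeping two Copy objects is that the archival checksum
stays valid for preservation purposes while search engines only ever see
the stamped derivative.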

Rob

On 21/06/07, Christophe Dupriez <christophe.dupriez at destin.be> wrote:
> Hi MacKenzie!
>
> Would you support the idea that if the user "messes up" the end of the
> bitstream URL, he/she is redirected to the metadata display ?
>
> My experience with sitemaps (in another application than DSpace) is very
> positive.
> I do not remember whether something like http://host/dspace/sitemap
> returns a sitemap to Google. Does it deal with the limit of 50
> (or so) thousand URLs / 10 megabytes per sitemap file?
>
> Good night!
>
> Christophe
>
> MacKenzie Smith wrote:
> > Hi Pat,
> >
> > [snip]
> >
> >> found it interesting (in the context of archival discussions about "exploding"
> >> the authority of the finding aid) that opening DSpace to Web 2.0 permits
> >> this broader, deracinated granular access.
> >>
> > Can I add that this wasn't made possible by fancy Web 2.0 enhancements
> > at all...
> > Google has always had the ability to walk a DSpace site right down to
> > the (unrestricted)
> > bitstreams and index whatsoever they chose... item metadata, bitstreams,
> > or both.
> >
> >> just that (since Google is DSpace-aware) the searcher be shown how to get the
> >> metadata if indeed it is wanted for the searcher's purposes.
> >>
> > Yes that would be perfect, but Google says this would require something
> > in the bitstream
> > that they crawl that would point the user back to the higher level
> > record... it's not
> > something they're set up to do otherwise.
> >
> > As Christophe says, ideally we could deal with this through http headers
> > or something
> > else that doesn't depend on the actual content being "context aware".
> > But no one's
> > figured out a good way to do that, and Google isn't that "DSpace-aware"...
> >
> > MacKenzie
> >
> > _______________________________________________
> > Dspace-general mailing list
> > Dspace-general at mit.edu
> > http://mailman.mit.edu/mailman/listinfo/dspace-general
> >
> >
> >
> >



