[Dspace-general] Google search results bypass metadata records
Christophe Dupriez
christophe.dupriez at destin.be
Thu Jun 21 20:26:21 EDT 2007
Hi Pat! (Copy to Roger Costello, who proposed the HTTP "Meta-Location"
header and may help make this discussion fruitful.)
I very much like your idea: for a given URL, there should be a standard
"derivation" to get its metadata / context.
In an HTML document, we could use a <LINK> element with a REL link type:
http://www.w3.org/TR/html401/types.html#type-links
But there is no "Metadata" link type! Maybe "Index" could be used.
This draft suggests using REL="META":
http://vancouver-webpages.com/ml/draft-daviel-metadata-link-00.txt
There is also the PROFILE attribute of the <HEAD> element:
http://www.w3.org/TR/html401/struct/global.html#h-7.4.4.3
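
As a rough illustration of the REL="META" idea, here is a minimal Python sketch of how a client might look for such a link in a page. The sample page is hypothetical (no DSpace page is known to carry this link); it simply reuses the handle URL discussed in this thread, and the function and class names are made up for the example.

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "meta"."""

    def __init__(self):
        super().__init__()
        self.metadata_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link types, compared case-insensitively.
        rel_types = (attributes.get("rel") or "").lower().split()
        if "meta" in rel_types and attributes.get("href"):
            self.metadata_links.append(attributes["href"])


def find_metadata_links(html_text):
    finder = MetaLinkFinder()
    finder.feed(html_text)
    return finder.metadata_links


# Hypothetical page: the href reuses the handle URL from this thread.
SAMPLE_PAGE = """<html><head>
<title>Arrangement</title>
<LINK REL="META" HREF="https://pacer.ischool.utexas.edu/handle/2081/859">
</head><body>...</body></html>"""

print(find_metadata_links(SAMPLE_PAGE))
```

This only helps for HTML content, which is exactly the limitation raised next.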
Anyway, we mainly store PDF files (not HTML documents), and we usually
cannot change the content of the documents we receive...
I just checked, and I see no HTTP method or header field that could carry
this "content-independent" information:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Ideally, the "derivation" toward metadata should be done through HTTP (which
is content-independent).
Maybe one day the Accept header could contain metadata/html (instead of
text/html) to indicate that the metadata is requested rather than the text
(content).
People at the W3C seem to be discussing this:
http://osdir.com/ml/org.w3c.tag/2003-02/msg00247.html
They propose a new HTTP header, Meta-Location:
http://www.xfront.com/dist-reg/distributed-registry.html
So we will see in future versions of HTTP...
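
To make the header idea concrete, here is a small Python sketch of what a client could do if a server ever sent such a header. "Meta-Location" is only the proposed name and is not part of HTTP/1.1; the sketch assumes its value would be a URL reference (possibly relative), and the sample headers are invented for illustration.

```python
from urllib.parse import urljoin

# "Meta-Location" is the header proposed in the discussion linked above.
# It is NOT defined by HTTP/1.1 (RFC 2616); treat it as hypothetical.
PROPOSED_HEADER = "Meta-Location"


def metadata_url_from_headers(document_url, response_headers):
    """Return the absolute metadata URL advertised by the response, or None."""
    for name, value in response_headers.items():
        # HTTP header field names are case-insensitive.
        if name.lower() == PROPOSED_HEADER.lower():
            # Assumption: the value is a URL reference; resolve it
            # against the URL of the document that was requested.
            return urljoin(document_url, value.strip())
    return None


# Invented response headers for the Word file discussed in this thread.
headers = {
    "Content-Type": "application/msword",
    "meta-location": "/handle/2081/859",
}
print(metadata_url_from_headers(
    "https://pacer.ischool.utexas.edu/bitstream/2081/859/1/Arrangement.doc",
    headers,
))
```

The point of doing this at the HTTP level is that it works for PDF or Word files just as well as for HTML.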
For now, does anybody manage the relationship with Google, so that they
could add to their DSpace awareness a "Metadata" link next to each
document hit, linking
https://pacer.ischool.utexas.edu/bitstream/2081/859/1/Arrangement.doc
to
https://pacer.ischool.utexas.edu/handle/2081/859
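
The mapping between those two URLs is mechanical, as this Python sketch suggests. It assumes the URL layout seen in this thread (/bitstream/<handle prefix>/<handle suffix>/...) and that a handle always occupies exactly two path segments; the function name is made up, and this is not how Google or DSpace is known to do it.

```python
from urllib.parse import urlsplit, urlunsplit


def handle_url_for_bitstream(bitstream_url):
    """Map .../bitstream/<prefix>/<suffix>/... to .../handle/<prefix>/<suffix>.

    Returns None when the path does not look like a bitstream URL.
    """
    parts = urlsplit(bitstream_url)
    segments = [s for s in parts.path.split("/") if s]
    if "bitstream" not in segments:
        return None
    start = segments.index("bitstream")
    # Assumption: the two segments after "bitstream" are the handle.
    handle = segments[start + 1:start + 3]
    if len(handle) < 2:
        return None
    new_path = "/" + "/".join(segments[:start] + ["handle"] + handle)
    # Query string and fragment are dropped on purpose.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(handle_url_for_bitstream(
    "https://pacer.ischool.utexas.edu/bitstream/2081/859/1/Arrangement.doc"
))
```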
Another idea would be to send any invalid bitstream URL to the metadata.
Instead of sending:
Invalid Identifier
The identifier 2081/859/1/Arransqdsqd does not correspond to a valid
Bitstream in DSpace
we could simply show the metadata page. Any corruption at the end of
the URL would then still help the user by showing the metadata record.
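
Here is a minimal Python sketch of that fallback, not actual DSpace code. KNOWN_BITSTREAMS is a hypothetical stand-in for the repository's own lookup, the path layout is assumed from the URLs in this thread, and the sketch does not check that the handle itself exists.

```python
import re

# Assumed path layout, taken from the examples in this thread:
#   /bitstream/<handle prefix>/<handle suffix>/<sequence>/<file name>
BITSTREAM_PATH = re.compile(r"^/bitstream/([^/]+)/([^/]+)(?:/(.*))?$")

# Hypothetical stand-in for the repository's lookup of stored files.
KNOWN_BITSTREAMS = {("2081", "859", "1/Arrangement.doc")}


def resolve(path):
    """Return ("serve", path), ("redirect", handle_path) or ("error", path)."""
    match = BITSTREAM_PATH.match(path)
    if match is None:
        # Not even a handle can be recovered from this path.
        return ("error", path)
    prefix, suffix, rest = match.groups()
    if (prefix, suffix, rest or "") in KNOWN_BITSTREAMS:
        return ("serve", path)
    # Damaged or truncated tail: fall back to the item's metadata page.
    return ("redirect", f"/handle/{prefix}/{suffix}")


for candidate in (
    "/bitstream/2081/859/1/Arrangement.doc",
    "/bitstream/2081/859/1/Arransqdsqd",
    "/bitstream/2081/859/1/",
    "/bitstream/2081/859/",
):
    print(candidate, "->", resolve(candidate))
```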
Especially:
https://pacer.ischool.utexas.edu/bitstream/2081/859/1/
and
https://pacer.ischool.utexas.edu/bitstream/2081/859/
should return the metadata, since it is common to remove a level from a URL to get a higher-level table of contents.
Maybe this (removing a level from a URL) could be a standard "URL-based" way to go from a document to its context.
But, I know, an index is not metadata! Should this be implemented by Google? By DSpace?
By other sites, as a default generic method to contextualize a document?
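
As a generic illustration of "removing a level", this Python sketch lists the successive parent URLs a client (or a crawler) could try when looking for a document's context. It is only a sketch of the idea; nothing guarantees that a given site answers usefully at any of these URLs, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Yield the URLs obtained by removing one trailing path level at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    while segments:
        segments.pop()
        # Keep a trailing slash so each result reads as a "directory" level.
        path = "/" + "/".join(segments) + ("/" if segments else "")
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for candidate in parent_urls(
    "https://pacer.ischool.utexas.edu/bitstream/2081/859/1/Arrangement.doc"
):
    print(candidate)
```

The first two URLs produced are the two truncated forms mentioned above.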
Have a nice evening,
Christophe Dupriez
P.S. Mr. Costello's site, XFront.com, seems packed with interesting XSLT tutorials...
Pat Galloway wrote:
> After having read the original post on this topic I was more than a
> little concerned because we have restricted materials on our server; but
> I tested using text in restricted Microsoft Word files and found no
> hits; the same day, however, I retrieved on this string "Michael
> Joyce—Arrangement-4/20/2005" which was the first text string in an
> unrestricted Word document (but not in the DSpace metadata), and got
> this result using regular Google search:
>
> [DOC] Michael Joyce--Arrangement
> File Format: Microsoft Word - View as HTML
> Michael Joyce—Arrangement-4/20/2005 (updated 05/01/2005). Series I.
> Works. (Subseries for each title). Series II. Academic Career ...
> https://pacer.ischool.utexas.edu/bitstream/2081/859/1/Arrangement.doc
>
> Obviously if you go here it is impossible to get to the metadata unless
> you know to sever the last two filepath elements. Clearly this might be
> a concern for many reasons; but I was in the midst of writing an article
> for Library Trends on archives and information retrieval, and found it
> interesting (in the context of archival discussions about "exploding"
> the authority of the finding aid) that opening DSpace to Web 2.0 permits
> this broader, deracinated granular access. I would hope that it NOT be
> made impossible, either on the DSpace or the Google side; just that
> (since Google is DSpace-aware) the searcher be shown how to get the
> metadata if indeed it is wanted for the searcher's purposes. Too often
> we think it's up to us to determine what researchers need and want. Has
> anyone heard any complaints from them?
>
> Pat Galloway
> School of Information
> University of Texas
>
>
> _______________________________________________
> Dspace-general mailing list
> Dspace-general at mit.edu
> http://mailman.mit.edu/mailman/listinfo/dspace-general