[Dspace-general] Google Scholar & DSpace Google

MacKenzie Smith kenzie at MIT.EDU
Mon Jan 17 13:38:16 EST 2005


You raise a good point, Richard, but one that neither the Google DSpace
pilot nor Google Scholar has addressed yet.
I will bring this up with them as an important issue for scholars citing
material found via Google Scholar and see what they say, but I don't
think it affects the decision whether to continue with the DSpace pilot...

So I think the consensus was to drop the limited DSpace pilot and put our 
collective energy into trying to make Google Scholar into the best 
possible service...
I will start a separate thread some time soon (and anyone else should feel 
welcome to do it sooner) about how to improve Google Scholar beyond
a) harvesting DSpace repositories faster
b) identifying OA vs limited access or commercial content in results
c) thinking about which URL to present in the results for long-term citability.
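
For point (c), here is a minimal sketch of how a local DSpace item URL
could be rewritten to its Handle proxy form, using the two example URLs
from Richard's message below. The function name is hypothetical, and the
assumption that local item paths look like /handle/<prefix>/<suffix> is
mine; nothing here reflects what Google or DSpace actually does.

```python
from urllib.parse import urlparse

# Global Handle proxy, as it appears in the example URL in this thread.
HANDLE_RESOLVER = "http://hdl.handle.net"


def to_persistent_url(local_url):
    """Return the hdl.handle.net form of a local DSpace item URL.

    Assumes (hypothetically) that the local path contains
    /handle/<prefix>/<suffix>. Returns None if no handle is found.
    """
    parts = [p for p in urlparse(local_url).path.split("/") if p]
    if "handle" not in parts:
        return None
    i = parts.index("handle")
    # Need both a prefix and a suffix after the "handle" path segment.
    if len(parts) - i < 3:
        return None
    prefix, suffix = parts[i + 1], parts[i + 2]
    return f"{HANDLE_RESOLVER}/{prefix}/{suffix}"


print(to_persistent_url("http://www.myir.ac.uk/handle/12345/6789"))
```

A URL with no handle segment returns None, so a caller could fall back
to showing the local URL in that case.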

Another question that came up with Google was our own limited access 
content (for example, MIT doesn't make
printable PDFs of our e-theses freely available through DSpace, and the MIT 
Press titles are also restricted).
Google is interested in including such content in their crawl of DSpace 
repositories with clear terms and conditions
to prevent them from making the harvested text available via Google (like 
what they're doing with Google Print).
Is there anyone out there with limited access content who might want to 
talk about this with Google?

MacKenzie


At 09:13 AM 1/14/2005 +0000, Richard Jones wrote:
>Hi,
>
>We were not directly involved in the Google Pilot, but I was
>peripherally aware that one of the issues that might be addressed is
>that of ensuring the persistent identifier for trawled items be the URL
>exposed via the Google search results, and not the local, potentially
>unstable, URL.  That is, links in Google search results are
>http://hdl.handle.net/12345/6789 not
>http://www.myir.ac.uk/handle/12345/6789.  Was this requirement/desire
>dropped from the spec or will Google Scholar address this issue?
>
>Cheers,
>
>Richard
>-------
>Richard Jones
>Information Systems Developer  + A crash reduces
>Edinburgh University Library   + your expensive computer
>Information Systems            + to a simple stone
>
>e: r.d.jones at ed.ac.uk
>t: 0131 651 3811
>
>Edinburgh Research Archive: http://www.era.lib.ed.ac.uk/
>Tapir on SourceForge: http://sourceforge.net/projects/tapir-eul
>Theses Alive! homepage: http://www.thesesalive.ac.uk/
>
>
> > -----Original Message-----
> > From: dspace-general-bounces at MIT.EDU
> > [mailto:dspace-general-bounces at MIT.EDU] On Behalf Of Ann Lally
> > Sent: 13 January 2005 20:57
> > To: 'MacKenzie Smith'; dspace-google-pilot at MIT.EDU;
> > dspace-general at MIT.EDU
> > Subject: RE: [Dspace-general] Google Scholar & DSpace Google
> >
> >
> > I can't think of any compelling reason to continue the pilot.
> >
> > Ann
> >
> > -----Original Message-----
> > From: MacKenzie Smith [mailto:kenzie at MIT.EDU]
> > Sent: Thursday, January 13, 2005 9:29 AM
> > To: Martin Courtois; dspace-google-pilot at MIT.EDU;
> > dspace-general at MIT.EDU
> > Subject: Re: [Dspace-general] Google Scholar & DSpace Google
> >
> >
> > >Is the plan that only Google Scholar will contain content
> > from DSpace
> > >sites or will DSpace content also be available by searching
> > "regular"
> > >Google?
> >
> > To the best of my knowledge, Google.com ("big google") will
> > continue to
> > harvest the entire web, including all the DSpace sites (both
> > metadata and
> > content).
> > Google Scholar will harvest all the DSpace repositories that
> > they know
> > about, but they currently aren't indexing the metadata, just
> > the content. So both.
> >
> > >I don't know much about DSpace Google, but it sounds like
> > the idea was
> > >for Google to take steps _not_ to crawl DSpace sites other
> > than the 17
> > >participating institutions?
> >
> > It was a pilot project to harvest sites that opted in -- 17
> > of the possible
> > 70 or so.
> > We do have the option of asking them to continue the pilot
> > with a larger
> > set of DSpace repositories included.
> > But unlike google.com or Google Scholar you do need to *opt
> > in*. The DSpace Google thing was a search restrictor in
> > google.com that would
> > limit results to content from the
> > 17 pilot participants.
> >
> > Hope that's a bit clearer.
> >
> > MacKenzie
> >
> >
> >
> > MacKenzie Smith
> > Associate Director for Technology
> > MIT Libraries
> > Building E25-131d
> > 77 Massachusetts Avenue
> > Cambridge, MA  02139
> > (617)253-8184
> > kenzie at mit.edu
> >
> >
> >
> > _______________________________________________
> > Dspace-general mailing list
> > Dspace-general at mit.edu
> > http://mailman.mit.edu/mailman/listinfo/dspace-general
> >

MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building E25-131d
77 Massachusetts Avenue
Cambridge, MA  02139
(617)253-8184
kenzie at mit.edu 


