[Dspace-general] what does Google Scholar search?

MacKenzie Smith kenzie at MIT.EDU
Tue Aug 18 17:03:09 EDT 2009


That is very weird... the citation GS is building for those items
(i.e. the green text below the title link) is just plain wrong.

I managed to reproduce the problem for one of our faculty (Hal Abelson)
and it looks like the common link in all the examples is that the source of
the citation is an older document that was scanned to PDF. I'd guess that
GS creates those citations by OCRing the scans, and when it fails there's
some bug that inserts spurious metadata. I can believe they'd do that
rather than use the perfectly good metadata provided by IDEALS :)
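
For anyone who wants to repeat the check Sarah and Tim did on their meta
tags, here is a minimal sketch in Python. It pulls the author values out of
a page's <meta> tags so they can be compared by eye with the citation GS
shows. The field name "DC.creator" and the sample page are assumptions for
illustration only; check what your own item pages actually emit. The
function names are made up for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect (name, content) pairs from every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            self.tags.append((name, content))


def embedded_authors(html, field="DC.creator"):
    """Return the content of each <meta name=field> tag, in page order.

    The default field name is an assumption, not something confirmed
    in this thread; pass whatever name your repository really uses.
    """
    collector = MetaTagCollector()
    collector.feed(html)
    return [c for n, c in collector.tags if n.lower() == field.lower()]


# Hypothetical item page. A real check would load the live item page instead.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.title" content="The Progress of Theory in Knowledge Organization">
<meta name="DC.creator" content="Smiraglia, Richard P.">
</head><body></body></html>
"""

print(embedded_authors(SAMPLE_PAGE))
```

If the authors listed here are right and the GS citation is still wrong,
that points away from the repository metadata and towards whatever GS does
with the PDF itself.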

MacKenzie

Sarah L. Shreeves wrote:
> Following on to this thread, we've discovered something strange about 
> the way Google Scholar is indexing us and wanted to see whether anyone 
> else had run into this. We have a query in to the Google Scholar folks 
> on this, but it would be useful to know if others are seeing this 
> problem as well.
>
> When I search for my name in Google Scholar, I see articles that I 
> authored (both in IDEALS as well as in other sources), but I'm also 
> seeing things that I didn't author - all of which are coming from 
> IDEALS. In the list quoted below, the first and the last ones are mine 
> but the second and third citations are not. You can also see the same 
> things if you do a search for Allen Renear (on the third page of results 
> you start seeing things for Mourning Dove hunting which he assures me he 
> hasn't authored!) or any author that is in IDEALS.
>
> Tim and I have checked things like the metadata on our end as well as 
> things like the meta tags, but haven't found a problem on our end. Has 
> anyone else run into this?
>
> Thanks-
> Sarah
>
>   
>>       Is “Quality” Metadata “Shareable” Metadata? The Implications of
>>       Local Metadata Practices …
>>       <http://books.google.com/books?hl=en&lr=&id=UzyFfOVvr1kC&oi=fnd&pg=PA223&dq=sarah+shreeves&ots=qKYglr06wz&sig=CQFA_H-XHlvMXUIqsW6FqHabNJo>
>>
>> - ►*ala.org 
>> <http://news.ala.org/ala/mgrps/divs/acrl/events/pdf/shreeves05.pdf> 
>> [PDF] *
>> SL *Shreeves*, EM Knutson, B Stvilia, CL … - Currents and convergence: 
>> navigating the rivers of …, 2005 - books.google.com
>> Is" Quality" Metadata" Shareable" Metadata? The Implications of Local 
>> Metadata
>> Practices for Federated Collections *Sarah* L. *Shreeves*, Timothy W. 
>> Cole, Ellen M.
>> Knutson, Besiki Stvilia, Carole L. Palmer, and Michael B. Tuuidale * ...*
>> Cited by 30 
>> <http://scholar.google.com/scholar?cites=5396972378778170886&hl=en> - 
>> Related articles 
>> <http://scholar.google.com/scholar?q=related:Bvaqu8jl5UoJ:scholar.google.com/&hl=en> 
>> - All 23 versions 
>> <http://scholar.google.com/scholar?cluster=5396972378778170886&hl=en> 
>> <http://scholar.google.com/scholar?cluster=17316662543306459957&hl=en>
>>
>>
>>       *[PDF]* ►The progress of theory in knowledge organization
>>       <http://www.ideals.uiuc.edu/bitstream/handle/2142/8417/librarytrendsv50i3_opt.pdf?sequence=3#page=29>
>>
>> … Riley, J Chapman, SL *Shreeves*, L Akerman, W … - ILLINOIS, 2002 - 
>> ideals.uiuc.edu
>> The Progress of Theory in Knowledge Organization RICHARDP. SMIRAGLIA 
>> ABSTRACT WE
>> UNDERSTAND “THEORY” TO BE A SYSTEM of testable explanatory statements
>> derived from research. In knowledge organization, the genera- tion of 
>> * ...*
>> Cited by 21 
>> <http://scholar.google.com/scholar?cites=6811906844135667360&hl=en> - 
>> Related articles 
>> <http://scholar.google.com/scholar?q=related:oDbCBy_BiF4J:scholar.google.com/&hl=en> 
>> - View as HTML 
>> <http://66.102.1.104/scholar?q=cache:oDbCBy_BiF4J:scholar.google.com/+sarah+shreeves&hl=en> 
>> - BL Direct 
>> <http://direct.bl.uk/research/08/10/RN115567706.html?source=googlescholar> 
>> - All 5 versions 
>> <http://scholar.google.com/scholar?cluster=6811906844135667360&hl=en>
>>
>>
>>       *[PDF]* ►Meta-analysis: the librarian as a member of an
>>       interdisciplinary research team
>>       <http://www.ideals.uiuc.edu/bitstream/handle/2142/8086/librarytrendsv45i2_opt.pdf?sequence=3#page=144>
>>
>> … , J Riley, SL *Shreeves*, CL Palmer, EM … - ILLINOIS, 1996 - 
>> ideals.uiuc.edu
>> Meta-Analysis: The Librarian as a Member of an Interdisciplinary 
>> Research Team
>> JACK T. SMITH, JR. ABSTRACT META-ANALYSIS IS A quantitative 
>> statistical tool for
>> combining research stud- ies with a small study population to achieve 
>> a * ...*
>> Cited by 14 
>> <http://scholar.google.com/scholar?cites=81092658183387194&hl=en> - 
>> Related articles 
>> <http://scholar.google.com/scholar?q=related:OrjpzFcZIAEJ:scholar.google.com/&hl=en> 
>> - View as HTML 
>> <http://66.102.1.104/scholar?q=cache:OrjpzFcZIAEJ:scholar.google.com/+sarah+shreeves&hl=en> 
>> - BL Direct 
>> <http://direct.bl.uk/research/3C/50/RN017344948.html?source=googlescholar> 
>> - All 4 versions 
>> <http://scholar.google.com/scholar?cluster=81092658183387194&hl=en>
>>
>>
>>       Moving towards shareable metadata
>>       <http://www.ideals.uiuc.edu/handle/2142/3624?show=full>
>>
>> - ►*uiuc.edu 
>> <http://www.ideals.uiuc.edu/bitstream/2142/3624/22/Shreeves_ShareableMetadata_FirstMonday.doc.pdf> 
>> [PDF] *
>> SL *Shreeves*, J Riley, L Milewicz - 2006 - ideals.uiuc.edu
>> MOVING TOWARDS SHAREABLE METADATA *Sarah* L. *Shreeves*, University of 
>> Illinois at
>> Urbana-Champaign, direct comments to sshreeve at uiuc.edu *...* ABOUT THE 
>> AUTHORS *Sarah* *Shreeves*
>> is the Coordinator for the University of Illinois at 
>> Urbana-Champaign's *...*
>> Cited by 11 
>> <http://scholar.google.com/scholar?cites=4479162518471757887&hl=en> - 
>> Related articles 
>> <http://scholar.google.com/scholar?q=related:P9DANJQuKT4J:scholar.google.com/&hl=en> 
>> - View as HTML 
>> <http://66.102.1.104/scholar?q=cache:P9DANJQuKT4J:scholar.google.com/+sarah+shreeves&hl=en> 
>> - All 2 versions 
>> <http://scholar.google.com/scholar?cluster=4479162518471757887&hl=en>
>>     
>
> Robert Tansley wrote:
>   
>> Hello,
>>
>> Google Scholar is more-or-less like regular Google search -- it crawls 
>> the Web-accessible content on your DSpace site, including metadata on 
>> the item display pages, and the full content of any bitstreams that 
>> are publicly accessible. Of course, Scholar does more specialised 
>> indexing than regular google.com search, such as 
>> analysing citations in the full text and so forth. So the metadata and 
>> full text should both be indexed.
>>
>> You also need to ensure your site can be indexed: see 
>> http://wiki.dspace.org/index.php/Ensuring_your_instance_is_indexed
>>
>> (A web crawl is more useful than OAI-PMH for this, as OAI-PMH tends to 
>> expose only a subset of metadata, and often has no link to the fulltext.)
>>
>> Hope this helps,
>>
>> Rob
>>
>> 2009/8/6 Platt, Alice <a.platt at snhu.edu>
>>
>>     I feel like I should be able to find the answer to this, but since
>>     things have changed over the years I feel I should ask the
>>     community to make sure of the answer.
>>
>>     What exactly is Google Scholar searching when it crawls into our
>>     DSpace repositories?
>>
>>     Does it search our metadata?
>>
>>     Can it conduct a full-text search of bitstreams in DSpace?
>>
>>     Thanks in advance for your response.
>>
>>     Alice Platt
>>
>>     Digital Initiatives Librarian
>>
>>     Shapiro Library
>>
>>     Southern New Hampshire University
>>
>>     2500 North River Rd
>>
>>     Manchester, NH 03106
>>
>>     USA
