[Dspace-general] [Dspace-tech] Problem with DSpace 1.5, 1.5.1 prevents indexing by search engines

Bram Luyten bluyten at gmail.com
Wed Feb 25 04:00:28 EST 2009


great, thanks Rob,

I already tried site:dspace.mit.edu/handle, which is basically the same as
"inurl:handle", but show=full can indeed make the difference between
community/collection pages and item pages.
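
For anyone who wants to run these checks against several repositories, here is a minimal Python sketch that only assembles the search strings mentioned in this thread. The function name and dictionary keys are made up for illustration; it does not contact Google, and the counts Google reports for such queries are ballpark figures at best.

```python
def build_index_queries(host):
    """Return the Google search strings from this thread for one repository host."""
    return {
        # Everything under /handle: community, collection and item pages.
        "handle_pages": f"site:{host}/handle",
        # Narrowed down to the "full item record" pages.
        "full_item_records": f"site:{host} inurl:handle inurl:show=full",
        # Bitstreams (the files attached to items).
        "bitstreams": f"site:{host} inurl:bitstream",
    }


if __name__ == "__main__":
    for label, query in build_index_queries("dspace.mit.edu").items():
        print(f"{label}: {query}")
```

Each string can then be pasted into the Google search box by hand.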

best regards,

Bram

@mire NV
Romeinse Straat 18
3001 Heverlee
Belgium
+32 2 888 29 56

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get Tog@ther


On Thu, Feb 19, 2009 at 6:04 PM, Robert Tansley <roberttansley at google.com> wrote:

> You won't get entirely accurate numbers but you can get ballpark figures
> with e.g.
>
> site:dspace.mit.edu inurl:handle inurl:show=full
>
> Basically this narrows things down to the "full item record" pages. Looks
> like there may be dups in there -- you could try some additional conditions.
>
> For the number of bitstreams:
>
> site:dspace.mit.edu inurl:bitstream
>
> Hope this helps
>
> Rob
>
>
> On Thu, Feb 19, 2009 at 05:47, Bram Luyten <bluyten at gmail.com> wrote:
>
>> Hi Rob,
>>
>> I had a question somewhat related to robots.txt and the way DSpace
>> instances are being indexed by Google.
>>
>> As part of the Google Analytics - DSpace comparison that I've been
>> running, I would like to analyse which repositories are being indexed best
>> by Google, and how that impacts their number of visits.
>>
>> As a first, very rough estimate, I searched for:
>>
>> "site:<<repository url>>" to get an indication of how many useful pages
>> were indexed. It was interesting to see that these numbers did not really
>> correlate with visits to the repository.
>> I assumed that for many repositories, different browse pages were being
>> indexed, and that these indexed pages were not very useful for generating
>> visits or exposing the content.
>>
>> In a second step, I tried searching for "site:<<repository url>> -browse".
>> The returned numbers were in some cases less than half of the original
>> number.
>> But I realise this search is too restrictive: because many pages
>> include the word "browse" in their navigation bar, I'm probably excluding
>> useful item pages etc. from the results.
>>
>> So my question is the following:
>> which search query could I use in Google, to get the number of useful
>> indexed pages in Google (item pages, bitstreams, collection & community
>> pages) ?
>>
>> Already an interesting finding from my research:
>> the 15 repositories already included in the research get 60% of their
>> visits through search engines (average calculated on the visits in December
>> 2008). All the more reason to optimize exposure through search engines as
>> far as possible.
>>
>> best regards,
>>
>> Bram
>>
>> @mire NV
>> Romeinse Straat 18
>> 3001 Heverlee
>> Belgium
>> +32 2 888 29 56
>>
>> http://www.atmire.com - Institutional Repository Solutions
>> http://www.togather.eu - Before getting together, get Tog@ther
>>
>>
>> On Thu, Feb 5, 2009 at 10:21 PM, Robert Tansley <roberttansley at google.com> wrote:
>>
>>> To all users of DSpace 1.5 and DSpace 1.5.1:
>>> These versions of DSpace ship with a bad robots.txt file that prevents
>>> search engines such as Google Scholar or Yahoo from indexing any content on
>>> a DSpace site. To check if this applies to you:
>>> - Visit your site's robots.txt --
>>> http://your_dspace_hostname.edu/robots.txt
>>> - If you see the following line you have a bad robots.txt:
>>>
>>> Disallow: /browse
>>>
>>> It is important that you REMOVE this line from your robots.txt to ensure
>>> that your DSpace instance is correctly indexed by search engines. More info
>>> on ensuring your DSpace site is correctly indexed here:
>>>
>>> http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
>>>
>>> Robert Tansley / Google
>>>
>>
>
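
The robots.txt check described in the original announcement above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are made up, the sample file contents are hypothetical (not the file DSpace actually ships), and fetching the file from http://your_dspace_hostname.edu/robots.txt is left out so that the sketch works on plain text.

```python
def find_browse_disallow(robots_txt):
    """Return the 1-based line numbers that contain a 'Disallow: /browse' rule."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        rule = line.split("#", 1)[0].strip()      # ignore trailing comments
        field, sep, value = rule.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip() == "/browse":
            hits.append(number)
    return hits


def remove_browse_disallow(robots_txt):
    """Return the robots.txt text with every 'Disallow: /browse' line dropped."""
    bad = set(find_browse_disallow(robots_txt))
    kept = [
        line
        for number, line in enumerate(robots_txt.splitlines(), start=1)
        if number not in bad
    ]
    return "\n".join(kept) + "\n"


# Hypothetical sample contents, for illustration only.
sample = "User-agent: *\nDisallow: /browse\nDisallow: /search\n"

if __name__ == "__main__":
    print("Bad rule on line(s):", find_browse_disallow(sample))
    print(remove_browse_disallow(sample), end="")
```

If the first function reports any line numbers for your site's file, that is the line the announcement asks you to remove.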