[Dspace-general] Statistics

Tim Donohue tdonohue at illinois.edu
Tue Aug 26 16:29:23 EDT 2008



Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>>> So, although I think it was already mentioned, I'd add as a requirement
>>> for a good Statistics Package:
>>>
>>> * Must filter out web-crawlers in a semi-automated fashion!
>> +1!  Suggestions as to how?
> 
> The site <http://www.user-agents.org/> maintains a list of
> user-agents, classified by type. They have an XML-downloadable version
> at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
> updater. Perhaps polling this would be a useful starting point?
> 
> Dorothea
> 

Universidade do Minho's Statistics Add-On for DSpace can do some basic 
automated filtering of web crawlers.

See its list of main features on the DSpace Wiki:

http://wiki.dspace.org/index.php/StatisticsAddOn

(It looks like they detect spiders by the way spiders tend to identify 
themselves.  Most "nice" spiders, like Google's, identify themselves 
with a recognizable User-Agent string, e.g. "Googlebot".)

Frankly, although our statistics for IDEALS are nice-looking, Minho's 
work is much more extensive and offers a greater variety of features 
(from what I've seen and heard of it).  It's just missing our "Top 10 
Downloads" list :)

- Tim



-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


