[Dspace-general] Statistics

Scott Yeadon scott.yeadon at anu.edu.au
Wed Aug 27 18:40:11 EDT 2008


Hi,

While this jumps ahead a bit and is not completely relevant to the
context of this discussion, it's important in any solution to separate
event capture from statistics. Web-server-level statistics will only get
you so far. Having recently been through an exercise in building a
prototype statistics aggregator, we found that the fundamental
requirement for producing "good" statistics (i.e. the reported
information) is the *targeted capture* of events (i.e. the raw event
data), typically by the application (i.e. in the DSpace code). The
majority of reports which people want (or rather the accuracy and
granularity thereof) can only be provided where the application itself
has captured the event information, rather than the more general web
container. If you couple the DSpace 1.5.x event producer/consumer
feature with something like the De Minho front-end or a Manakin stats
aspect, that would make a pretty neat default stats package.
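
To illustrate the separation, here is a minimal sketch in Python rather
than DSpace's Java. It is not the DSpace 1.5.x event API and not our
prototype; every name in it (UsageEvent, EventLog, count_by_collection,
the field names and identifiers) is hypothetical. The capture side only
records raw events with the context the application knows (item,
collection), and the reporting side derives counts from them afterwards.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    """One raw event as the application might record it (hypothetical fields)."""
    event_type: str      # e.g. "view" or "download"
    item_id: str         # identifier of the item that was touched
    collection_id: str   # owning collection, known only to the application
    timestamp: datetime


class EventLog:
    """Capture side: stores raw events and computes nothing."""

    def __init__(self):
        self._events = []

    def capture(self, event):
        self._events.append(event)

    def events(self):
        return tuple(self._events)


def count_by_collection(events, event_type):
    """Reporting side: derive one statistic from the raw events."""
    return Counter(e.collection_id for e in events if e.event_type == event_type)


# Made-up events, purely for illustration.
log = EventLog()
now = datetime.now(timezone.utc)
log.capture(UsageEvent("download", "item-1", "coll-A", now))
log.capture(UsageEvent("download", "item-2", "coll-A", now))
log.capture(UsageEvent("view", "item-3", "coll-B", now))

downloads = count_by_collection(log.events(), "download")
```

Because the raw events are kept apart from the reports, a new report
(say, views per item) is just another function over the same events and
needs no change to the capture code.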

Scott.

dspace-general-request at mit.edu wrote:
> Message: 1
> Date: Tue, 26 Aug 2008 11:09:15 -0500
> From: Tim Donohue <tdonohue at illinois.edu>
> Subject: Re: [Dspace-general] Week 2: Statistics
> To: Dorothea Salo <dsalo at library.wisc.edu>
> Cc: dspace-general at mit.edu
> Message-ID: <48B42AAB.6010804 at illinois.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dorothea & all,
>
> Dorothea Salo wrote:
>   
>> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
>>     
>>> One thing to keep in mind about whole-site statistical tables is that
>>> there are already tools to do this for web sites in general, such as
>>> AWStats or Webalizer or whatever your favorite may be.  We probably
>>> should not spend effort to try to duplicate those.
>>>       
>> Perhaps not, but if this is the direction we want people to go in, we
>> probably ought to document how to do it, at least informally on the
>> wiki. Does anybody have such a system in place?
>>     
>
> For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide 
> traffic information.  However, that information is *not* publicly 
> accessible.  We only use it for administrative purposes, since most of 
> the information AWStats generates for us is generally *not* useful to 
> our users.
>
> So, for example, AWStats can provide us with the following general 
> information:
>    * Which features of DSpace are being used most frequently (e.g. 
> Subject Browse, Community/Collection browse, search, etc.)
>    * Which web browsers our users are using
>    * # of overall hits in a given month, week, day, or hour
>    * Approximate amount of time users spend on our site
>    * What external resources people use to get to our site (e.g. Google, 
> Blog posts, Library website, etc.)
>    * The top searches used to get to our site (in Google, Yahoo, MSN, etc.)
>
> But, AWStats only works at a global level.  So, it *cannot* give us any 
> real information at a community, collection or item level, since it 
> doesn't understand DSpace's internal structure and cannot parse DSpace's 
> log files (it parses the *web server* log files, rather than DSpace's 
> internal logs)
>
> So, in the end, AWStats is a worthwhile tool to keep in mind.  However, 
> without some major customizations specific to DSpace, it's really more 
> of an Administrative tool to help you determine *how* users are using 
> your site.  It doesn't give any real worthwhile "statistics" in terms of 
> file downloads or individual community/collection access counts, which 
> are more likely to be useful to your users.
>
> - Tim
>
>   



