[Dspace-general] Statistics

Mark Diggory mdiggory at MIT.EDU
Fri Aug 29 13:32:33 EDT 2008


Thomas,

Thanks for what is also a sensible recommendation.

On Aug 29, 2008, at 10:01 AM, Thomas A McGee wrote:

>
> I missed the chat the other day, so some of this may have been  
> covered and dismissed already.
>
> Tomcat has the capacity to output Apache-style "combined" log files  
> for all requests, including bitstreams. There's a whole host of  
> commercial, shareware and freeware products out there designed to  
> slice-and-dice these Apache log files and pull out all the kinds of  
> reports people seem to be talking about here.
>
> The programs range from the very simple, like Analog, to the  
> extremely complex and expensive, like WebTrends Enterprise. They  
> can be configured to download the log files automatically and run  
> reports on a schedule, so that they're there when you come in in  
> the morning. They can incorporate various filters, resolve user IP  
> addresses, analyze request URL paths (which can be translated into  
> collection and community names), referers, logged-in users, user  
> agents, etc. etc.
>
> Rather than reinvent the wheel (and this is an extremely complex  
> wheel), I think for most users it would pay to look at this approach
> unless there is something really esoteric about your traffic that  
> you are trying to get at.


The difficulty is inherent in the "address space" of the DSpace
resources made available by the web application. For instance, I may
have the following Community, Collection and Item:

Computer Science and Artificial Intelligence Lab (CSAIL)
http://dspace.mit.edu/handle/1721.1/5458

CSAIL Technical Reports (July 1, 2003 - present)
http://dspace.mit.edu/handle/1721.1/29807

Adaptive Envelope MDPs for Relational Equivalence-based Planning
http://dspace.mit.edu/handle/1721.1/41920

From the perspective of the Apache/Tomcat logs, requests are made to
these resources, but based on those logs alone it's quite difficult to
ascertain that there is a hierarchy here:

/1721.1/5458 <-- Community
       /1721.1/29807 <-- Collection
               /1721.1/41920 <-- Item

The challenge is that, because the above structure is absent from the
path of the resource, most logging packages cannot roll up the
statistics to represent the aggregations at the collection and item
level that managers want to see for a DSpace Community/Collection.
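
To make the roll-up problem concrete, here is a rough sketch in Python
(not DSpace code). It counts hits per handle from "combined"-style log
lines, and can only aggregate upward once it is handed a parent map
that the log itself does not contain. The sample log lines, the
PARENT_OF table and the function names are all assumptions made up for
illustration; only the three handles come from the example above.

```python
import re
from collections import Counter

# Captures the handle from the request part of a "combined" log line.
HANDLE_RE = re.compile(r'"(?:GET|HEAD) /handle/(\d[\d.]*/\d+)')


def count_hits(log_lines):
    """Count requests per handle. This is all the log alone can tell us."""
    hits = Counter()
    for line in log_lines:
        match = HANDLE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


def roll_up(hits, parent_of):
    """Add each handle's hits to every ancestor listed in parent_of.

    parent_of is a hypothetical export from DSpace; nothing in the URL
    path reveals which handle contains which.
    """
    totals = Counter(hits)
    for handle, n in hits.items():
        parent = parent_of.get(handle)
        while parent is not None:
            totals[parent] += n
            parent = parent_of.get(parent)
    return totals


# Hypothetical log lines: two item views and one collection view.
SAMPLE_LOG = [
    '192.0.2.1 - - [29/Aug/2008:13:00:00 -0400] '
    '"GET /handle/1721.1/41920 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [29/Aug/2008:13:01:00 -0400] '
    '"GET /handle/1721.1/41920 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [29/Aug/2008:13:02:00 -0400] '
    '"GET /handle/1721.1/29807 HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Item -> Collection -> Community, as in the example above.
PARENT_OF = {
    "1721.1/41920": "1721.1/29807",
    "1721.1/29807": "1721.1/5458",
}

hits = count_hits(SAMPLE_LOG)
totals = roll_up(hits, PARENT_OF)
```

Without PARENT_OF, the community 1721.1/5458 never shows up in the
counts at all, even though two of its items' views belong to it.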

Likewise, we are trying to maintain three things:

1.) No rigid expectation that the "paths" at which resources are
represented cannot change over time as DSpace evolves.
2.) That we may have more than one path by which a resource is
accessed, and may want to treat those accesses either as "the same"
or as "uniquely different" statistically.
3.) That we want to allow hooks so that these stats can be collected
off the "logical event" in DSpace rather than the "physical event" in
the application server.

By configuring a stats solution like Analog/AWStats/WebTrends, we are
restricted to gathering statistics only about the physical event of
requesting that address in the web service. Likewise, if the address
representing a resource changes in the UI (via either development
decisions or administrative decisions), then the configuration of that
external software will be out of sync and need to be adjusted. By
having the application report "logical events" we can step away from
this issue. By internalizing statistics gathering and generation, we
have an opportunity to create a solution that allows DSpace to evolve
freely and that meets the requirements requested by the community (or,
more explicitly, exhibited by the Minho addon).
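
As a rough illustration of the "logical event" idea, here is a minimal
sketch in Python (again, not DSpace code, and not a proposed API). The
names UsageEvent, EventBus, StatsListener and handle_request, and the
two URL prefixes, are all hypothetical. The point is only that the
statistics listener sees the resource that was used, not the address
that was requested.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """A logical event: what was used, not which URL was requested."""
    action: str   # e.g. "view"
    handle: str   # e.g. "1721.1/41920"


class StatsListener:
    """Counts events per (action, handle), independent of any URL."""

    def __init__(self):
        self.counts = Counter()

    def receive(self, event):
        self.counts[(event.action, event.handle)] += 1


class EventBus:
    """Minimal hook point: the application fires, listeners receive."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def fire(self, event):
        for listener in self._listeners:
            listener.receive(event)


def handle_request(path, bus):
    """Stand-in for UI code. Both prefixes are hypothetical URL layouts
    that resolve to the same resource and fire the same logical event."""
    for prefix in ("/handle/", "/xmlui/handle/"):
        if path.startswith(prefix):
            bus.fire(UsageEvent("view", path[len(prefix):]))
            return True
    return False


bus = EventBus()
stats = StatsListener()
bus.register(stats)

# Two different physical addresses, one logical resource.
handle_request("/handle/1721.1/41920", bus)
handle_request("/xmlui/handle/1721.1/41920", bus)
```

If the UI paths change later, only handle_request has to know; the
listener and whatever reports sit on top of it are untouched. Treating
the two paths as "uniquely different" instead would just mean carrying
an extra field on the event.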

Cheers,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage




