[Dspace-general] Statistics
Mark Diggory
mdiggory at MIT.EDU
Fri Aug 29 13:32:33 EDT 2008
Thomas,
Thanks for what is also a sensible recommendation.
On Aug 29, 2008, at 10:01 AM, Thomas A McGee wrote:
>
> I missed the chat the other day, so some of this may have been
> covered and dismissed already.
>
> Tomcat has the capacity to output Apache-style "combined" log files
> for all requests, including bitstreams. There's a whole host of
> commercial, shareware and freeware products out there designed to
> slice-and-dice these Apache log files and pull out all the kinds of
> reports people seem to be talking about here.
>
> The programs range from the very simple, like Analog, to the
> extremely complex and expensive, like WebTrends Enterprise. They
> can be configured to download the log files automatically and run
> reports on a schedule, so that they're there when you come in in
> the morning. They can incorporate various filters, resolve user IP
> addresses, analyze request URL paths (which can be translated into
> collection and community names), referers, logged-in users, user
> agents, etc. etc.
>
> Rather than reinvent the wheel (and this is an extremely complex
> wheel), I think for most users it would pay to look at this approach
> unless there is something really esoteric about your traffic that
> you are trying to get at.
It's an inherent issue in the "address space" of DSpace resources
made available in the web application. For instance, I may have the
following Community, Collection and Item:
Computer Science and Artificial Intelligence Lab (CSAIL)
http://dspace.mit.edu/handle/1721.1/5458
CSAIL Technical Reports (July 1, 2003 - present)
http://dspace.mit.edu/handle/1721.1/29807
Adaptive Envelope MDPs for Relational Equivalence-based Planning
http://dspace.mit.edu/handle/1721.1/41920
From the point of view of the Apache/Tomcat logs, requests are made to
these resources, but based on those logs alone it's quite difficult to
ascertain that there is a hierarchy here:
/1721.1/5458 <-- Community
/1721.1/29807 <-- Collection
/1721.1/41920 <-- Item
The challenge is that, because the above structure is absent from the
path of the resource, most logging packages cannot roll up the
statistics to represent the aggregations at the collection and item
level that managers want to see for a DSpace Community/Collection.
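To make the roll-up problem concrete, here is a minimal Python sketch. It assumes a hypothetical PARENT table built by hand from the three example handles above; a log analyzer has no such table, and the request paths are made up for illustration. Without the table, each handle is counted in isolation; with it, a hit on the Item also counts toward its Collection and Community.

```python
from collections import Counter

# Hypothetical parent lookup, copied by hand from the example above.
# Nothing in the request path carries this information; it would have
# to be supplied by DSpace itself.
PARENT = {
    "1721.1/41920": "1721.1/29807",  # Item -> Collection
    "1721.1/29807": "1721.1/5458",   # Collection -> Community
    "1721.1/5458": None,             # top-level Community
}


def handle_from_path(path):
    """Return the handle in a /handle/<prefix>/<suffix> path, or None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "handle":
        return parts[1] + "/" + parts[2]
    return None


def rollup(paths, parent=PARENT):
    """Count each request against its own handle and every known ancestor."""
    counts = Counter()
    for path in paths:
        handle = handle_from_path(path)
        while handle is not None:
            counts[handle] += 1
            handle = parent.get(handle)
    return counts


# Illustrative request paths (not taken from a real log).
requests = [
    "/handle/1721.1/41920",
    "/handle/1721.1/41920",
    "/handle/1721.1/29807",
]

with_hierarchy = rollup(requests)
without_hierarchy = rollup(requests, parent={})  # what the path alone gives
```

With an empty parent table the Community handle never appears in the counts at all, which is the situation an external log package is in.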
Likewise, we are in a situation where we are trying to maintain
1.) Not introducing a rigid expectation that the "paths" at which
resources are represented cannot change over time as DSpace evolves
2.) That we may have more than one path by which a resource is
accessed, and may want to either treat those accesses as "the same"
or treat them as "uniquely different" statistically.
3.) That we want to allow hooks so that these stats can be collected
off the "logical event" in DSpace rather than the "physical event" in
the application server.
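A rough Python sketch of what a "logical event" could look like, under stated assumptions: the UsageEvent class, its field names and the second path below are hypothetical and are not an existing DSpace API. The point is only that the event carries a stable resource identifier, so the path becomes incidental and point 2 turns into a reporting choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical logical event; names are illustrative only."""
    action: str       # e.g. "view"
    resource_id: str  # stable identifier of the object, here its handle
    via: str          # the path that happened to serve the request


def tally(events, merge_paths=True):
    """Count events per resource, or per (resource, path) if not merging."""
    counts = Counter()
    for event in events:
        if merge_paths:
            key = event.resource_id
        else:
            key = (event.resource_id, event.via)
        counts[key] += 1
    return counts


# Same Item reached by two paths; the second path is made up.
events = [
    UsageEvent("view", "1721.1/41920", "/handle/1721.1/41920"),
    UsageEvent("view", "1721.1/41920", "/some/other/path"),
]

same = tally(events)                         # accesses treated as "the same"
distinct = tally(events, merge_paths=False)  # treated as "uniquely different"
```

If the UI later changes the path, only the `via` value changes; counts keyed on `resource_id` are unaffected.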
By configuring a stats solution like Analog/AWStats/WebTrends, we are
restricted to gathering statistics only about the physical event of
requesting that address in the web service. Likewise, if the address
representing that resource changes in the UI (either via development
decisions or administrative decisions), then the configuration of
that external software will be out of sync and need to be adjusted.
By having the application report "logical events" we can step away
from this issue. By internalizing the statistics gathering and
generation, we have an opportunity to create a solution that allows
DSpace to evolve freely and that meets the requirements requested by
the community (or more explicitly, exhibited by the Minho addon).
Cheers,
Mark
~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage