[Dspace-general] Week 2: Statistics

Thu Aug 28 04:39:21 EDT 2008

Without getting into whether event streams should be logged to files or 
to a database, this is probably the way to go in general, though I would 
recommend doing it on a broader scale so that analysis tools are 
interoperable among the major repository software systems.

(There was some research on an XML log file format a while back, but it 
did not go far.)

ttfn,
----hussein

=====================================================================
hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================

Randy Stern wrote:
> One useful distinction is to separate to some degree the statistics that we 
> may want to calculate from the events/raw data that needs to be recorded by 
> the DSpace system as it operates. As long as the events are recorded in the 
> database (preferably *not* logged in files), various computations, 
> aggregations, reports, and APIs for exposing that data can be generated 
> later. So we may want to focus initially on what data to record and plan 
> for a statistics data model, database tables, and recording to be built 
> into DSpace 2.0.
> 
> At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote:
>> On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
>>> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>> [snip]
>>> This is such an interesting statement that I think I will make it next
>>> week's topic! What *is* excellent document repository software? I have
>>> a feeling that the non-developer community may have a rather different
>>> take on it from most developers... we'll see if I'm right.
>> I think you are, and I look forward to that discussion!
>>
>>>> This is one reason why I think that it should be as easy as possible
>>>> for multiple statistics projects to tap into built-in streams of
>>>> observations.  Different sites have different needs, and I think we
>>>> need to be able to easily play with various ways of doing statistics.
>>> Agreed, but just to toss this out: I foresee a countervailing pressure
>>> in future toward standardized and aggregated statistics across
>>> repositories. I have heard a number of statements to the effect that
>>> faculty are using download counts from disciplinary repositories in
>>> tenure-and-promotion packages. As their work becomes scattered and/or
>>> duplicated across various repositories, they're going to want to
>>> aggregate that information.
>> Quite so.  I just don't feel that we've yet got to the point at which
>> we understand how to do that well.  A lot of good solutions come about
>> in this way: an abstract and somewhat indistinct common need is
>> recognized; a number of people all go off in different directions and
>> try things; solutions are compared, borrow from each other, coalesce;
>> finally a now well-understood need finds itself fulfilled with one or
>> two mature implementations.  I feel that we're still deep in the "try
>> things" phase.
>>
>> The degree to which statistics are desired and used suggests that, in
>> addition to traditional reports, we should be thinking in terms of
>> exposing statistical products in machine-readable form.  I have been
>> thinking for some time that we might, with reasonable effort, help to
>> work out a lingua franca for exchanging usage statistics among
>> repositories of various "brands" so that the utility of various ideas,
>> and the behavior of repository users, might be studied more
>> effectively.  But again, what we can all agree on will very likely be
>> a small subset of what we can individually envision.
>>
>> This really ought to be considered early on, because if we can come up
>> with a common theme in the abstract, then machine- and human-readable
>> reporting become side-by-side layers on top of the pool of statistical
>> data products, and both will be easier to think about if they are
>> merely formatting something already produced.  Likewise, the production
>> of those statistics will be easier to think about if presentation issues
>> can be separated from the task.
>>
>> I do *not* mean to say here that the statistics that people want now
>> should have to wait indefinitely on some Grand Scheme to do it all.
>> It would be better to organize the development as a series of successive
>> approximations if doing it all in one push looks like it will take too
>> long.  It's probably going to take several years to fully realize
>> satisfactory monitoring and reporting of DSpace usage, but that
>> doesn't mean that we can't provide better and better approximations
>> much sooner.
>>
>> --
>> Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
>> Typically when a software vendor says that a product is "intuitive" he
>> means the exact opposite.
>>
>>
>> _______________________________________________
>> Dspace-general mailing list
>> Dspace-general at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/dspace-general
> 
> 
> Randy Stern
> Manager of Systems Development
> Harvard University Library Office for Information Systems
> 90 Mount Auburn Street
> Cambridge, MA 02138
> Tel. +1 (617) 495-3724
> Email <randy_stern at harvard.edu>
> 