[Dspace-general] Week 2: Statistics

Wed Aug 27 13:57:58 EDT 2008

One useful distinction is to separate, to some degree, the statistics that 
we may want to calculate from the events/raw data that need to be recorded 
by the DSpace system as it operates. As long as the events are recorded in 
the database (preferably *not* logged in files), the various computations, 
aggregations, reports, and APIs for exposing that data can be built later. 
So we may want to focus initially on what data to record, and plan for a 
statistics data model, database tables, and event recording to be built 
into DSpace 2.0.
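
To make the split concrete, here is a rough sketch of what I mean, not a 
proposal for the actual schema. It uses Python and SQLite purely for 
illustration; the table name usage_event, its columns, and the helper 
functions are all made up for this example and do not correspond to anything 
in DSpace today. The point is only that recording is a plain insert, and any 
statistic is a query written afterwards over the same rows.

```python
import sqlite3

# Hypothetical event table: names and columns are illustrative only,
# not an existing or planned DSpace schema.
SCHEMA = """
CREATE TABLE usage_event (
    id          INTEGER PRIMARY KEY,
    occurred_at TEXT NOT NULL,   -- ISO 8601 timestamp
    event_type  TEXT NOT NULL,   -- e.g. 'view' or 'download'
    item_id     TEXT NOT NULL    -- identifier of the item acted on
)
"""


def open_store(path=":memory:"):
    """Open a database and create the (hypothetical) event table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_event(conn, occurred_at, event_type, item_id):
    """Recording side: store the raw event and compute nothing."""
    conn.execute(
        "INSERT INTO usage_event (occurred_at, event_type, item_id) "
        "VALUES (?, ?, ?)",
        (occurred_at, event_type, item_id),
    )
    conn.commit()


def counts_by_item(conn, event_type):
    """Statistics side: one possible aggregation, defined after the fact."""
    cursor = conn.execute(
        "SELECT item_id, COUNT(*) FROM usage_event "
        "WHERE event_type = ? GROUP BY item_id ORDER BY item_id",
        (event_type,),
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    conn = open_store()
    record_event(conn, "2008-08-27T13:00:00", "download", "item-1")
    record_event(conn, "2008-08-27T13:05:00", "download", "item-1")
    record_event(conn, "2008-08-27T13:06:00", "view", "item-2")
    print(counts_by_item(conn, "download"))
```

A different site could write a different query (per collection, per month, 
excluding crawlers, and so on) without any change to how events are recorded, 
provided the recorded events carry enough detail.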

At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote:
>On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
> > 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>[snip]
> > This is such an interesting statement that I think I will make it next
> > week's topic! What *is* excellent document repository software? I have
> > a feeling that the non-developer community may have a rather different
> > take on it from most developers... we'll see if I'm right.
>
>I think you are, and I look forward to that discussion!
>
> > > This is one reason why I think that it should be as easy as possible
> > > for multiple stat. projects to tap into built-in streams of
> > > observations.  Different sites have different needs, and I think we
> > > need to be able to easily play with various ways of doing stat.s.
> >
> > Agreed, but just to toss this out: I foresee a countervailing pressure
> > in future toward standardized and aggregated statistics across
> > repositories. I have heard a number of statements to the effect that
> > faculty are using download counts from disciplinary repositories in
> > tenure-and-promotion packages. As their work becomes scattered and/or
> > duplicated across various repositories, they're going to want to
> > aggregate that information.
>
>Quite so.  I just don't feel that we've yet got to the point at which
>we understand how to do that well.  A lot of good solutions come about
>in this way: an abstract and somewhat indistinct common need is
>recognized; a number of people all go off in different directions and
>try things; solutions are compared, borrow from each other, coalesce;
>finally a now well-understood need finds itself fulfilled with one or
>two mature implementations.  I feel that we're still deep in the "try
>things" phase.
>
>The degree to which statistics are desired and used suggests that, in
>addition to traditional reports, we should be thinking in terms of
>exposing statistical products in machine-readable form.  I have been
>thinking for some time that we might, with reasonable effort, help to
>work out a lingua franca for exchanging usage statistics among
>repositories of various "brands" so that the utility of various ideas,
>and the behavior of repository users, might be studied more
>effectively.  But again, what we can all agree on will very likely be
>a small subset of what we can individually envision.
>
>This really ought to be considered early-on, because if we can come up
>with a common theme in the abstract, then machine- and human-readable
>reporting become side-by-side layers on top of the pool of statistical
>data products, and both will be easier to think about if they are
>merely formatting something already produced.  Likewise the production
>of those stat.s will be easier to think about if presentation issues
>can be separated from the task.
>
>I do *not* mean to say here that the statistics that people want now
>should have to wait indefinitely on some Grand Scheme to do it all.
>It would be better to organize the development in successive
>approximations if it looks like taking too long to do it all in one
>push.  It's probably going to take several years to fully realize
>satisfactory monitoring and reporting of DSpace usage, but that
>doesn't mean that we can't provide better and better approximations
>much sooner.
>
>--
>Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
>Typically when a software vendor says that a product is "intuitive" he
>means the exact opposite.
>

Randy Stern
Manager of Systems Development
Harvard University Library Office for Information Systems
90 Mount Auburn Street
Cambridge, MA 02138
Tel. +1 (617) 495-3724
Email <randy_stern at harvard.edu>