[Dspace-general] Week 2: Statistics

Wed Aug 27 09:46:54 EDT 2008

On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
[snip]
> This is such an interesting statement that I think I will make it next
> week's topic! What *is* excellent document repository software? I have
> a feeling that the non-developer community may have a rather different
> take on it from most developers... we'll see if I'm right.

I think you are, and I look forward to that discussion!

> > This is one reason why I think that it should be as easy as possible
> > for multiple statistics projects to tap into built-in streams of
> > observations.  Different sites have different needs, and I think we
> > need to be able to easily play with various ways of doing statistics.
> 
> Agreed, but just to toss this out: I foresee a countervailing pressure
> in future toward standardized and aggregated statistics across
> repositories. I have heard a number of statements to the effect that
> faculty are using download counts from disciplinary repositories in
> tenure-and-promotion packages. As their work becomes scattered and/or
> duplicated across various repositories, they're going to want to
> aggregate that information.

Quite so.  I just don't feel that we've yet got to the point at which
we understand how to do that well.  A lot of good solutions come about
in this way: an abstract and somewhat indistinct common need is
recognized; a number of people all go off in different directions and
try things; solutions are compared, borrow from each other, coalesce;
finally a now well-understood need finds itself fulfilled with one or
two mature implementations.  I feel that we're still deep in the "try
things" phase.

The degree to which statistics are desired and used suggests that, in
addition to traditional reports, we should be thinking in terms of
exposing statistical products in machine-readable form.  I have been
thinking for some time that we might, with reasonable effort, help to
work out a lingua franca for exchanging usage statistics among
repositories of various "brands" so that the utility of various ideas,
and the behavior of repository users, might be studied more
effectively.  But again, what we can all agree on will very likely be
a small subset of what we can individually envision.

This really ought to be considered early on, because if we can agree on
a common abstract model for the data, then machine- and human-readable
reporting become side-by-side layers on top of the pool of statistical
data products, and both will be easier to think about if they are
merely formatting something already produced.  Likewise, the production
of those statistics will be easier to think about if presentation issues
can be separated from the task.
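To make the layering idea a little more concrete, here is a minimal
sketch in Python, assuming a very simple shared record for one
aggregated usage figure.  The record fields (item_id, event, period,
count), the function names, and the sample identifiers are all
hypothetical; they are not part of DSpace or of any agreed exchange
format.  The point is only that production knows nothing about
presentation, and both reports merely format the same data products.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical common record: one aggregated usage figure for one item.
@dataclass(frozen=True)
class UsageStat:
    item_id: str  # hypothetical repository-neutral identifier
    event: str    # e.g. "download" or "view"
    period: str   # e.g. "2008-08"
    count: int


def produce_stats(events):
    """Aggregate raw (item_id, event, period) tuples into UsageStat records.

    This layer knows nothing about how the results will be presented.
    """
    totals = {}
    for key in events:
        totals[key] = totals.get(key, 0) + 1
    return [UsageStat(i, e, p, n) for (i, e, p), n in sorted(totals.items())]


def to_machine_readable(stats):
    """One presentation layer: JSON, suitable for exchange between repositories."""
    return json.dumps([asdict(s) for s in stats])


def to_human_readable(stats):
    """Another presentation layer: a plain-text report over the same records."""
    return "\n".join(
        f"{s.item_id:<12} {s.event:<9} {s.period}  {s.count:>5}" for s in stats
    )


# Hypothetical sample of raw observations.
sample = [
    ("hdl:1234/1", "download", "2008-08"),
    ("hdl:1234/1", "download", "2008-08"),
    ("hdl:1234/2", "view", "2008-08"),
]
stats = produce_stats(sample)
print(to_machine_readable(stats))
print(to_human_readable(stats))
```

Adding a new report format (or a new exchange format) then means writing
one more formatting function over the same records, without touching
the code that gathers and aggregates the observations.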

I do *not* mean to say here that the statistics that people want now
should have to wait indefinitely on some Grand Scheme to do it all.
It would be better to organize the development as a series of successive
approximations if doing it all in one push looks likely to take too
long.  It's probably going to take several years to fully realize
satisfactory monitoring and reporting of DSpace usage, but that
doesn't mean that we can't provide better and better approximations
much sooner.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.
