[Dspace-general] Week 3: Good Repository Software

Graham Triggs graham at biomedcentral.com
Thu Sep 4 11:33:14 EDT 2008


Dorothea Salo wrote:
> BLUNTNESS ALERT. This is not a happy email. I have tried to make it a
> reasonably diplomatic one, but I may well have failed,

At least I know it's going to be entertaining! Well, my warning shall be 
that this isn't intended to be a rebuttal, more counterpoint. It's great 
that all this has been said, but if you take [another] step back, does 
it still look the same?

> Axiom 1. The most elegant, stable, reliable, preservation-ready,
> standards-compliant, easy-to-install, easy-to-maintain repository
> software package in the WORLD is completely useless -- worse,
> indefensible -- if it does not attract deposits.
> 
> Do not tell me "that's a political problem" or "that's a
> marketing/outreach problem." In part it is, but the politics at my
> institutions are not lining up behind it, and marketing and outreach
> are useless without a compelling value proposition. It certainly
> doesn't help matters that the repository I run has been an albatross
> thus far, in considerable part due to DSpace's limitations (usage
> reports, anyone?).

I'm going to bite on this one, as I want to ask a serious question - 
should usage reports play a part in encouraging deposits?

Would seeing low usage reports DIScourage people that would otherwise 
choose to submit? If you are looking at counts of accesses and 
downloads, how reliable can they ever be?

My personal opinion is that usage reports at an item/bitstream level are 
something of an itch you can't scratch - if you try, it might ease an 
initial sore point, only for it to recur in another guise later on. 
Those kinds of usage reports are better suited to a high-level 'is this 
repository wanted' question, and things like Google Analytics answer 
that far better than anything we could ever build in.

Value at a finer level would (again, imho) be better accounted for in 
MESURes like citation counts. Is that really something that can or 
should be built in to DSpace? (it could just as easily be an entirely 
separate system that a DSpace installation could query to obtain the 
relevant information for display).
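
As a rough illustration of that separation (a sketch only: the names 
CitationSource, InMemoryCitationSource and render_item_summary are 
hypothetical, and nothing here is part of DSpace or any real citation 
service), the display side would only need a narrow lookup interface:

```python
from typing import Dict, Optional, Protocol


class CitationSource(Protocol):
    """Hypothetical interface to an external citation-count system."""

    def citation_count(self, handle: str) -> Optional[int]:
        """Return the count for an item handle, or None if unknown."""
        ...


class InMemoryCitationSource:
    """Stand-in for the separate system; a real one might be a web service."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self._counts = dict(counts)

    def citation_count(self, handle: str) -> Optional[int]:
        return self._counts.get(handle)


def render_item_summary(title: str, handle: str, source: CitationSource) -> str:
    """Build the display line, adding a citation count only when one exists."""
    count = source.citation_count(handle)
    if count is None:
        return f"{title} ({handle})"
    return f"{title} ({handle}) - citations: {count}"


if __name__ == "__main__":
    source = InMemoryCitationSource({"1234/56": 12})
    print(render_item_summary("A Paper", "1234/56", source))
    print(render_item_summary("Another Paper", "1234/99", source))
```

The repository never stores or computes the counts itself; swapping the 
in-memory stand-in for a remote lookup would not change the display code.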

As I say, the above is just a personal take, so people are free to 
disagree. I just want to shake things up a little and see if people have 
thought about the problem from different angles [than just saying 
'statistics in DSpace'].

> This answer, of course, begs a question: what *are* faculty's data-
> and publication-management problems? Here are some problems that
> faculty at the institutions I serve have admitted to:
> 
> - collaborating on unfinished work across institutional boundaries,
> securely and easily

An interesting use case, but not necessarily one that can be solved by 
an institutional repository, or ever should be solved by something that 
is set up for preservation. (I'm deliberately leaving that point for 
further expansion later).

If you want to collaboratively edit a document, for example, 
would the better answer be to use Google Docs? *That* level of 
collaborative ability is way outside the scope of what we could ever hope 
to put in to a repository. Rather, the question should be how we can 
better support external collaboration tools - i.e. easy ingestion of a 
Google Doc into a repository.

> - storing and maintaining substantial amounts of data (in highly
> heterogeneous forms) and writing, both while projects are underway and
> afterwards

I rather covered the editing side above, but there may be a lot of 
[relatively] static data that is associated with a paper. It may not 
need to be updated, but it does need to be stored somewhere - and if it 
is eventually going to be in the repository, why not have it there from 
the start rather than managing it until the point of submission? It's an 
interesting point.

> - loading their data from their software and their servers into a safe
> storage place, with as little manual intervention as possible,
> preferably none

I see that as being quite related to the above - as much of a user's 
output as possible should be captured [seamlessly] as part of their 
ongoing work.

> None of this should be surprising; the data-curation literature is
> full of these and similar problems. I am leaving digitization support
> out of the picture, not because it isn't important (it is!), but
> because it's a problem DSpace can't feasibly solve -- it's *genuinely*
> a political problem. Still and all, it's worlds harder to solve this
> political problem when DSpace's limitations leave me with a
> credibility deficit to overcome.
> 
> The problem that DSpace was designed for -- self-archiving of
> peer-reviewed journal articles in an institution-based repository for
> purposes of open access -- does not appear on the above list. Bluntly,
> this is because faculty do not perceive self-archiving as a problem
> they have or wish to solve. At the moment, there are two institutions
> in the United States that are entitled to say that some of their
> faculty think otherwise: Harvard and Stanford. DSpace presumably
> wishes to appeal to more than two institutions!

I have sympathy for your plight. But I think (not necessarily in you, I 
hasten to add ;) that there may be some element of not accepting 
problems as political ones, because frankly there are only so many of 
these battles that we can take on (and win), and because we can point to 
there possibly being a technical solution to a limited selection of 
cases that have been encountered to date.

I'm not saying that to have an argument, but from the other side there 
are only so many technical problems that can be taken on and solved in a 
certain period of time. If one of these problems could be tackled 
politically, would that mean our time would be better spent solving 
other issues?

> - Bitstream-less items.

that's already been done ;) (all developers down the pub then, right?)

> Somewhat more difficult fixes that would have great impact:
> 
> - File versioning.

I agree about it being difficult. There was GSoC code for this that will 
make it into a future release.

Although I'll make my obligatory statement that the need for this may 
be exaggerated - there are practices a repository can adopt for managing 
its items that would alleviate a number of potential cases, and in 
relation to the points above about collaboration - there are better ways 
to address the problem *before* hitting the repository.

> - Elimination of per-item licensing, replaced with a single Terms of
> Service click-through. (I can elaborate on this if desired.)

I agree with the sentiment. Lots of issues to think about though in the 
wider scope - i.e. how do you deal with updating the Terms of Service?

> - Streamlining and simplification of the deposit process, including
> accepting incomplete deposits (even just a file!) for later
> inspection/revision/management by a third party.

A lot of this could already be achieved solely through the configuration 
files - and maybe this could be aided by one or two 'non-interactive' 
submission steps being provided in the default DSpace, ready for 
configuring.

> - Better display options for a broader variety of content. I need a
> page-turner, an image browser, and a journal browser that behaves like
> a journal browser -- and that's just for the content I *currently
> have*, not the content I foresee wanting to cope with in future.

Isn't this about 90% outside of the scope of DSpace? A page turner? - 
that could just be a Flash object that you give the URL of a PDF to. If 
there is something out there already that offers that, it's fairly 
minimal effort for someone to customise their interface to use it - you 
don't need to get 'inside' DSpace.

I can see why visualizations are useful, but that isn't a reason for 
DSpace itself to do anything more than make it possible to easily 
integrate third party objects. If someone finds or wants to provide such 
objects that can be redistributed with DSpace, that's a bonus.

G