[Dspace-general] Week 3: Good Repository Software

Dorothea Salo dsalo at library.wisc.edu
Tue Sep 2 11:27:39 EDT 2008


BLUNTNESS ALERT. This is not a happy email. I have tried to make it a
reasonably diplomatic one, but I may well have failed, in which case I
apologize in advance.

I'm going to tip my hand here. I am speaking from the perspective of a
repository manager, not a software architect or developer. From that
perspective, I don't think DSpace is good repository software. From
where I'm sitting -- on top of a repository that has not met its
library system's hopes and expectations *and is currently being called
to account for that* -- it's pretty terrible repository software.

Let's start from two axioms:

Axiom 1. The most elegant, stable, reliable, preservation-ready,
standards-compliant, easy-to-install, easy-to-maintain repository
software package in the WORLD is completely useless -- worse,
indefensible -- if it does not attract deposits.

Do not tell me "that's a political problem" or "that's a
marketing/outreach problem." In part it is, but the politics at my
institutions are not lining up behind the repository, and marketing
and outreach are useless without a compelling value proposition. It
certainly
doesn't help matters that the repository I run has been an albatross
thus far, in considerable part due to DSpace's limitations (usage
reports, anyone?). My local credibility and my ability to argue
convincingly for resources and support for the repository have been
heavily tarnished by that. The conclusion is simple: DSpace software
must present a compelling value proposition *all by itself* -- without
hacks, without outside software, without local developer support,
without guarantee of deposit support -- if the repository I run is to
survive at all. I do not believe that this repository is alone in that
respect; far from it. I believe that most people in my shoes in the
United States are living professional lives of (too-)quiet
desperation.

Axiom 2. The abovementioned ideal repository software package will not
attract deposits if it does not solve a problem that depositors (not
libraries, not IT, not administrators, DEPOSITORS) perceive that they
have.

Ergo my answer to this week's question is: Good repository software
solves depositors' data- and publication-management problems. Let me
point out, because I've seen this consistently missed, that if I don't
or can't solve data- and publication-management problems as they come
to me, the people with those problems won't come to me with their
other problems *even if I can actually solve them*, nor will they
recommend the repository to their colleagues. This has happened to me
over and over and *over* again in the three years I've been running
DSpace repositories. I am Sisyphus, and that infernal stone keeps
rolling down the mountain.

This answer, of course, raises a question: what *are* faculty's data-
and publication-management problems? Here are some problems that
faculty at the institutions I serve have admitted to:

- collaborating on unfinished work across institutional boundaries,
securely and easily

- storing and maintaining substantial amounts of data (in highly
heterogeneous forms) and writing, both while projects are underway and
afterwards

- safely storing data that cannot be shown or even hinted at (for a
wide variety of reasons) to people outside a certain group (often the
campus or the university system, but sometimes an ad-hoc group)

- loading their data from their software and their servers into a safe
storage place, with as little manual intervention as possible,
preferably none

- (in some disciplines) coming up with a sustainable data-management
plan to satisfy grant requirements

- dealing with electronic works that they want to save, often works by
third parties such as students; ETDs, of course, but also honors
projects, graduate/undergraduate research journals, and local
publications such as newsletters and working-papers series

- managing their publication record, irrespective of whether they are
permitted to self-archive some or all of it; use cases include annual
reviews, tenure-and-promotion packages, and online presence

- (in some disciplines) coping with funder-mandated requirements for
open access to published work arising from a grant

- dealing with electronic materials requiring preservation arising
from faculty retirements

None of this should be surprising; the data-curation literature is
full of these and similar problems. I am leaving digitization support
out of the picture, not because it isn't important (it is!), but
because it's a problem DSpace can't feasibly solve -- it's *genuinely*
a political problem. Still and all, it's worlds harder to solve this
political problem when DSpace's limitations leave me with a
credibility deficit to overcome.

The problem that DSpace was designed for -- self-archiving of
peer-reviewed journal articles in an institution-based repository for
purposes of open access -- does not appear on the above list. Bluntly,
this is because faculty do not perceive self-archiving as a problem
they have or wish to solve. At the moment, there are two institutions
in the United States that are entitled to say that some of their
faculty think otherwise: Harvard and Stanford. DSpace presumably
wishes to appeal to more than two institutions!

DSpace can go on being an elegant solution to a nonexistent problem,
in which case I believe it is doomed, or it can solve problems that
potential depositors have. Those are its only two choices from where
I'm sitting. Continuing to proclaim "problems that people actually
have are out of scope!" is not a viable option. The agreed-upon scope
has heretofore been hopelessly misdefined. This is not DSpace
developers' fault, I hasten to say; developers didn't come up with the
open-access "build it and they will come" ideology which has foundered
on the rock of faculty apathy.

That ideology has led to the vast majority of DSpace repositories in
the United States becoming white elephants. (Wikipedia definition of a white
elephant: "a valuable possession which its owner cannot dispose of and
whose cost [particularly cost of upkeep] exceeds its usefulness."
Right on, Wikipedia. Right on.) I'm sorry if that's unwelcome news.
It's my daily reality. My career and the repository's continuance are
riding on me being able to turn that around, and frankly, the odds are
not presently in my favor -- and I own a lot of that, I willingly
grant, but DSpace owns some of it too.

It's worth noting that solving some of the above problems would create
fertile ground for acquiring appropriate versions of the eventual
published literature based on the research projects served. Even those
who are single-mindedly committed to open access should support an
expansion of DSpace's problem-space, because open access gained as a
byproduct of other solved problems *is still open access*.

In short, I believe DSpace could do far worse than take on "solving
depositors' data- and publication-management problems" as its new
scope, since remaining committed to the old one will mire DSpace in
irrelevance.

So. There are a few relatively easy changes that would help me a great
deal in answering some of the above challenges:

- True dark archiving: fix the OAI-PMH hole, please! Some collection
owners do not *want* to leave, or *cannot legally leave*, collection
metadata hanging in the breeze. They need to have the option of hiding
it *completely*, or they walk away from the repository.

- Embargoes.

- Bitstream-less items.
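
To be concrete about the first two items, here is a minimal sketch in
Python. Everything in it is hypothetical: the Item record, its field
names, and both functions are made up for illustration and are not
DSpace's data model or API. The only point is that "dark" has to hide
metadata as well as files, while an embargo hides files until a date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    # Hypothetical record; these field names are NOT DSpace's schema.
    title: str
    dark: bool = False                    # owner wants everything hidden
    embargo_until: Optional[date] = None  # files withheld until this date


def metadata_exposed(item: Item) -> bool:
    """Dark items expose nothing, not even metadata, to harvesters."""
    return not item.dark


def files_available(item: Item, today: date) -> bool:
    """Files are served only for non-dark items whose embargo has passed."""
    if item.dark:
        return False
    return item.embargo_until is None or today >= item.embargo_until
```

An embargoed item in this sketch still shows its metadata; a dark item
shows nothing at all, which is the distinction collection owners keep
asking me for.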

Somewhat more difficult fixes that would have great impact:

- File versioning.

- User- and depositor-facing usage reports, as discussed last week.

- Elimination of per-item licensing, replaced with a single Terms of
Service click-through. (I can elaborate on this if desired.)

- Streamlining and simplification of the deposit process, including
accepting incomplete deposits (even just a file!) for later
inspection/revision/management by a third party.

- Better display options for a broader variety of content. I need a
page-turner, an image browser, and a journal browser that behaves like
a journal browser -- and that's just for the content I *currently
have*, not the content I foresee wanting to cope with in future.

- Easier machine-to-machine deposit. SWORD is good, but frankly, it's
too hard or out-of-reach for most of the data sources I can imagine. I
need DSpace to deal with crappy RSS feeds, because crappy RSS feeds
are what by-author searches of literature databases can produce, and
local IT folks can usually hack together crappy RSS feeds. I also need
DSpace to cope with watch folders, because "put it here on the server
and I'll deal with it" is a value proposition I can sell. So is
"DSpace will watch your page and ingest any new issues of your
publication automagically." So is "DSpace can serve as the
preservation datastore for your
OJS/OCS/Omeka/ContentDM/Greenstone/Kete/whatever installation."

- Better hooks for transcluding metadata in other contexts. I want
one- or no-click publication histories by author, in HTML and RTF at a
minimum. I want prettily-formatted, logically-organized lists of
publications in a given collection via a single line of Javascript. I
want Researcher Pages. (One of the campuses I serve is seriously
threatening to defect to BePress because of the Selected Works
feature. This is what I mean when I say that value propositions for
depositors are *not optional*, *not frills*, in DSpace.) I want COinS
and RefWorks export.
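
For the "crappy RSS feeds" and incomplete-deposit items above, this is
roughly the behavior I have in mind, as a minimal Python sketch using
only the standard library. The function name, the stub fields, and the
needs_review flag are my own inventions, not anything DSpace or SWORD
provides. It also assumes a plain RSS 2.0 feed with un-namespaced
<item> elements; RDF-flavored feeds would need more work.

```python
import xml.etree.ElementTree as ET


def stubs_from_rss(xml_text):
    """Turn RSS <item> elements into incomplete deposit stubs.

    Missing fields become None instead of raising errors, so a third
    party can fill them in later. Returns [] if the feed is not even
    well-formed XML.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []

    stubs = []
    for item in root.iter("item"):
        def text(tag):
            # Treat absent or whitespace-only elements as missing.
            value = item.findtext(tag)
            return value.strip() if value and value.strip() else None

        stub = {
            "title": text("title"),
            "link": text("link"),
            "date": text("pubDate"),
        }
        # Keep anything with at least a title or a link; skip empty items.
        if stub["title"] or stub["link"]:
            # Flag stubs with any gap so a human knows to look at them.
            stub["needs_review"] = None in (
                stub["title"], stub["link"], stub["date"]
            )
            stubs.append(stub)
    return stubs
```

The design choice that matters is tolerance: a feed item with nothing
but a link still becomes a stub flagged for later review, instead of
being rejected for failing to be a complete deposit.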

I do believe that technological integration with Fedora Commons will
go a considerable distance toward escaping the shackles bolted on by
DSpace's too-narrow conception of its mission, and I wholeheartedly
endorse motion in that direction.

Whew. Sorry about this; it's a bit of a broadside. All I can say in my
own defense is that I wouldn't bother if I didn't care as deeply as I
do. I whinge because I love!

Repository managers: If any of this rings a bell with you, I need you
to stand up and say so publicly. "The lurkers support me in email"
(see <http://www.collectableboard.com/forums/books/44988-hoppys-poisoned-sanctimony.html>)
is no more going to get these problems solved in future than it has in
the past.

Dorothea


