[Dspace-general] Week 3: Good Repository Software
Simon Brown
stb28 at cam.ac.uk
Thu Sep 4 12:49:04 EDT 2008
On 4 Sep 2008, at 16:46, Tom De Mulder wrote:
> On Tue, 2 Sep 2008, Dorothea Salo wrote:
>
>> Repository managers: If any of this rings a bell with you, I need you
>> to stand up and say so publicly. "The lurkers support me in email"
>> (see <http://www.collectableboard.com/forums/books/44988-hoppys-poisoned-sanctimony.html>)
>> is no more going to get these problems solved in future than it has in
>> the past.
>
> While I'm not a repository manager, I've looked after a big DSpace
> instance for over 5 years, and I've worn that hat. I agree with
> Dorothea.
>
> We have serious issues with scalability and stability, and the fact that
> the existing codebase is very hard to modify (but I'll leave it to my
> colleague who has to do the development to elaborate).
That would be me.
I cannot speak to the 1.5 codebase but from what I've seen of it so
far I don't think there have been many sweeping changes, so most of
this probably applies. I refer specifically to 1.4.2.
It's a BigBallOfMud. The boundaries between architectural layers vary
from blurred to nonexistent - for example, there is SQL code scattered
throughout the codebase, rather than down in a database access layer
where it should be. This has several unpleasant effects, the first of
which is that if you plan on running on a database other than Postgres
or Oracle, you have to hunt down every single piece of SQL throughout
the entire codebase and add another "else if" to it. Better hope you
get them all. Supposing that you do, and you want to release your
additional database support as a patch to assist the community at
large, you've got a monster patch touching a large number of files in
the codebase rather than one or two additional classes whose presence
won't affect anyone who doesn't use them. That's not good design.
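To make the contrast concrete, here is a minimal sketch of the "one or two additional classes" approach, in Java since that is what DSpace is written in. Every name in it (SqlDialect, DialectRegistry, the table and column names) is hypothetical and is not taken from the DSpace codebase; it only illustrates the shape such a database access layer might take.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: all database-specific SQL lives behind it.
interface SqlDialect {
    // Returns a query for one page of item ids in this database's syntax.
    String pagedItemQuery(int limit, int offset);
}

class PostgresDialect implements SqlDialect {
    public String pagedItemQuery(int limit, int offset) {
        return "SELECT item_id FROM item ORDER BY item_id LIMIT " + limit
                + " OFFSET " + offset;
    }
}

class OracleDialect implements SqlDialect {
    public String pagedItemQuery(int limit, int offset) {
        // Classic ROWNUM-based pagination over an ordered subquery.
        int last = offset + limit;
        return "SELECT item_id FROM (SELECT t.item_id, ROWNUM rn FROM "
                + "(SELECT item_id FROM item ORDER BY item_id) t "
                + "WHERE ROWNUM <= " + last + ") WHERE rn > " + offset;
    }
}

// Supporting another database means adding one class like this one.
// Nothing that already exists has to be edited.
class MySqlDialect implements SqlDialect {
    public String pagedItemQuery(int limit, int offset) {
        return "SELECT item_id FROM item ORDER BY item_id LIMIT " + limit
                + " OFFSET " + offset;
    }
}

// Hypothetical registry: callers ask for a dialect by configured name
// instead of branching on the database type themselves.
class DialectRegistry {
    private final Map<String, SqlDialect> dialects = new HashMap<>();

    void register(String name, SqlDialect dialect) {
        dialects.put(name, dialect);
    }

    SqlDialect forName(String name) {
        SqlDialect dialect = dialects.get(name);
        if (dialect == null) {
            throw new IllegalArgumentException("No dialect registered for " + name);
        }
        return dialect;
    }
}

class DialectDemo {
    public static void main(String[] args) {
        DialectRegistry registry = new DialectRegistry();
        registry.register("postgres", new PostgresDialect());
        registry.register("oracle", new OracleDialect());
        registry.register("mysql", new MySqlDialect());
        System.out.println(registry.forName("oracle").pagedItemQuery(10, 20));
    }
}
```

With this shape, a patch adding a new database is a single new class plus one registration line, so it cannot change behaviour for sites that do not use it.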
It also manifests itself in other ways. Patching the system for
properly darkening items is, as Dorothea has already noted, fraught
with potential failures. We have a dark items patch which hides items
from browse, RSS, and OAI-PMH, and we *think* we've caught everything,
but as the only way to do it in the codebase as it stands is, once
again, to hunt down every instance of access to items and patch in an
authz check, we're still not completely certain. We patched OAI-PMH in
something of a hurry not long ago when we realised metadata was
leaking through it.
This kind of access control should, once again, be applied at a very
low level - any calls to get lists of items for browsing etc. should
include the user access context and shouldn't even return items the
user should not be able to see. This kind of thing is difficult enough
to implement on a well-defined architecture and an unholy nightmare on
a bad one.
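As an illustration of access control applied at a low level, the following Java sketch puts the check inside the only method that lists items, so browse, RSS and OAI-PMH code could not forget it. The classes (Item, UserContext, ItemStore) and the single "dark" flag are assumptions made for the example; real authorisation in DSpace is more involved than a boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical item with a single "dark" flag for illustration.
class Item {
    final int id;
    final boolean dark;

    Item(int id, boolean dark) {
        this.id = id;
        this.dark = dark;
    }
}

// Hypothetical user access context passed down from the UI or protocol layer.
class UserContext {
    final boolean canSeeDarkItems;

    UserContext(boolean canSeeDarkItems) {
        this.canSeeDarkItems = canSeeDarkItems;
    }
}

// Hypothetical storage layer. The raw list is private, so the filtered
// listing below is the only way for other code to enumerate items.
class ItemStore {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // Items the user may not see are never returned in the first place.
    List<Item> listItems(UserContext context) {
        List<Item> visible = new ArrayList<>();
        for (Item item : items) {
            if (!item.dark || context.canSeeDarkItems) {
                visible.add(item);
            }
        }
        return visible;
    }
}

class ItemStoreDemo {
    public static void main(String[] args) {
        ItemStore store = new ItemStore();
        store.add(new Item(1, false));
        store.add(new Item(2, true));
        // An anonymous harvester gets only the item that is not dark.
        System.out.println(store.listItems(new UserContext(false)).size());
    }
}
```

Because there is no method that lists items without a context, a new feed or export added later inherits the check instead of needing its own patch.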
Now, in a way, I can understand why fixing these things hasn't been
high on anyone's list - my institution has things it would rather I do
with our system in the same way that anyone else's does. What I'm less
sure of is why a better architecture (which would benefit everyone who
works with the codebase and therefore, indirectly, everyone else who
uses DSpace) hasn't been more of a priority for the federation.
I don't really want to address what makes an institutional repository
good or bad because it's really not my area of expertise; I do feel
that addressing the quality of the code itself will make it much
easier for everyone who uses DSpace to bend it towards their
particular needs.
Regards,
--
Simon Brown <stb28 at cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH