[Dspace-general] Week 3: Good Repository Software

Simon Brown stb28 at cam.ac.uk
Thu Sep 4 12:49:04 EDT 2008


On 4 Sep 2008, at 16:46, Tom De Mulder wrote:

> On Tue, 2 Sep 2008, Dorothea Salo wrote:
>
>> Repository managers: If any of this rings a bell with you, I need you
>> to stand up and say so publicly. "The lurkers support me in email"
>> (see <http://www.collectableboard.com/forums/books/44988-hoppys-poisoned-sanctimony.html>)
>> is no more going to get these problems solved in future than it has in
>> the past.
>
> While I'm not a repository manager, I've looked after a big DSpace
> instance for over 5 years, and I've worn that hat. I agree with Dorothea.
>
> We have serious issues with scalability and stability, and the fact that
> the existing codebase is very hard to modify (but I'll leave it to my
> colleague who has to do the development to elaborate).

That would be me.

I cannot speak to the 1.5 codebase but from what I've seen of it so  
far I don't think there have been many sweeping changes, so most of  
this probably applies. I refer specifically to 1.4.2.

It's a BigBallOfMud. The boundaries between architectural layers vary  
from blurred to nonexistent - for example, there is SQL code scattered  
throughout the codebase, rather than down in a database access layer
where it should be. This has several unpleasant effects, the first of  
which is that if you plan on running on a database other than Postgres  
or Oracle, you have to hunt down every single piece of SQL throughout  
the entire codebase and add another "else if" to it. Better hope you  
get them all. Supposing that you do, and you want to release your  
additional database support as a patch to assist the community at  
large, you've got a monster patch touching a large number of files in  
the codebase rather than one or two additional classes whose presence  
won't affect anyone who doesn't use them. That's not good design.
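
To make the contrast concrete, here is a minimal sketch of what I mean by
keeping the SQL differences in one place. None of these names exist in
DSpace: SqlDialect, PostgresDialect, OracleDialect and ItemQueries are all
hypothetical, and the table and column names are only illustrative. The
point is just that supporting another database becomes one new class rather
than another "else if" in every file.

```java
// Hypothetical sketch, not DSpace code: every database-specific difference
// in SQL syntax is hidden behind this one interface.
interface SqlDialect {
    // Wrap a query so that it returns at most maxRows rows.
    String limit(String query, int maxRows);
}

// PostgreSQL supports a trailing LIMIT clause.
final class PostgresDialect implements SqlDialect {
    @Override
    public String limit(String query, int maxRows) {
        return query + " LIMIT " + maxRows;
    }
}

// Classic Oracle idiom: filter the (ordered) inner query on ROWNUM.
final class OracleDialect implements SqlDialect {
    @Override
    public String limit(String query, int maxRows) {
        return "SELECT * FROM (" + query + ") WHERE ROWNUM <= " + maxRows;
    }
}

// Application code asks the dialect for database-specific syntax and never
// branches on the database type itself.
final class ItemQueries {
    private final SqlDialect dialect;

    ItemQueries(SqlDialect dialect) {
        this.dialect = dialect;
    }

    // Illustrative table and column names only.
    String recentItems(int maxRows) {
        return dialect.limit(
            "SELECT item_id FROM item ORDER BY last_modified DESC", maxRows);
    }
}
```

With something along these lines, a patch for a third database would be a
single extra SqlDialect implementation, and anyone who doesn't use it would
never load it.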

It also manifests itself in other ways. Patching the system for  
properly darkening items is, as Dorothea has already noted, fraught  
with potential failures. We have a dark items patch which hides items  
from browse, RSS, and OAI-PMH, and we *think* we've caught everything,  
but as the only way to do it in the codebase as it stands is - once
again - to hunt down every instance of access to items and patch in an
authz check, we're still not completely certain. We patched OAI-PMH in
something of a hurry not long ago when we realised metadata was  
leaking through it.

This kind of access control should, once again, be applied at a very  
low level - any calls to get lists of items for browsing etc. should  
include the user access context and shouldn't even return items the  
user should not be able to see. This kind of thing is difficult enough  
to implement on a well-defined architecture and an unholy nightmare on  
a bad one.
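
As a rough illustration of the idea, and very much a sketch rather than a
proposal for the real API: UserContext, Item and ItemRepository below are
hypothetical names, and a single "dark" flag stands in for whatever the real
authorisation rules would be. What matters is that the only way to get a
list of items takes the caller's context, so browse, RSS and OAI-PMH can't
forget the check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: who is asking, and what they are allowed to see.
final class UserContext {
    private final boolean canSeeDarkItems;

    UserContext(boolean canSeeDarkItems) {
        this.canSeeDarkItems = canSeeDarkItems;
    }

    boolean canSeeDarkItems() {
        return canSeeDarkItems;
    }
}

// Hypothetical, heavily simplified item: a title plus a "dark" flag.
final class Item {
    final String title;
    final boolean dark;

    Item(String title, boolean dark) {
        this.title = title;
        this.dark = dark;
    }
}

// The single low-level access point. There is deliberately no method that
// lists items without a UserContext.
final class ItemRepository {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // Returns only the items this context may see; dark items are filtered
    // out here, before any caller gets hold of them.
    List<Item> findAll(UserContext ctx) {
        List<Item> visible = new ArrayList<>();
        for (Item item : items) {
            if (!item.dark || ctx.canSeeDarkItems()) {
                visible.add(item);
            }
        }
        return visible;
    }
}
```

An anonymous OAI-PMH harvest would then call findAll with a context that
cannot see dark items and simply never receive them, with no per-feature
patching needed.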

Now, in a way, I can understand why fixing these things hasn't been  
high on anyone's list - my institution has things it would rather I do  
with our system in the same way that anyone else's does. What I'm less  
sure of is why a better architecture (which would benefit everyone who  
works with the codebase and therefore, indirectly, everyone else who  
uses DSpace) hasn't been more of a priority for the federation.

I don't really want to address what makes an institutional repository  
good or bad because it's really not my area of expertise; I do feel  
that addressing the quality of the code itself will make it much  
easier for everyone who uses DSpace to bend it towards their  
particular needs.

Regards,
--
Simon Brown <stb28 at cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




