[Dspace-general] DSpace: "digital" archive or "literature" archive?

MacKenzie Smith kenzie at MIT.EDU
Mon Jun 4 09:30:21 EDT 2007


Dear Derek, Richard, et al.

I am occasionally amazed at the degree to which DSpace,
after 5 years as an open source software project, is still talked
about as if it were a vendor product... of course it has limitations...
all software does, especially when it's five years old and
actually being used. But in this case the outcome is in *your* hands.

Drawing on their cumulative experience, last year some members
of the DSpace technical community produced an architecture plan
for needed improvements to the system:
http://wiki.dspace.org/index.php/ArchReviewReport

Michele Kimpton, the new Executive Director of the DSpace
Foundation, is now at work on how we can collectively move
that forward. We need to figure out the funding model to get
this work done, but I'm pretty confident it will happen... there
are too many organizations depending on DSpace now to let
it age out of existence, mine included.

And there's always that data export exit strategy if it does --
one of the original requirements of the system that acknowledges
how short the life span of software is these days, and how little
we still know about the "right" way to build these systems.

Cheers,

MacKenzie
MIT Libraries

Derek Hohls wrote:
> Richard
>  
> Thanks for sharing those ideas and thoughts.  
>  
> I looked at the Nuxeo site, and also read through the technical
> comparison by Richard Wyles - very interesting.  I also looked at the
> Fedora implementation case study by Richard Green [sidebar - there do
> seem to be lots of Richards here... is it just a coincidence that my
> middle name is - Richard!]
>  
> In summary, I have gathered that:
>  
> * DSpace is less technically capable, does not scale as well, and does
> not handle complex objects, a variety of objects, or mass-uploading of
> data, but has an easy and simple front-end for users and
> administrators. There is also a wealth of start-up material and a good
> community.
>  
> * Fedora is more technically capable, scales well (within our likely
> limits at least), and seems to handle complex objects with a variety of
> MIME-based data types. There is no front-end that works on the web,
> and the Java interface that is supplied looks absolutely barebones at
> best. The concepts and ideas of Fedora also seem quite complex and are
> not clearly explained in the starting documentation. User docs and
> tutorials seem minimal. Community support is unknown.
>  
> Richard Green's case study says:
> "Fedora 'out of the box' was a software tool with an associated very
> steep learning curve and a user had to rely heavily on documentation
> available on the Fedora website... we came to realise that the
> documentation appeared to lack some crucial elements and that, for a
> first time user, it was sometimes not easy to follow."
>  
> * Nuxeo might be promising; it has lots of flash, but its capabilities
> are harder to discern. The emphasis seems to be on CMS, which is not
> really what we need; from their website list of features:
> # Workspaces to create and work on documents
> # Flexible versioning of documents
> # Document Life Cycle Management
> # Collaboration features such as comments, on-demand notifications,
> etc.
> # Search / Query interface to the document repository
>  
>  
> This leaves us in a difficult position between two choices:
> (a) to hold off and hope for Fedora to significantly improve the front
> end and user documentation... which might be problematic, as it's not
> clear how their funding will continue after September this year (2007),
> and there is no project roadmap, so it's not that clear what they will
> actually focus on.
> (b) to go on with DSpace, and acknowledge that it's a temporary
> solution which may not adequately address many of our use cases
> (although still a step up from holding all research data on local
> drives or on a DMS). If we later decide to switch to Fedora, I hope it
> would be possible to extract the content for the new system. DSpace
> says this is possible:
> http://wiki.dspace.org/index.php//EndUserFaq#Can_I_export_my_digital_material_out_of_DSpace.3F
>  
>  
> Derek
>  
>
>   
>>>> Richard MAHONEY <r.mahoney at iconz.co.nz> 2007/06/01 01:26:42 AM >>>
>
> Dear Derek,
>
> On Fri, 2007-06-01 at 00:20, Derek Hohls wrote:
>   
>> I have recently installed and started looking at DSpace as a
>> "digital" repository.
>>
>> Background:
>> I work in a science research organisation. We are clustered into
>> hierarchical groups doing "similar" work, but this structure changes
>> and evolves all the time. Most of the work we do is in the form of
>> projects. Each project tackles a particular subject, with a start/end
>> date. As a result, any number of digital "objects" are generated:
>> PDFs, images, presentations, reports, spreadsheets, data files,
>> model-run outputs, program code, spatial files, etc. Usually, such
>> material is archived on CD and kept "somewhere".
>>
>> The organisation does run a formal Document Management System (DMS);
>> this is typically used for project reports and has the facilities of
>> document security control, access, version tracking, etc. It's also
>> integrated into other tools we use.
>>
>> Problem Statement:
>> I need to provision a system that can be used as a complete "digital"
>> archive: one that stores *all* digital information in an accessible
>> and easily retrievable manner, with easy uploading/downloading of
>> material into the archive.
>>
>> Impression of DSpace:
>> My early, high expectations of DSpace have been tempered somewhat as
>> I have started looking at the interface in more detail. My impression
>> so far is that DSpace seems designed primarily for occasional storage
>> of literature-type material within a stable organisational framework,
>> whereas I am looking for frequent storage of widely varying material
>> within a shifting organisational framework, accompanied by ongoing
>> staff turnover.
>>
>> I really would like some input from the existing community,
>> especially those who may have similar experience in this kind of
>> environment, on whether or not DSpace is the tool to use. In
>> particular, some of the worrying limits I have seen so far are ...
>
> [snip]
>
> I have been using DSpace for over a year now -- 1.3.x and 1.4.x -- on
> Solaris 10 with Sun's Java System Web Server (6.1 and 7.0). I use
> DSpace for Indica et Buddhica - Repositorium: a digital archive
> designed to capture, store, index, preserve, and distribute materials
> pertinent to Indology and South Asian Buddhology. While the aim is to
> build an archive that enables Indologists and Buddhologists to
> catalogue and store a variety of materials -- articles, books, images,
> theses, software, working papers and so on -- the main concern at
> present is to lay the foundation by filling the archive with relevant
> bibliographical records. This is underway: almost 25,000 records are
> available already, and the same number again should be loaded within
> the next few weeks. More details here:
>
> http://indica-et-buddhica.org/sections/repositorium-preview 
>
> You are at the critical stage of selecting and assessing an archival
> platform so I will try to address your concerns candidly.
>
> I am currently using DSpace only because -- for me -- there is
> presently no suitable alternative. While I was impressed by the proven
> scalability of Fedora, the lack of a decent Java web app. admin. and
> user interface ruled it out. (I prefer to avoid PHP apps if possible,
> and last time I tried Fez it consistently crashed Sun's Web Server --
> completely unacceptable on a test server, let alone in production.)
> Another suite capable of scaling was CDS Invenio (a.k.a. CDSWare).
> Unfortunately it is rather complex to compile, configure and maintain
> on Solaris, so it is not currently an option. That leaves only DSpace,
> with its well-known performance and scalability issues.
>
> Although these shortcomings have been raised many times on the mailing
> lists, I have seen no evidence that they are being addressed with
> anything but lip service. The discouraging findings of this technical
> evaluation, I believe, still hold:
>
> a.) Technical Evaluation of Research Repositories (Richard Wyles
> - 2006-09-14 16:49)
> https://eduforge.org/docman/?group_id=131 
>
>
> From my own perspective, then, I see DSpace as nothing but a temporary
> solution until a good Java web app. is developed for Fedora. Another
> alternative, perhaps more likely in the short term, is Nuxeo's
> soon-to-be-released Java app, Nuxeo 5. I am already using their
> Zope-based CPS 4 for the front end of my site and am very happy with
> it. Nuxeo claims that CPS 4 has been tested and approved with more than
> 3TB of live data (3 million documents). It is intended that version 5
> will effortlessly scale to over 5TB. This will need to be assessed, but
> early indications are convincing. Below are a few references. You may
> like to note that their current development is being driven by the
> needs of clients perhaps not so very different from your own:
>
>
> i.) Nuxeo Home Page:
>
> http://www.nuxeo.com/ 
>
> ii.) CPS Project Page:
>
> http://www.cps-project.org/ 
>
> iii.) About the Zope to Java technology switch (CPS 4 to Nuxeo 5):
>
> http://www.nuxeo.com/en/java-switch/ 
>
> iv.) Nuxeo 5 Project Page:
>
> http://www.nuxeo.com/en/products/ 
>
> http://www.nuxeo.org/static/snapshots/ (Download Daily Snapshots)
>
> v.) Nuxeo 5 Roadmap
>
> http://www.nuxeo.org/sections/about/roadmap/ 
>
>
> vi.) Nuxeo Clients:
>
> http://www.nuxeo.com/en/customers/ 
>
> vii.) Mailing Lists (Nuxeo 5):
>
> http://lists.nuxeo.com/mailman/listinfo/ecm 
>
>
>
> Best regards,
>
> Richard Mahoney
>
>
>   


-- 
MacKenzie Smith
Associate Director for Technology
MIT Libraries



