[Dspace-general] Week 3: Good Repository Software

Fri Sep 12 15:24:35 EDT 2008

Dorothea,

I'll take this as an opportunity to summarize what I know about  
various projects that are already happening in the community to  
address these topics.

On Sep 12, 2008, at 8:48 AM, Dorothea Salo wrote:

> Apologies for the lateness. We're finally getting 1.5 in shape to roll
> out, and I'm just a little stressed about that.

You should certainly voice any issues or concerns you have to the
community. We are glad to advise on the best approaches for rolling
out a 1.5 release (as you know, we just finished this rollout at MIT).

>
> REPOSITORY MIGRATION
>
> Work is underway to enable wholesale repository migration between
> platforms via OAI-ORE. The winning entry in the hackfest at Open
> Repositories 2008 nearly managed an entire migration between DSpace
> and Fedora.

I think the Fedora and DSpace communities are actually looking for
something even more synergistic than "migration" out of ORE and the
other projects. We did just finish a GSoC project that addresses
using a Fedora repository as the storage layer for the DSpace
Assetstore.

http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration

There is certainly a strong push in both the DSpace and Fedora
communities to become more involved and collaborative with each other.

With the 2.0 work there is an opportunity to see even more reuse of
storage between DSpace and Fedora, allowing all DSpaceObject data
(including policies and permissions) to be mapped to Fedora under the
hood. Likewise, the 2.0 model rework is seeking to allow metadata to
be attached to any DSpaceObject. This opens the door for a richer
expression of DSpace Objects that fits better with both the ORE and
Fedora storage models.

While hackfests are interesting opportunities to explore ideas,  
without the resulting code being publicized and brought into the  
community, I'm wary of the outcome being anything more than just an  
example of the point we all know: given that we are storing
similar content, we also ultimately have similar use-cases and  
underlying implementation strategies.

I see this as not much different from the Content Interchange work
that Scott Yeadon presented at OR2007. The only (albeit big) benefit
is a third-party expression/mapping of both tools to ORE rather than
METS.

> #  Content Interchange and the Invisible Repository
> Scott Yeadon
>
> The Australian National University (ANU) will be undertaking  
> development work for the Australian Partnership for Sustainable  
> Repositories (APSR) in 2007. Much of this work will be focused  
> around repository interoperability and the integration of a  
> repository service within the university’s application  
> infrastructure. This presentation will discuss and demonstrate some  
> of the prototype DSpace-related development work undertaken so far  
> and planned for further development in 2007. Specifically: a METS  
> SIP/DIP profile intended to be used as a national standard for the  
> meaningful exchange of digital objects between repositories;  
> separation of concerns at a functional level so an institution can  
> select best-of-breed software, with an example using Open Journal  
> Systems (OJS) to manage publication workflow, DSpace to manage  
> preservation and Manakin as an access/publication point; and a  
> Manakin theme incorporating Google Earth and Google Maps  
> functionality.
>

...

> ONE CHANGE
>
> Asked what the one change would be that would advance DSpace furthest
> toward the ideal repository system, these possibilities came up (in
> rough rank by interest):

Thank you for instigating these responses, Dorothea. I would like to
make some comments.

Firstly, I would add that aligning DSpace in terms of model and
capability with lower-level storage solutions (like Fedora) is one of
the important requirements in the 2.0 development roadmap, and that
this level of integration is a very high priority for the Foundation
and the core 2.0 development team. Anyone who has questions about what
is being planned for 2.0, and how, should voice them to the team and
the community at large. We are working to solidify the architectural
prototype and bring together these designs. Once we have a tangible
body of work, we will open the effort to review by the community at
large.

That said, I know we also have the following projects already in the  
works in the community:

> * file versioning

I mentored a GSoC project in 2007 that addresses this. Our intention
is to merge it into the trunk at some point in the near future:
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2007_Versioning

Likewise, we have a project within the MIT Libraries, initially
prototyped by Larry Stone, to implement the new History system for
DSpace:
http://wiki.dspace.org/index.php/HistorySystemPrototype

> * embargoes

Elliot Metsger has been hard at work on a prototype for handling
embargoes that is a fork of the 1.5.x codebase. He is also working on
porting this to the trunk.

> * eliminating the need for server restarts

I'm not sure about this topic. It's not necessarily a "feature" of
DSpace so much as a matter of how one deploys it in a production
environment.

> * authority control

Authority control is an important topic and one that I've had some
ideas about, but I haven't seen it come to fruition yet. I'll say,
however, that these projects relate to it.

Bitstream Format Renovation (initially prototyped by Larry Stone):
http://wiki.dspace.org/index.php/BitstreamFormat_Renovation

This is a critical project that MIT Libraries has been working on over
the last year. It basically provides authority control over the
Bitstream Formats, which will enable DSpace to use services like
PRONOM and GDFR to supply more uniform and consistent format detection
and control. By using a Global Format Registry, DSpace instances can
all share a common set of known formats that will be updated on a
regular basis to reflect the changes in digital media that occur over
time.

In the DSpace DAO refactoring work that James Rutherford, Richard
Jones, Graham Triggs and I participated in (which now resides in the
DSpace trunk), there was a critical effort to refactor out the ability
to assign and manage External Identifiers such as Handles, DOIs,
PURLs, ARKs, etc.

I'm of the opinion that what we are seeking is ultimately a more
universal solution here. ExternalIdentifiers and BitstreamFormats are
really cases of controlled metadata fields, and these fields are
backed by services with the following levels of capability:

Level 1

Read - The ability to list, search or validate a specific metadata  
value (Literal string or Identifier) within an external service.

Level 2

Write - The ability to mint a new metadata value (Literal string or
Identifier) within such a service.

We already see such "endpoints" evolving at the LoC and other leaders  
in the field of metadata standardization and classification.
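
To make the two capability levels concrete, here is a minimal sketch in
Python (DSpace itself is Java, so treat this as language-neutral
pseudocode that happens to run). All class and method names are
hypothetical and are not part of any DSpace API. I am also assuming
that a Level 2 service supports everything a Level 1 service does; the
in-memory class merely stands in for a remote registry.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ReadableAuthority(ABC):
    """Level 1: search or validate values held by an external service."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return identifiers whose label matches the query."""

    @abstractmethod
    def is_valid(self, value: str) -> bool:
        """Report whether the service knows this identifier."""


class WritableAuthority(ReadableAuthority):
    """Level 2: Level 1 plus minting new values (assumed to be cumulative)."""

    @abstractmethod
    def mint(self, label: str) -> str:
        """Register a new value and return its identifier."""


class InMemoryAuthority(WritableAuthority):
    """Toy stand-in for a remote registry (hypothetical, not a DSpace class)."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._values: dict[str, str] = {}  # identifier -> label

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return sorted(i for i, label in self._values.items() if q in label.lower())

    def is_valid(self, value: str) -> bool:
        return value in self._values

    def mint(self, label: str) -> str:
        # Sequential identifiers under a prefix; a real service would decide this.
        identifier = f"{self._prefix}/{len(self._values) + 1}"
        self._values[identifier] = label
        return identifier
```

A metadata field backed by a read-only service would accept only a
ReadableAuthority, while a field that is allowed to create new entries
(minting a Handle, say) would require a WritableAuthority.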

> * API to the repository layer

We have an API to the repository layer. It's called the "DSpace API".
I'm not sure what is meant otherwise. If you are referring to a
pluggable layer that will allow one to implement one's own Bitstream
Assetstore solution, this is currently a project that Richard Rodgers
has been spearheading at MIT Libraries, and a prototype API is
available. The intention is to see it become part of DSpace shortly;
the only question seems to be which version. If my understanding is
correct, this API was also used as a critical enhancement to DSpace
to support integration work with Fedora in the DSpace/Fedora GSoC
project.
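
To illustrate what "pluggable" means here, the following is a rough
sketch in Python of a storage contract with a swappable backend. It is
not the prototype API mentioned above; every name in it is
hypothetical, and the content-derived key is purely my assumption for
the sake of the example.

```python
from __future__ import annotations

import hashlib
from abc import ABC, abstractmethod


class BitstreamStore(ABC):
    """Hypothetical storage contract; names are illustrative, not DSpace's."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store the bytes and return an opaque storage key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under the key, or raise KeyError."""


class InMemoryStore(BitstreamStore):
    """Toy backend; a filesystem or Fedora-backed store would plug in the same way."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-derived key (an assumption)
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def round_trip(store: BitstreamStore, data: bytes) -> bytes:
    """Caller code depends only on the abstract contract, not the backend."""
    return store.get(store.put(data))
```

The point of the design is that code such as round_trip never names a
concrete backend, so swapping the assetstore implementation does not
touch the callers.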

> * multiple instances of DSpace run from a single codebase

The Maven build system allows one to set up numerous separate
configurations that reuse the same DSpace codebase across them all.
Again, this request is too vague. Are you referring to sharing JARs in
a Tomcat server instance or JEE container? Graham Triggs has been
working on cleaning up the codebase to improve that capability.

> * componentization of DSpace

DSpace 1.5.x was our first major reorganization to support
componentization of DSpace. Maven allows you to write separate module
projects for DSpace and include them in your build process. This not
only separates your customizations from the original core codebase,
but also supports the multiple-instance setup described above.

>
> EMBARGOES
>
> Bram Luyten shared this video of their DSpace embargo function:
> <http://screencast.com/t/hinfBuq3fU> Elliot Metsger shared
> <http://wiki.dspace.org/index.php/User:Emetsger:Embargo>, and its FAQ
> at <http://maven.mse.jhu.edu/embargo/faq.html>
>
> A more nuanced implementation of OAI-PMH would be helpful to several
> chatters. There was general agreement that withdrawn and embargoed
> items should not export metadata via OAI-PMH. The ability to have
> OAI-PMH only disseminate items designated as "full-text" (or otherwise
> complete) was also desired.
>
> Embargoed items should not come up in browses or searches, of course,
> nor should they be crawlable by search engines. However, some items
> can be halfway-private: metadata can be available (including via
> OAI-PMH), but the files should not be downloadable. Access Control
> Lists were raised as one potential solution.

Those are certainly requirements of the Embargo project that Elliot
Metsger has been working very hard at. I think the core developers
unanimously embrace that as an important feature, and the exposure as
a serious issue that needs fixing.
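
The visibility rules from the chat can be written down as a small
decision table. The sketch below (in Python) is not Elliot's
implementation; the field names are hypothetical. I am assuming that
"halfway-private" is a per-item flag and that an embargo lifts on the
stated date itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    withdrawn: bool = False
    embargo_until: Optional[date] = None  # None means no embargo
    metadata_public: bool = False         # "halfway-private" flag (assumption)


def is_embargoed(item: Item, today: date) -> bool:
    """An embargo is active strictly before its end date (assumption)."""
    return item.embargo_until is not None and today < item.embargo_until


def expose_metadata(item: Item, today: date) -> bool:
    """Should the item appear in OAI-PMH, browses, and searches?"""
    if item.withdrawn:
        return False
    if is_embargoed(item, today):
        return item.metadata_public
    return True


def allow_download(item: Item, today: date) -> bool:
    """Files are never downloadable while withdrawn or under embargo."""
    return not item.withdrawn and not is_embargoed(item, today)
```

With these rules a halfway-private item is harvestable but its files
stay closed until the embargo date, while a fully embargoed item is
invisible on both counts.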

I hope this summary of known projects assists the community in
understanding where work is currently going on and what the overall
"tack" of our community's informal development roadmap is. I do think
this discussion has been very fruitful and provides a platform for the
developers within the community to clarify the work that they are doing.

Sincerely,
Mark Diggory

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage