[Dspace-general] Week 3: Good Repository Software
Mark Diggory
mdiggory at MIT.EDU
Fri Sep 12 15:24:35 EDT 2008
Dorothea,
I'll take this as an opportunity to summarize what I know about
various projects that are already happening in the community to
address these topics.
On Sep 12, 2008, at 8:48 AM, Dorothea Salo wrote:
> Apologies for the lateness. We're finally getting 1.5 in shape to roll
> out, and I'm just a little stressed about that.
You should certainly be voicing any issues/concerns you may have to
the community. We are glad to advise on best approaches for rolling
out a 1.5 release (as you know, at MIT we just finished this rollout).
>
> REPOSITORY MIGRATION
>
> Work is underway to enable wholesale repository migration between
> platforms via OAI-ORE. The winning entry in the hackfest at Open
> Repositories 2008 nearly managed an entire migration between DSpace
> and Fedora.
I think the Fedora and DSpace communities are actually looking for
something even more synergistic than "Migration" out of the ORE and
other Projects. We did just finish a project in the GSoC to address
the subject of using a Fedora repository as the storage layer for the
DSpace Assetstore.
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration
There is certainly a strong push in both the DSpace and Fedora
communities to become more involved and collaborative with each other.
With the 2.0 work there is an opportunity to see even more reuse of
storage between DSpace and Fedora, allowing all DSpaceObject data
(including policies and permissions) to be mapped to Fedora under the
hood. Likewise, the 2.0 model rework is seeking to allow metadata to
be attached to any DSpaceObject. This opens the door for a richer
expression of DSpace objects that fits better with both the ORE and
Fedora storage models.
While hackfests are interesting opportunities to explore ideas,
without the resulting code being publicized and brought into the
community, I'm wary of the outcome being anything more than just an
example of the point we all know: given that we are storing
similar content, we also ultimately have similar use-cases and
underlying implementation strategies.
I see this as not much different from Scott Yeadon's work presenting
Content Interchange at OR2007. The only (albeit big) benefit is a
third-party expression/mapping of both tools to ORE rather than METS.
> # Content Interchange and the Invisible Repository
> Scott Yeadon
>
> The Australian National University (ANU) will be undertaking
> development work for the Australian Partnership for Sustainable
> Repositories (APSR) in 2007. Much of this work will be focused
> around repository interoperability and the integration of a
> repository service within the university’s application
> infrastructure. This presentation will discuss and demonstrate some
> of the prototype DSpace-related development work undertaken so far
> and planned for further development in 2007. Specifically: a METS
> SIP/DIP profile intended to be used as a national standard for the
> meaningful exchange of digital objects between repositories;
> separation of concerns at a functional level so an institution can
> select best-of-breed software, with an example using Open Journal
> Systems (OJS) to manage publication workflow, DSpace to manage
> preservation and Manakin as an access/publication point; and a
> Manakin theme incorporating Google Earth and Google Maps
> functionality.
>
...
> ONE CHANGE
>
> Asked what the one change would be that would advance DSpace furthest
> toward the ideal repository system, these possibilities came up (in
> rough rank by interest):
Thank you for instigating these responses, Dorothea. I would like to
make some comments.
Firstly, I would add that aligning DSpace in terms of model and
capability with lower-level storage solutions (like Fedora) is one of
the important requirements in the 2.0 development roadmap and that
this level of integration is of a very high priority to the
Foundation and the Core 2.0 development team. Anyone who has
questions about how/what is being planned for 2.0 should voice them
to the team and the community at large. We are working to solidify
the architectural prototype and bring together these designs. Once we
have a tangible body of work, we will be opening the effort to
review by the community at large.
That said, I know we also have the following projects already in the
works in the community:
> * file versioning
I mentored a GSoC project in 2007 that addresses this. Our
intention is to merge it into the trunk at some point in the near future:
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2007_Versioning
Likewise we have a project within the MIT Libraries initially
prototyped by Larry Stone to implement the new History system for DSpace
http://wiki.dspace.org/index.php/HistorySystemPrototype
> * embargoes
Elliot Metsger has been hard at work on a prototype for handling
embargoes that is a fork of the 1.5.x codebase. He is also working on
porting this to the trunk.
> * eliminating the need for server restarts
I'm not sure about this topic. It's not necessarily a "feature" of
DSpace so much as a matter of how one deploys it in a production
environment.
> * authority control
Authority control is an important topic and one that I've had some
ideas about but haven't seen come to fruition yet. I will say,
however, that these projects relate to it.
Bitstream Format Renovation (initially prototyped by Larry Stone):
http://wiki.dspace.org/index.php/BitstreamFormat_Renovation
This is a critical project that MIT Libraries has been working on over
the last year. It basically provides Authority Control over the
Bitstream Formats that will enable DSpace to use services like Pronom
and GDFR to supply more uniform and consistent Format detection and
control. By using a Global Format Registry, DSpace instances can all
share a common set of known Formats that will be updated on a regular
basis to reflect the changes in digital media that occur over time.
In the DSpace DAO refactoring work that James Rutherford, Richard
Jones, Graham Triggs and I participated in (which now resides in
the DSpace trunk), there was a critical effort to refactor out the
ability to assign and manage External Identifiers such as Handles,
DOIs, PURLs, ARKs, etc.
I'm of the opinion that what we are seeking is ultimately a more
universal solution here. ExternalIdentifiers and BitstreamFormats are
really cases of controlled metadata fields, and these fields
are backed by services with the following levels of capability:
Level 1
Read - The ability to list, search or validate a specific metadata
value (Literal string or Identifier) within an external service.
Level 2
Write - The ability to mint a new metadata value (Literal string or
Identifier) within such a service.
We already see such "endpoints" evolving at the LoC and other leaders
in the field of metadata standardization and classification.
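To make the two levels a little more concrete, here is a minimal
sketch in Java. None of these names exist in DSpace: the interfaces
ReadableAuthority and WritableAuthority and the InMemoryAuthority
class are hypothetical, and the in-memory set simply stands in for
whatever remote service (a format registry, an identifier service)
would really sit behind such a field.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Level 1 (Read): list, search or validate values held by a service. Hypothetical. */
interface ReadableAuthority {
    List<String> list();
    List<String> search(String fragment);
    boolean validate(String value);
}

/** Level 2 (Write): additionally mint new values within the service. Hypothetical. */
interface WritableAuthority extends ReadableAuthority {
    /** Registers the value if it is new and returns the stored form. */
    String mint(String value);
}

/** In-memory stand-in for a remote registry; for illustration only. */
class InMemoryAuthority implements WritableAuthority {
    private final Set<String> values = new LinkedHashSet<>();

    @Override
    public List<String> list() {
        return new ArrayList<>(values);
    }

    @Override
    public List<String> search(String fragment) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (v.contains(fragment)) {
                hits.add(v);
            }
        }
        return hits;
    }

    @Override
    public boolean validate(String value) {
        return values.contains(value);
    }

    @Override
    public String mint(String value) {
        values.add(value); // a Set ignores duplicates, so minting twice is harmless
        return value;
    }
}
```

Under this split, a field backed by a shared format registry might only
need the read interface, while an identifier service would also need
the write interface in order to mint new values.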
> * API to the repository layer
We have an API to the repository layer. It's called "DSpace API". I'm
not sure what is meant otherwise. If you are referring to a pluggable
layer that will allow one to implement one's own Bitstream Assetstore
solution, this is currently a project that Richard Rodgers has been
spearheading at MIT Libraries, and a prototype API is available.
The intention is to see it become part of DSpace shortly; the
only question seems to be which version. If my understanding is
correct, this API was also used as a critical enhancement to DSpace
to support integration work with Fedora in the DSpace/Fedora GSoC
project.
> * multiple instances of DSpace run from a single codebase
The Maven build system allows one to set up numerous separate
configurations that reuse the same DSpace codebase across them all.
Again, this request is too vague. Are you referring to sharing
jars in a Tomcat server instance or JEE container? Graham Triggs has
been working on cleaning up the codebase to improve that capability.
> * componentization of DSpace
DSpace 1.5.x was our first major reorganization to support
componentization of DSpace. Maven allows you to write separate module
projects for DSpace and include them in your build process. This
allows not only the separation of your customizations from the
original core codebase, but also the multiple-instance setup
mentioned in the previous answer.
>
> EMBARGOES
>
> Bram Luyten shared this video of their DSpace embargo function:
> <http://screencast.com/t/hinfBuq3fU> Elliot Metsger shared
> <http://wiki.dspace.org/index.php/User:Emetsger:Embargo>, and its FAQ
> at <http://maven.mse.jhu.edu/embargo/faq.html>
>
> A more nuanced implementation of OAI-PMH would be helpful to several
> chatters. There was general agreement that withdrawn and embargoed
> items should not export metadata via OAI-PMH. The ability to have
> OAI-PMH only disseminate items designated as "full-text" (or otherwise
> complete) was also desired.
>
> Embargoed items should not come up in browses or searches, of course,
> nor should they be crawlable by search engines. However, some items
> can be halfway-private: metadata can be available (including via
> OAI-PMH), but the files should not be downloadable. Access Control
> Lists were raised as one potential solution.
Those are certainly requirements of the Embargo project that Elliot
Metsger has been working very hard at. I think the Core developers
unanimously embrace that as an important feature and see the exposure
as a serious issue that needs fixing.
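As a rough illustration of the rules quoted above (not of Elliot's
actual implementation, which I am not describing here), the visibility
decisions could be reduced to two checks. ItemState and EmbargoPolicy
are hypothetical names, not DSpace classes, and the sketch assumes an
embargo is lifted on the embargoUntil date itself.

```java
import java.time.LocalDate;

/** Hypothetical snapshot of the facts needed to decide visibility. */
final class ItemState {
    final boolean withdrawn;
    final LocalDate embargoUntil;      // null means the item has no embargo
    final boolean metadataStaysPublic; // true for "halfway-private" items

    ItemState(boolean withdrawn, LocalDate embargoUntil, boolean metadataStaysPublic) {
        this.withdrawn = withdrawn;
        this.embargoUntil = embargoUntil;
        this.metadataStaysPublic = metadataStaysPublic;
    }

    /** Assumption: the embargo is lifted on the embargoUntil date itself. */
    boolean embargoedOn(LocalDate day) {
        return embargoUntil != null && day.isBefore(embargoUntil);
    }
}

final class EmbargoPolicy {
    private EmbargoPolicy() {
    }

    /** Should the record appear in OAI-PMH, browse and search on the given day? */
    static boolean exposeMetadata(ItemState item, LocalDate day) {
        if (item.withdrawn) {
            return false;
        }
        return !item.embargoedOn(day) || item.metadataStaysPublic;
    }

    /** Should the files be downloadable on the given day? */
    static boolean allowDownload(ItemState item, LocalDate day) {
        return !item.withdrawn && !item.embargoedOn(day);
    }
}
```

A "halfway-private" item is then simply one where exposeMetadata is
true while allowDownload is still false.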
I hope this summary of known projects assists the community in
understanding where work is currently going on and what the overall
"tack" of our community's informal development roadmap is. I do think
this discussion has been very fruitful and provides a platform for the
developers within the community to clarify the work that they are doing.
Sincerely,
Mark Diggory
~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage