[Dspace-general] Data-centric repository

Mark Diggory mdiggory at MIT.EDU
Wed Jan 31 11:17:30 EST 2007


Hilmar, it was great to meet you in San Antonio.

I'd just like to let you know that we've been looking occasionally
into the subject of storing data in DSpace, and I'd like to clarify
my viewpoint on the subject. Please read on below:

On Jan 27, 2007, at 5:13 PM, Hilmar Lapp wrote:

> Hi -
>
> since this is my first posting to this list, I'll briefly introduce
> myself. I run the Informatics at the National Evolutionary Synthesis
> Center, NESCent (http://www.nescent.org), which is an NSF funded
> center located in Durham, NC. Among many other things, we are tasked
> with establishing a digital repository for data in evolution (and at
> first we will focus on published data). A synopsis of what we are
> planning and some of the driving motivations are at
> http://driade.nescent.org.
>
> I've just returned from OR2007, which really left me in awe at the
> work and the expertise of the people involved in DSpace that I met
> there.
>
> The repository we are aiming to establish is primarily going to be
> for digital data, not documents (more specifically, data associated
> with publications). In recollection, most or all of the DSpace-based
> repositories that were presented were concerned with documents, not
> data (or so it appeared to me).

DSpace is not document centric, but it is "file centric" in that 99%
of the time, what gets placed into DSpace are files (in the
traditional sense of the word). In DSpace these are stored as
"Bitstreams".

> I was wondering to what extent this is a false observation, and
> whether or not this contains a message. Based on a superficial look
> at the DSpace data model there seems to be no restriction or bias as
> to what the digital object may be.

Yes, DSpace is file format agnostic: nothing in DSpace enforces what
a file must contain. So whether you have an Excel spreadsheet or a
PDF document, DSpace can store and retrieve the content as such.

> I learned from one of you that in
> the DSpace data model I have no way to represent hierarchical and
> typed relationships between the individual bitstreams that constitute
> the parts of a 'dataset', which I found a very helpful comment.

Well...Yes... and No...

An "Item" in the DSpace Content Model currently allows for rich
metadata to be attached directly to it. Bitstreams, too, allow a
small amount of metadata to be attached (name, description, format).
We feel these will get expanded in the future so that arbitrary
metadata can be attached at any level in the Data Model (Community,
Collection, Item, or Bitstream). But often, IMHO, the mistake made is
to attempt to map the structure of one's data onto the content
storage model of the repository (the Collection / Item / Bundle
structure of DSpace). The complexity of relational datasets doesn't
map very well onto that structure, and in fact I think these are the
wrong domains to be trying to map together (one being content
centric, the other being management/relationship centric). A
relational db conflates the two, but for good reason: it was
designed to manage the two together. DSpace, on the other hand, keeps
the two separate so that it can be content agnostic.

DSpace is Access dependent; it only sees files as "Blobs" and
stores/retrieves them as such. So if you want to store a dynamic,
complex structure like a "Relational Database", the way I see to do
it is to express that structure in some "serialized" form of one or
more bitstreams (such as an SQL dump, or a set of CSV or
tab-delimited files, something akin to a DDI manifest and associated
data files stored in separate bitstreams). The goal is that it can be
"reconstituted" in the same state by the client downloading a copy of
the file (no matter if that client is a user's browser or some 3rd
party or value-added service capable of "remounting" and providing
access to a version of the database).
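
To make the serialize/reconstitute idea above concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions: SQLite stands in for the relational database, the
"taxon" table and its rows are made up, and the functions serialize()
and reconstitute() are hypothetical names, not part of DSpace. The
bytes returned by serialize() are what would be deposited as a single
bitstream; DSpace itself would never look inside them.

```python
import sqlite3


def serialize(conn: sqlite3.Connection) -> bytes:
    """Express the whole database as one SQL dump (a would-be bitstream)."""
    return "\n".join(conn.iterdump()).encode("utf-8")


def reconstitute(dump: bytes) -> sqlite3.Connection:
    """Rebuild an equivalent in-memory database from a downloaded dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(dump.decode("utf-8"))
    return conn


# Hypothetical example dataset (assumption, for illustration only).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO taxon (name) VALUES (?)",
    [("Homo sapiens",), ("Mus musculus",)],
)
source.commit()

blob = serialize(source)    # opaque bytes, as the repository would see them
copy = reconstitute(blob)   # what a client or "remounting" service would do
rows = copy.execute("SELECT name FROM taxon ORDER BY id").fetchall()
```

The design point is that the structure lives entirely inside the
serialized file, so the repository can stay content agnostic while a
client can still get the database back in the same state.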

What does all this mean technically? At least for me, it involves
trying to get the groups we are working with to agree on a structural
package that can be used for exchanging datasets. In DSpace I'm
hoping this will really just be a METS based solution that holds
datasets and a manifest like DDI for transporting.
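
As a rough sketch of what such a package manifest could look like,
the Python below builds a bare-bones METS document listing a DDI file
and a data file. This is an assumption-laden illustration, not an
agreed profile and not DSpace's ingest format: the file names and
contents are invented, build_manifest() is a hypothetical helper, and
the output has not been validated against the METS schema.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_manifest(files: dict) -> str:
    """Return a minimal METS document describing the given files.

    `files` maps a file name to its content as bytes.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="CONTENT")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", TYPE="dataset")

    for index, (name, data) in enumerate(sorted(files.items()), start=1):
        file_id = f"file-{index}"
        entry = ET.SubElement(
            group,
            f"{{{METS_NS}}}file",
            ID=file_id,
            SIZE=str(len(data)),
            CHECKSUM=hashlib.sha256(data).hexdigest(),
            CHECKSUMTYPE="SHA-256",
        )
        # Point at the file by its name inside the package.
        ET.SubElement(
            entry,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        # Tie the file into the (flat) structural map.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID=file_id)

    return ET.tostring(mets, encoding="unicode")


# Hypothetical package contents (assumption, for illustration only).
manifest = build_manifest({
    "ddi.xml": b"<codeBook/>",
    "data.csv": b"id,name\n1,Homo sapiens\n",
})
```

Each file in the package would then travel as its own bitstream, with
the manifest describing how the pieces belong together.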

>
> I understand that I need to study that data model in more detail, but
> wanted to ask the community whether there are any examples for DSpace-
> based data-centric digital repositories, and how they fared with  
> DSpace.
>
> Thanks in advance for any comments, links, or other pointers, and BTW
> if anyone has comments or suggestions on our digital data repository
> project as outlined above, please don't hesitate to send those my way
> too, I'll forward them to the project team.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
>
>
>
> _______________________________________________
> Dspace-general mailing list
> Dspace-general at mit.edu
> http://mailman.mit.edu/mailman/listinfo/dspace-general

Mark R. Diggory
~~~~~~~~~~~~~
DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology




