[Dspace-general] Archiving research data sets

MacKenzie Smith kenzie at MIT.EDU
Tue Jul 17 11:04:13 EDT 2007


Hi Theo -- I'm very interested to hear about your new DataShare project, 
and you might be similarly interested in some work MIT has done in 
exactly this area, as part of the PLEDGE project with the San Diego 
Supercomputer Center and UNC. Here's a quick summary of the work done so 
far to ingest statistical datasets with DDI encoded metadata into DSpace:

1) OAI-PMH Harvester Agent for Social Science Data Services (i.e. the 
Virtual Data Center http://thedata.org/)

This is a command-line agent that takes an OAI gateway or a URL and 
generates a METS package containing the resources referenced by the 
document residing at that URL. An implementation has been written for 
the case where the object residing at that URL is a DDI document. In 
the DSpace case, a handler processes the DDI document instance returned 
by an OAI-PMH /GetRecord/ request, crawls its related resources to 
retrieve all the appropriate content, and serializes it into a DSpace 
SIP that can be ingested into a DSpace instance.
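
As a rough illustration of the first step the handler performs, here is 
a minimal Python sketch (not the PLEDGE agent itself) that pulls the 
metadata payload out of a /GetRecord/ response and lists the resources 
it references. It assumes related resources are referenced through 
`URI` attributes, as on DDI Codebook elements such as fileDscr and 
otherMat; the function name, the sample record, and the example.org 
URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_resource_uris(get_record_xml):
    """Return the URIs referenced inside the metadata of a GetRecord response.

    Assumption: related resources are referenced via `URI` attributes.
    Order of first appearance is preserved and duplicates are dropped.
    """
    root = ET.fromstring(get_record_xml)
    metadata = root.find(f"{OAI_NS}GetRecord/{OAI_NS}record/{OAI_NS}metadata")
    if metadata is None:
        return []
    uris = []
    for element in metadata.iter():
        uri = element.get("URI")
        if uri and uri not in uris:
            uris.append(uri)
    return uris


# Hypothetical, heavily trimmed response used only to exercise the function.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <metadata>
        <codeBook>
          <fileDscr URI="http://example.org/study1/data.tab"/>
          <otherMat URI="http://example.org/study1/codebook.pdf"/>
        </codeBook>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    for uri in extract_resource_uris(SAMPLE):
        print(uri)
```

A real harvester would then fetch each URI and write the retrieved 
files, together with the DDI document, into the package.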

2) DSpace DDI Packager

The DDI Packager for DSpace is an implementation of a /DSpace Ingest 
Packager/ that processes a package containing a DDI file to produce a 
/DSpace Item/, with mappings to Dublin Core and ingestion of the 
included data files as /Item Bitstreams/. It includes the original DDI 
document as a primary file associated with the /DSpace Item/, and the 
DDI’s URIs are converted to point to the locally archived copies of all 
resources. This lets the DDI serve as a manifest, presentation, and 
metadata source for DSpace while still pointing at the content files 
held locally within the DSpace archive.
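
The URI conversion can be pictured with a small Python sketch. This is 
not the packager's code (the packager is a DSpace plugin); it only shows 
the idea, assuming references live in `URI` attributes and that the 
caller supplies a mapping from original URI to local location. The 
function name and the example paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def localize_uris(ddi_xml, local_paths):
    """Rewrite `URI` attributes that have an archived local copy.

    `local_paths` maps an original URI to its local location (hypothetical
    values supplied by the caller). URIs without an entry are left as-is.
    """
    root = ET.fromstring(ddi_xml)
    for element in root.iter():
        uri = element.get("URI")
        if uri in local_paths:
            element.set("URI", local_paths[uri])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    ddi = (
        '<codeBook>'
        '<fileDscr URI="http://example.org/study1/data.tab"/>'
        '<otherMat URI="http://example.org/other.pdf"/>'
        '</codeBook>'
    )
    mapping = {"http://example.org/study1/data.tab": "bitstream/data.tab"}
    print(localize_uris(ddi, mapping))
```

Leaving unmapped URIs untouched keeps references to resources that were 
not archived pointing at their original location.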

These two sub-projects are currently in alpha release. Both packages 
have been written in initial form and have successfully ingested a test 
DDI study, so much of the initial architectural work is complete. The 
project is now entering a testing and adjustment period to work out any 
major issues in how the ingest process handles failures during the 
packaging and submission stages.

Finally, we've finished a significant amount of the mapping between the 
METS/PREMIS/MODS schemas used in DSpace and DDI. For the VDC harvesting 
in particular there is remaining work to fine-tune the mappings across 
the VDC-centric DDI studies, the VDC system (or technical) metadata 
available via REST calls against VDC resources, and the DSpace SIP Manifest.
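
For anyone unfamiliar with crosswalks, the general shape is a lookup 
table applied to the source document. The Python sketch below is only an 
illustration: the three pairs are example DDI Codebook element names 
paired with Dublin Core style field names, not the actual PLEDGE 
mappings, and the sample study is invented.

```python
import xml.etree.ElementTree as ET

# Illustrative pairs only: DDI element local name -> Dublin Core style field.
# These are NOT the PLEDGE crosswalk.
CROSSWALK = {
    "titl": "dc.title",
    "AuthEnty": "dc.creator",
    "abstract": "dc.description.abstract",
}


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def crosswalk(ddi_xml):
    """Collect non-empty text of mapped DDI elements, grouped by target field."""
    root = ET.fromstring(ddi_xml)
    fields = {}
    for element in root.iter():
        field = CROSSWALK.get(local_name(element.tag))
        text = (element.text or "").strip()
        if field and text:
            fields.setdefault(field, []).append(text)
    return fields


if __name__ == "__main__":
    sample = (
        '<codeBook><stdyDscr><citation>'
        '<titlStmt><titl>Test Study</titl></titlStmt>'
        '<rspStmt><AuthEnty>Doe, J.</AuthEnty></rspStmt>'
        '</citation></stdyDscr></codeBook>'
    )
    print(crosswalk(sample))
```

Matching on local names keeps the sketch independent of which DDI 
namespace a given study declares.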

More details about this work should be available here 
http://pledge.mit.edu/index.php/Main_Page#Metadata_Crosswalks_for_VDC_Integration
if anyone's interested. Most of this work is being done by Mark 
Diggory, so he is the person to ask for specifics.

Cheers,

MacKenzie

-- 
MacKenzie Smith
Associate Director for Technology
MIT Libraries



