[Dspace-general] Archiving research data sets
MacKenzie Smith
kenzie at MIT.EDU
Tue Jul 17 11:04:13 EDT 2007
Hi Theo -- I'm very interested to hear about your new DataShare project,
and you might be similarly interested in some work MIT has done in
exactly this area, as part of the PLEDGE project with the San Diego
Supercomputer Center and UNC. Here's a quick summary of the work done so
far to ingest statistical datasets with DDI-encoded metadata into DSpace:
1) OAI-PMH Harvester Agent for Social Science Data Services (i.e. the
Virtual Data Center http://thedata.org/)
This is a command-line agent that queries an OAI gateway or fetches a
URL and generates a METS package containing the resources referenced by
the document found there. So far it has been implemented for the case
where that document is a DDI document. In the DSpace case, a handler
processes the DDI document instance returned by an OAI /GetRecord/ call,
crawls its related resources to retrieve all the appropriate content,
and serializes it into a DSpace SIP package that can be ingested into a
DSpace instance.
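To make the harvesting step concrete, here is a minimal Python sketch. It
is not the agent's actual code. It builds a standard OAI-PMH GetRecord
request URL and collects the URIs referenced in the returned metadata.
The function names, the "ddi" metadataPrefix, the example.org addresses,
and the idea that related resources show up as URI attributes are all
assumptions made for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def get_record_url(base_url, identifier, metadata_prefix="ddi"):
    """Build an OAI-PMH GetRecord request URL.

    The "ddi" default is an assumption; a real repository advertises its
    supported prefixes through the ListMetadataFormats verb.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def referenced_uris(oai_response_xml):
    """Return the distinct URI attribute values found in the record's metadata.

    Assumption: related resources are referenced through attributes named
    "URI" somewhere inside the <metadata> payload.
    """
    root = ET.fromstring(oai_response_xml)
    metadata = root.find(f"{OAI_NS}GetRecord/{OAI_NS}record/{OAI_NS}metadata")
    if metadata is None:
        return []
    uris = []
    for element in metadata.iter():
        uri = element.get("URI")
        if uri and uri not in uris:
            uris.append(uri)
    return uris


# Hypothetical, heavily simplified GetRecord response (DDI namespace omitted).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <codeBook xmlns="">
      <fileDscr URI="http://example.org/data/study1.tab"/>
      <otherMat URI="http://example.org/docs/codebook.pdf"/>
    </codeBook>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    print(get_record_url("http://example.org/oai", "oai:example.org:study1"))
    print(referenced_uris(SAMPLE_RESPONSE))
```

A real harvester would then download each URI and write the files, plus
a manifest, into the package; that part is left out here.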
2) DSpace DDI Packager:
The DDI Packager for DSpace is an implementation of a /DSpace Ingest
Packager/ that is capable of processing a Package that includes a DDI
file to produce a /DSpace Item/, with mappings to Dublin Core and
ingestion of included data files as /Item Bitstreams/. It includes the
original DDI document as a primary file associated with the /DSpace
Item/, and the DDI’s URIs are converted to point to the locally archived
copies for all resources. This allows for the DDI to be used as a
Manifest, Presentation and Metadata source for DSpace and still
appropriately point at the content files locally within the DSpace archive.
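The URI conversion described above can be sketched roughly as follows.
This is only an illustration in Python, not the packager's code (which
plugs into DSpace itself). The function name, the sample DDI fragment,
and the local bitstream paths are hypothetical, and it assumes remote
resources are referenced through attributes named URI.

```python
import xml.etree.ElementTree as ET


def localize_ddi_uris(ddi_xml, local_urls):
    """Rewrite URI attributes so they point at locally archived copies.

    local_urls maps each remote URI to the location of its archived copy.
    URIs with no entry in the mapping are left untouched.
    """
    root = ET.fromstring(ddi_xml)
    for element in root.iter():
        remote = element.get("URI")
        if remote in local_urls:
            element.set("URI", local_urls[remote])
    return ET.tostring(root, encoding="unicode")


# Hypothetical DDI fragment; a real study description is far richer.
SAMPLE_DDI = """<codeBook>
  <fileDscr URI="http://example.org/data/study1.tab"/>
  <otherMat URI="http://example.org/docs/codebook.pdf"/>
  <otherMat URI="http://example.org/not-archived.html"/>
</codeBook>"""

# Hypothetical local locations for the two files that were ingested.
LOCAL_URLS = {
    "http://example.org/data/study1.tab": "/bitstream/123456789/1/study1.tab",
    "http://example.org/docs/codebook.pdf": "/bitstream/123456789/1/codebook.pdf",
}

if __name__ == "__main__":
    print(localize_ddi_uris(SAMPLE_DDI, LOCAL_URLS))
```

Leaving unmapped URIs alone is a deliberate choice in this sketch: a
reference to something that was not archived still resolves to its
original location instead of a broken local link.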
These two sub-projects are currently in alpha release. Initial versions
of both packages have been written and have successfully ingested a test
DDI study, so much of the initial architectural work is complete. The
project is now entering a testing and adjustment period to work out any
major issues with how the ingest process behaves when the packaging or
submission stages to DSpace fail.
Finally, we've finished a significant amount of the mapping between the
METS/PREMIS/MODS schemas used in DSpace and DDI. For the VDC harvesting
in particular there is remaining work to fine-tune the mappings across
the VDC-centric DDI studies, the VDC system (or technical) metadata
available via REST calls against VDC resources, and the DSpace SIP Manifest.
More details about this work should be available here
http://pledge.mit.edu/index.php/Main_Page#Metadata_Crosswalks_for_VDC_Integration
if anyone's interested. Most of this work is being done by Mark Diggory,
so he is the person to ask for specifics.
Cheers,
MacKenzie
--
MacKenzie Smith
Associate Director for Technology
MIT Libraries