[Dspace-general] Archiving research data sets

Mark Diggory mdiggory at MIT.EDU
Tue Jul 17 11:46:44 EDT 2007


Hello,

At MIT, we've been working on an extension of the PLEDGE project
focused on archiving social science research data into DSpace from
the Harvard-MIT Data Center VDC/Dataverse service.  The most
significant aspect of this work is a mapping from various sections of
the DDI standard into METS+PREMIS+MODS, together with a toolset for
DSpace capable of importing all relevant study materials (including
DDI metadata instances, data files and other materials) into DSpace.
This toolset is based on an "Agent Service" I have authored, which
activates the ingestion process, and an "Ingestion Packager" within
DSpace that is sensitive to the DDI metadata available in the SIPs
(Submission Information Packages) coming into DSpace from such an
"Agent Service".

Our ultimate goal is to make the service available on top of DSpace
and have it be capable of being activated by Collection Managers or
Submitters, so that they can archive specific MIT-appropriate
studies from Dataverse into DSpace at MIT.

Ideally, we hope that this work will help give rise to the practice
of "Content Type" centric SIP packages for the various types of Items
we maintain in DSpace and other IRs (Datasets, Articles, Books,
Courses, Theses, Multimedia, etc.) and introduce some best practices
for exchanging such "Content Packages" between institutional
repositories.

The Ingest and Dissemination Packager plugin framework is an
excellent area in which to explore inter-institutional efforts to
produce such Content Type centric tools at our various institutions.
It is a configurable and extensible system for providing such services
within DSpace.  In the DSpace Developers group, we hope to foster the
development of add-ons to DSpace that come from the community of
institutions that use it as a service.

I have a strong interest in seeing how this subject plays out and  
would enjoy discussing the topic further with everyone.

Cheers,
Mark Diggory


On Jul 17, 2007, at 4:09 AM, tandrew at staffmail.ed.ac.uk wrote:

> Hi,
>
> Regarding deposit of research data sets in institutional repositories,
> this is the topic of a new UK project funded under the JISC
> Repositories and Preservation Programme, called DataShare, in which
> social science data librarians/managers and repository managers will
> work together to develop policies, good practice, and exemplars using
> DSpace, EPrints and Fedora software at four universities: Edinburgh,
> Southampton, Oxford, and London School of Economics.
>
> It's early days for us, but at the University of Edinburgh data
> librarians are working with digital library staff running the
> Edinburgh Research Archive to establish a related DSpace repository
> for research datasets to enhance the existing IR service - and to
> connect 'orphaned' (non-archived) datasets with papers on which they
> are based in the ERA.
>
> Project staff intend to target early adopter depositors in each of our
> institutions in the area of social science (quantitative) datasets.
> One of our workpackages involves experimenting with domain-specific
> XML metadata (DDI, the Data Documentation Initiative) and probably
> bundling the metadata record with the dataset in the same way as
> described below for environmental datasets.
>
> We will be studying social science data archiving guidelines for
> deposit to major social science data archives, e.g. ICPSR,
> http://www.icpsr.umich.edu/access/deposit/guidelines.html
> and the UK Data Archive,
> http://www.esds.ac.uk/aandp/create/depintro.asp
> but adapting procedures and forms for each of our institutions and
> with the lighter touch needed for self-archiving.
>
> The project manager would very much like to be in contact with other
> repositories pursuing deposit of research datasets. A brief project
> description page is available here, with a fuller website designed to
> monitor new developments in this arena coming soon.
> http://edina.ac.uk/projects/datashare_summary.html
>
> Best wishes
>
> Theo
>
>
>
> -----Original Message-----
> From: dspace-general-bounces at mit.edu
> [mailto:dspace-general-bounces at mit.edu] On Behalf Of Gail Steinhart
> Sent: 12 July 2007 18:02
> To: dspace-general at mit.edu
> Subject: [Dspace-general] Archiving research data sets
>
> Dear Ina,
>
> We're starting to deposit scientific data sets (just ecological data
> so far) in our institutional repository. It would be interesting to
> hear what other people are doing.
>
> We provide researchers with some recommendations on formatting data -
> derived from various sources, including (among others):
> - Best Practices for Preparing Ecological Data Sets to Share and
>   Archive (ORNL-DAAC): http://www.daac.ornl.gov/PI/bestprac.html
> - Ecological data: design, management, and processing. 2000.
>   Authors: Michener, William K.; Brunt, James W.
>
> These include fairly common sense recommendations and apply mostly to
> tabular data. We also recommend that researchers save their data to a
> format that is more stable for preservation - tab- or comma-delimited
> text, for example, rather than an Excel spreadsheet. We haven't
> considered recommending XML as a format - I'd be curious if anyone
> else has given that any thought.
>
> We ALSO encourage researchers to deposit data in a domain repository,
> if one exists. In that case, there may also exist a
> discipline-specific metadata standard that more fully describes the
> data than DSpace metadata can. In our case we recommend ecologists use
> EML (Ecological Metadata Language) and deposit the EML record along
> with the dataset in our DSpace installation - as well as depositing
> data and metadata in the KNB (Knowledge Network for Biocomplexity).
> See for example http://hdl.handle.net/1813/7763 for a DSpace
> submission that has an EML record with it.
>
> That's our approach so far - I'd be very interested in hearing what
> other people are doing, or reactions to this.
>
> Best regards,
> Gail
>
>
>
> 	Date: Thu, 12 Jul 2007 12:45:33 +0200
> 	From: "Ina Smith" <Ina.Smith at up.ac.za>
> 	Subject: [Dspace-general] Archiving research data sets
> 	To: dspace-general at mit.edu
> 	Message-ID: <4696226D.1050.00C0.0 at up.ac.za>
> 	Content-Type: text/plain; charset="iso-8859-15"
> 	
> 	Good day
> 	
> 	We would like to start submitting our original research data sets
> 	to our DSpace institutional research repository (i.e. for digital
> 	curation purposes). Have any of you out there tried it, or can you
> 	perhaps recommend best practices in this regard? Examples of how you
> 	have implemented it will also be very helpful.
> 	
> 	It will be much appreciated! Many thanks in advance.
> 	
> 	Kind regards,
> 	Ina
> 	
> 	
> 	Ina Smith
> 	Digital Research Repository (UPSpace) Manager & eApplication Specialist
> 	Academic Information Service
> 	University of Pretoria
> 	Pretoria
> 	0002
> 	South Africa
> 	
> 	Tel.: +27 12 420 3082
> 	Fax: +27 12 362 5100
> 	E-mail: ina.smith at up.ac.za
> 	http://www.ais.up.ac.za
> 	
>
>
>
> Gail Steinhart
> Research Data & Environmental Sciences Librarian
> Albert R. Mann Library
> Cornell University
> Ithaca, NY 14853
>
> Phone: 607-255-7251
> Fax: 607-255-0318
> E-mail: GSS1 at cornell.edu
>
>

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Office: E25-131
Phone: (617) 253-1096




