[Dspace-general] data sets - metadata

JQ Johnson jqj at darkwing.uoregon.edu
Tue Oct 14 12:15:28 EDT 2003


I'm very interested in this question also.  Note that the format of the data
in file-type or DSpace format registry terms (Excel file, text file, Access
.mdb file, etc.) may be much less relevant than the internal format of the
data (for instance, a text file might be a comma-separated spreadsheet
dataset, an SPSS dataset, the raw SQL commands needed to reconstitute a SQL
database, or ...).
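To illustrate why a registry label like "text file" says little, here is a
minimal sketch of a heuristic that guesses what a plain-text bitstream
actually holds.  It is not a DSpace feature; the category names and the SQL
pattern are hypothetical, and real data would need far more careful checks.

```python
import csv
import re

# Hypothetical pattern: treat text containing CREATE TABLE / INSERT INTO
# statements as a SQL dump.
SQL_PATTERN = re.compile(r"^\s*(CREATE\s+TABLE|INSERT\s+INTO)\b",
                         re.IGNORECASE | re.MULTILINE)

def guess_internal_format(text: str) -> str:
    """Guess the internal format of a plain-text bitstream (illustrative)."""
    if SQL_PATTERN.search(text):
        return "sql-dump"
    try:
        # csv.Sniffer raises csv.Error when it cannot find a consistent delimiter.
        csv.Sniffer().sniff(text[:2048], delimiters=",\t;")
        return "delimited-table"
    except csv.Error:
        return "unknown-text"

print(guess_internal_format("id,age,score\n1,34,2.5\n2,41,3.0\n"))
print(guess_internal_format("CREATE TABLE survey (id INT);\n"))
```

All three inputs would be registered simply as "text", yet each needs
different descriptive metadata.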

Perhaps more important than the raw observational data is the internal
descriptive data that may or may not accompany the data itself -- the
codebook, if you will.  Note that in some data formats this is a separate
file, while in others it may be integrated into the data format.  This is
truly metadata, but it's sure as anything not qDC!

Perhaps equally important is the description of the data collection and data
cleaning techniques -- the sort of information that is typically included in
the methods section of a paper based on the raw data.  If we really care
about having data sets that are useful in the future, then it's important
that the data set include links to such a methods section.  Such information
is highly discipline-specific; the appropriate metadata for a gene sequence
is rather different from that for a statistical survey of domestic abuse
victims.

Many data sets are made available as supplements to published research; for
example, many journals such as _Science_ now provide web-based repositories
for appendices to articles they publish.  At a bare minimum, the DC-style
metadata for a data set that corresponds to a published paper should include
a citation for the published paper.

Let's start with the qDC.  The coverage.* fields are natural ones to fill
in, as is unqualified format.  I'd say that relation.isbasedon and other
relation.* fields are also critical to include.  I'd also say that a policy
requiring a human-readable codebook as one of the bitstreams associated with
a dataset (stored as one or more bitstreams in the same item) would be
extremely important.  But then what should the format fields refer to?  They
are item-level, so how do we relate the format.* values to the particular
bitstreams they apply to?  [This seems to be a major weakness in the current
DSpace architecture.]
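The mismatch between item-level format values and individual bitstreams can
be sketched with plain data structures.  This is a hypothetical illustration,
not the DSpace API: the field values, file names, and the per-bitstream
"role" mapping are all invented for the example.

```python
# Hypothetical qDC record for a dataset item, keyed by "element.qualifier".
item_metadata = {
    "title": ["Survey responses, 2002"],
    "coverage.spatial": ["Oregon"],
    "coverage.temporal": ["2002"],
    "format": ["text/csv", "text/plain"],  # item-level: which file is which?
    "relation.isbasedon": ["Citation for the published paper (placeholder)"],
}

# Hypothetical per-bitstream descriptions kept alongside the item, so each
# format value is tied to the bitstream it actually describes.
bitstreams = {
    "responses.csv": {"format": "text/csv", "role": "data"},
    "codebook.txt": {"format": "text/plain", "role": "codebook"},
}

def has_codebook(streams):
    """Policy check: the item must include a human-readable codebook."""
    return any(b["role"] == "codebook" for b in streams.values())

def formats_are_consistent(item, streams):
    """Every per-bitstream format should also appear at item level."""
    return {b["format"] for b in streams.values()} <= set(item.get("format", []))

print(has_codebook(bitstreams), formats_are_consistent(item_metadata, bitstreams))
# prints: True True
```

The item-level list alone cannot say which value belongs to which file; only
the extra mapping, which has no home in item-level qDC, carries that link.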

Our observation as we've begun to explore statistical data for our
institutional repository is that most researchers are very careless about
documenting their data in ways that would make it useful to other
researchers in the future.  We believe that the repository itself addresses
only a tiny fraction of the real issue, and that the important thing is to
provide advice and formal structures that make it easy for researchers to
document their data collection process.

I suspect that trying to answer the question as posed, in terms of "data
sets" in general, is going to fail, and that we should pose it instead in
terms of some particular kind of data set, such as statistical sample
observations.  "Data set" is so broad that it even includes a set of
bibliography entries (in BibTeX, EndNote, refer, MARC, or whatever format).
So the first thing for us to do is to focus on a particular kind of data.

JQ Johnson                      Office: 115F Knight Library
Academic Education Coordinator  mailto:jqj at darkwing.uoregon.edu
1299 University of Oregon       phone: 1-541-346-1746; -3485 fax
Eugene, OR  97403-1299          http://darkwing.uoregon.edu/~jqj/

-----Original Message-----
From: dspace-general-bounces at mit.edu
[mailto:dspace-general-bounces at mit.edu]On Behalf Of Gabriela Mircea
Sent: Tuesday, October 14, 2003 8:16 AM
To: dspace-general at mit.edu
Subject: [Dspace-general] data sets - metadata


Hi all,

We have some data sets that we would like to put into DSpace, but I am
not sure how we should handle the metadata.
Does anyone have data sets in DSpace, and are you willing to share the
way that metadata was organized?
We should probably add some more fields. The problem is not how to add
the fields (technical), but which descriptors we should use for data
sets. Are there any standards?

Thank you in advance,

Gabriela

_______________________________________________
Dspace-general mailing list
Dspace-general at mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general


