[Dspace-general] data sets

Tom De Mulder tdm27 at cam.ac.uk
Tue Oct 4 08:03:07 EDT 2005


On Mon, 3 Oct 2005, MacKenzie Smith wrote:

>I also recommend looking at what Cambridge University is doing with small
>molecule data in DSpace at Cambridge...
>https://www.dspace.cam.ac.uk/handle/1810/724. If the data is encoded in
>xml and can be divided into individual small items of data then you can
>do a bit more, and the information is included in search engines like
>Google.

We've actually also got a small set of archaeological data (on-site
measurements of an excavation). The decision there (made by the researcher
himself) was to store these as a simple tab-separated value (text) file
which would lose the functionality of the original but be widely readable
and indexable.

In a similar vein, we have a large selection of horse paleopathology
images, which are in fact part of a larger dataset describing the original
archaeological finds of the horse bones (measurements, location, etc). In
that case, the original data was in a database, and we extracted it
(again) as TSV and asked the researcher for a description of all the
fields.

This description was stored alongside the data, and while the combination
doesn't have the immediate functionality of the original database, it is
possible to reconstruct it completely, with the added advantage that it
can be done by anyone who can open plain text files, rather than
requiring specialist database software.
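
As a rough illustration of the export described above, here is a minimal
sketch in Python. It assumes the source database is SQLite, which the
original may well not have been, and the table name, column names and
description text in the usage note are hypothetical. It writes one table
to a TSV file, writes the researcher's field descriptions to a second
plain-text file, and reads the TSV back with nothing but the standard
library.

```python
import csv
import sqlite3


def export_table(conn, table, tsv_path, fields_path, descriptions):
    """Write one table to a TSV file plus a plain-text field description."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cur.description]

    # Data file: header row with the column names, then one line per record.
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(cur)

    # Description file: one line per field, "name<TAB>description".
    with open(fields_path, "w", encoding="utf-8") as f:
        for name in columns:
            f.write(f"{name}\t{descriptions.get(name, '')}\n")
    return columns


def load_tsv(tsv_path):
    """Read the TSV back as a list of dicts. All values come back as
    strings; the field description file is what tells a reader how to
    interpret them."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

With a hypothetical table called "bones", a call might look like
export_table(conn, "bones", "bones.tsv", "bones_fields.txt",
{"length_mm": "greatest length in millimetres"}). Column types are not
preserved in the TSV, so the description file is the place to record
units and types.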

Our approach now is to store data in as many formats as we possibly can,
migrating it just before ingest. In effect, this tends to mean that we
make a TSV copy of the data and store that alongside the original where
possible. We also apply this to more common data formats where we can,
storing different formats alongside one another in the hope that at least
one version will be readable/usable by the eventual consumer.
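
A pre-ingest step of this kind could be sketched as below. This is only
an illustration, not our actual tooling: it assumes the original is a
comma-separated file, whereas real originals are often databases or
spreadsheets that need their own export step, and the function name is
made up for the example.

```python
import csv
from pathlib import Path


def write_tsv_copy(original: Path) -> Path:
    """Write a tab-separated copy next to a comma-separated original.

    The original file is left untouched; the copy gets the same name
    with a .tsv extension so both can be ingested together.
    """
    copy = original.with_suffix(".tsv")
    with open(original, newline="", encoding="utf-8") as src, \
         open(copy, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(csv.reader(src))
    return copy
```

Both files would then be added to the same item, so a later reader can
use whichever version still opens.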


Kind regards,

-- 
Tom De Mulder <tdm27 at cam.ac.uk> - Cambridge University Computing Service
                   New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 04/10/2005 : The Moon is Waxing Crescent (3% of Full)

