[Dspace-general] RE: slide collection
Scott Yeadon
scott.yeadon at anu.edu.au
Sun May 16 20:34:48 EDT 2004
Hi Jason,
You are correct in all you say; however, if you can control the metadata
creation process, it actually doesn't take a lot of work to put a quick
import process together. We intend to create a nice batch upload
facility via a Web interface, but in the meantime have done the minimum
necessary to allow batch uploads, as follows:
Have metadata entry done in an Access database or Excel spreadsheet (or
basically anything from which you can export a tab-delimited file). Set up a
Perl/Java/whatever script that maps each tab-delimited field to one or more
DC elements. The script creates the directory structure and XML files necessary
to then be able to run the DSpace upload in accordance with the DSpace doco.
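To make the shape of that script concrete, here is a minimal sketch in Python (the post says "Perl/Java/whatever", so the language choice is mine). It targets the DSpace Simple Archive Format, where each item directory holds a dublin_core.xml and a contents file listing the bitstreams. The column names, DC mapping, and file layout below are hypothetical examples, not details from the original post:

```python
#!/usr/bin/env python
# Sketch: turn a tab-delimited metadata export into the per-item directory
# structure that the DSpace batch importer reads. Column names and the
# column-to-Dublin-Core mapping are assumptions for illustration.
import csv
import os
from xml.sax.saxutils import escape

# Hypothetical mapping: TSV column name -> (DC element, qualifier)
DC_MAP = {
    "title":       ("title", None),
    "creator":     ("contributor", "author"),
    "description": ("description", None),
}

def build_archive(tsv_path, out_dir):
    """Create one item_NNNN directory per TSV row, each holding the
    dublin_core.xml and contents file the DSpace importer expects."""
    os.makedirs(out_dir, exist_ok=True)
    with open(tsv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f, delimiter="\t")):
            item_dir = os.path.join(out_dir, "item_%04d" % i)
            os.makedirs(item_dir, exist_ok=True)
            lines = ["<dublin_core>"]
            for col, (element, qualifier) in DC_MAP.items():
                if row.get(col):
                    q = ' qualifier="%s"' % qualifier if qualifier else ""
                    lines.append('  <dcvalue element="%s"%s>%s</dcvalue>'
                                 % (element, q, escape(row[col])))
            lines.append("</dublin_core>")
            with open(os.path.join(item_dir, "dublin_core.xml"), "w") as xml:
                xml.write("\n".join(lines) + "\n")
            # 'contents' lists the bitstream files to attach to the item;
            # here we assume a "filename" column names the image file.
            with open(os.path.join(item_dir, "contents"), "w") as c:
                c.write(row.get("filename", "") + "\n")
```

The point is that all the knowledge lives in one small mapping table, so a non-programmer can maintain the spreadsheet while the script stays untouched.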
We've found this is a quick little solution that has so far been adequate for
simple legacy collections. Data quality is the main issue that crops up, but
if you're starting from scratch you may be able to control that at the source.
I realise that this doesn't help if you have no developers, but the advantage
here is that you don't need a highly skilled programmer to set this up. I can
provide example scripts if that is helpful.
As an example, we loaded around 30,000 images this way over a couple of weeks
of part-time uploading, and that time included resolving some data quality
issues and other oddities that were thrown up (as there was no metadata, we
had to derive it from reasonably consistent data directory structures).
If you have a look at the image collections on www.dspace.anu.edu.au, all of
these were uploaded via the same process as above, and it's what we've used for
test collections and some other exercises we've been doing.
Scott.
Hello Everyone,
From those of you who are using DSpace in any decent capacity, I would
like to know how you are actually tackling the process of entering
items into the repository. For instance, we are in the process of
creating a digital collection of slides for a campus department. The
process of entering the images into DSpace is laborious (not to mention
the workflow involved with simply digitizing and organizing the
physical slides in the first place), and I cannot think of any
time-saving methods.
Everyone knows that the batch import tools have some usability issues
and could be improved. In any event, because this is not a
legacy digital collection, none of the images have metadata associated
with them, so the XML files would have to be created manually
along with the directory structure for the batch import, which
to my mind seems more time-consuming than simply entering
them individually through the DSpace web interface. On this note, how
are people creating compliant XML files for use with the batch
importer, if indeed anyone is doing so? By hand? Specialized
Perl/shell tools? Without some advanced knowledge of XML, programming,
UNIX commands, and related technologies, entering items by this route
is largely impossible, meaning that a highly competent "technology"
person probably must be in charge of entering the data, or at least of
tool creation. Even if a useful script is built that abstracts the
data entering process so that anyone can do it, the end result is a
Perl or similar script that basically mirrors the functionality of the
web interface anyway.
Of course, entering everything by hand through the web interface is an
exceptionally lengthy process, requiring several screens of clicking
and data entry. Even for a fast worker, one slide per
minute or so is probably a good pace, and our collection is somewhere around
8,000 images. Without a full-time worker dedicated to only this one
job, the process quickly becomes almost insurmountable in any
reasonable timeframe.
So, how are other institutions managing this troublesome process?
--
Jason Simms
Computer Programming and Design
University of Tennessee, Knoxville
865.974.8508