[Dspace-general] RE: slide collection

Sun May 16 20:34:48 EDT 2004

Hi Jason,

You are correct in all you say. However, if you can control the metadata
creation process, it actually doesn't take much work to put a quick
import process together. We intend to create a proper batch upload
facility with a web interface, but in the meantime we have done the minimum
necessary to allow batch uploads, as follows:

Have metadata entry done in an Access database or Excel spreadsheet (or
basically anything from which you can export a tab-delimited file). Set up a
Perl/Java/whatever script that maps each tab-delimited field to one or more
Dublin Core (DC) elements. The script creates the directory structure and XML
files needed to run the DSpace batch import as described in the DSpace
documentation. We've found this quick little solution has so far been adequate
for simple legacy collections. Data quality is the main issue that crops up, but
if you're starting from scratch, you may be able to control that at the source.
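A minimal sketch of such a mapping script, written in Python rather than Perl or Java. It assumes the DSpace Simple Archive Format: one directory per item holding a `dublin_core.xml` file, a `contents` file listing the bitstreams, and the files themselves. The column names in `FIELD_MAP`, the `Filename` column, and the `item_NNNNN` directory naming are all hypothetical. Adjust them to your own export, and check the DC element/qualifier names against the documentation for your DSpace version.

```python
import csv
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping from spreadsheet column headers to Dublin Core
# (element, qualifier) pairs. One column may feed several DC fields.
FIELD_MAP = {
    "Title": [("title", "none")],
    "Photographer": [("contributor", "author")],
    "Date": [("date", "created")],
    "Subject": [("subject", "none")],
    "Description": [("description", "none")],
}


def write_dublin_core(row, item_dir):
    """Write dublin_core.xml for one item from one spreadsheet row."""
    root = ET.Element("dublin_core")
    for column, targets in FIELD_MAP.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip empty cells rather than emit empty elements
        for element, qualifier in targets:
            dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
            dc.text = value
    ET.ElementTree(root).write(
        item_dir / "dublin_core.xml", encoding="utf-8", xml_declaration=True
    )


def build_archive(tsv_path, image_dir, out_dir):
    """Create one numbered item directory per row of the exported TSV file.

    Returns the number of items written.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for n, row in enumerate(csv.DictReader(f, delimiter="\t")):
            item_dir = out_dir / f"item_{n:05d}"
            item_dir.mkdir()  # fails loudly rather than overwrite an earlier run
            write_dublin_core(row, item_dir)
            # Copy the image into the item directory and list it in 'contents'
            src = image_dir / row["Filename"].strip()
            shutil.copy(src, item_dir / src.name)
            (item_dir / "contents").write_text(src.name + "\n", encoding="utf-8")
            count += 1
    return count
```

A call such as `build_archive("slides.tsv", "images", "import_dir")` would produce the item directories; you would then point the DSpace item importer at `import_dir` as described in the DSpace documentation. Skipping empty cells keeps the generated XML clean, which matters when spreadsheet data quality is uneven.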

I realise that this doesn't help if you have no developers, but the advantage 
here is that you don't need a highly skilled programmer to set this up. I can 
provide example scripts if that is helpful.

As an example, we loaded around 30,000 images this way over a couple of weeks
of part-time uploading, and that time included resolving some data quality
issues and other oddities that turned up (as there was no metadata, we
had to derive it from reasonably consistent directory structures).
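Deriving metadata from folder names can be sketched in a few lines of Python. This assumes a purely hypothetical layout of `<root>/<subject>/<year>/<image file>`; the field names, accepted file extensions, and the year check are illustrative only. Paths that don't fit the pattern are reported and skipped, since those are where the data quality problems tend to surface.

```python
import csv
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}
FIELDS = ["Filename", "Title", "Subject", "Date"]


def rows_from_tree(root):
    """Yield one metadata row per image file, assuming the hypothetical
    layout <root>/<subject>/<year>/<image file>."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        rel = path.relative_to(root)
        if len(rel.parts) != 3:
            print(f"skipping (unexpected depth): {rel}", file=sys.stderr)
            continue
        subject, year, filename = rel.parts
        if not (len(year) == 4 and year.isdigit()):
            print(f"skipping (bad year folder): {rel}", file=sys.stderr)
            continue
        yield {
            "Filename": rel.as_posix(),
            # File names like old_library.jpg become the title "old library"
            "Title": Path(filename).stem.replace("_", " "),
            "Subject": subject,
            "Date": year,
        }


def write_tsv(rows, out_path):
    """Write the rows as a tab-delimited file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_tsv(rows_from_tree("slides"), "slides.tsv")` turns a consistent folder tree into a tab-delimited file that can be checked and corrected in a spreadsheet before import.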

If you have a look at the image collections on www.dspace.anu.edu.au, all of
them were uploaded via the process above, and it's what we've used for
test collections and some other exercises we've been doing.

Scott.

Hello Everyone,

I would like to hear from those of you who are using DSpace in any serious
capacity about how you are actually tackling the process of entering
items into the repository. For instance, we are in the process of
creating a digital collection of slides for a campus department. The
process of entering the images into DSpace is laborious (not to mention
the workflow involved in simply digitizing and organizing the
physical slides in the first place), and I cannot think of any
time-saving methods.

Everyone knows that the batch import tools have some usability issues
and could be improved. In any event, because this is not a
legacy digital collection, none of the images have metadata associated
with them, so the XML files would have to be created by hand along with
the directory structure for the batch import. To my mind, that seems more
time-consuming than simply entering the items one at a time through the
DSpace web interface. On this note, how are people creating compliant XML
files for use with the batch importer, if indeed anyone is doing so? By hand?
Specialized Perl/shell tools? Without some advanced knowledge of XML,
programming, UNIX commands, and related technologies, entering items by this
route is largely impossible, meaning that a highly competent "technology"
person probably must be in charge of entering the data, or at least of
creating the tools. Even if someone builds a useful script that abstracts the
data entry process so that anyone can do it, the end result is a
Perl or similar script that basically mirrors the functionality of the
web interface anyway.

Of course, entering everything by hand through the web interface is an
exceptionally lengthy process, requiring several screens of clicking
and data entry. Even a fast worker would do well to manage one slide a
minute, and our collection is somewhere around 8,000 images, which is more
than 130 hours of data entry at that rate. Without a full-time worker dedicated
to this one job, the process quickly becomes almost insurmountable in any
reasonable timeframe.

So, how are other institutions managing this troublesome process?

--
Jason Simms
Computer Programming and Design
University of Tennessee, Knoxville
865.974.8508