[Dspace-general] Adding items without bitstreams

Tue Jul 10 21:10:32 EDT 2007

Hi Halley,

I'm just trawling through old DSpace list messages and saw this. So,
this may be somewhat late in the piece, but, what the heck.  

Just thought I'd let you know that we've digitised some theses (loose
leaf).  By no means the number you are looking to work with, i.e. 50.
We compiled the compressed TIFF images in Adobe and ran an OCR process
over them (the text produced by the OCR is incorporated as part of the
PDF created).  DSpace then does its work and creates a bitstream of it.
This was a manual process.  It would be good (if possible) if someone
wrote you some scripts so that this processing could be done
computationally.
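
As a rough idea of what such a script might look like, here is a minimal
sketch in Python.  Note that it does not reproduce our Adobe workflow: it
assumes the open-source Tesseract command-line tool is installed, and it
assumes one (possibly multi-page) TIFF file per thesis.  The directory
layout and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def ocr_command(tiff_path: Path, out_dir: Path) -> list[str]:
    """Build the Tesseract command that turns one TIFF into a searchable PDF."""
    # Tesseract takes an output base name and appends ".pdf" itself
    # when the trailing "pdf" config is given.
    output_base = out_dir / tiff_path.stem
    return ["tesseract", str(tiff_path), str(output_base), "pdf"]


def ocr_directory(scan_dir: Path, out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Run OCR over every .tif/.tiff file in scan_dir and return the commands used.

    With dry_run=True the commands are only collected, not executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for tiff_path in sorted(scan_dir.iterdir()):
        if tiff_path.suffix.lower() not in {".tif", ".tiff"}:
            continue  # skip anything that is not a scanned image
        cmd = ocr_command(tiff_path, out_dir)
        commands.append(cmd)
        if not dry_run:
            # check=True stops the batch on the first file Tesseract rejects
            subprocess.run(cmd, check=True)
    return commands
```

Calling ocr_directory(Path("scans"), Path("pdfs"), dry_run=True) first
lets you inspect the commands before anything is actually run.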

One thing though: if you do go the PDF route, the PDF file sizes can
come out fairly large (depending on the resolution, etc. of the image
file).  Unless you compress those images further before you convert them
to PDF, files of around 20 to 25 MB won't open in a web browser.  It's a
bug in browsers with large PDFs.
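
If it helps, a small check like the following could flag the PDFs that
need recompressing before they are loaded.  It is only a sketch: the 20 MB
default is taken from the lower end of the range above, and the function
name and directory are hypothetical.

```python
from pathlib import Path

# Assumption: the lower end of the 20 to 25 MB range mentioned above.
SIZE_LIMIT_BYTES = 20 * 1024 * 1024


def oversized_pdfs(pdf_dir: Path, limit: int = SIZE_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for every PDF larger than limit."""
    results = []
    for path in sorted(pdf_dir.glob("*.pdf")):
        size = path.stat().st_size
        if size > limit:
            results.append((path.name, size))
    return results
```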

Do let us know what you decide to do.  

Cheers, Ingrid 

-----Original Message-----
From: dspace-general-bounces at mit.edu
[mailto:dspace-general-bounces at mit.edu] On Behalf Of Halley Pacheco de
Oliveira
Sent: Wednesday, 25 April 2007 3:04 a.m.
To: Tiago Ferreira
Cc: dspace-general at mit.edu
Subject: Re: [Dspace-general] Adding items without bitstreams

Hi Tiago,

All projects, with information about their title, authors, sanction
process and many other details, are stored in a SQL Server
database. This information will be used as metadata in a
dublin_core.xml file to import these projects into DSpace. The problem
is that the old projects exist only on paper, so they must be scanned
and an image uploaded to DSpace. Since we need full text search for the
projects, maybe we will use OCR software. It would be good if the
OCR text could be integrated with the image (maybe using PDF) in the
same file, and indexed by DSpace.
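
To make the metadata step concrete, here is a minimal Python sketch that
writes one item directory for the batch importer: a dublin_core.xml file
built from <dcvalue> elements, plus a "contents" file listing the
bitstream file names. The record values, field choices and the write_item
name are hypothetical, and whether the importer accepts an item whose
contents file is empty should be checked against the DSpace version in use.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(item_dir: Path,
               metadata: list[tuple[str, str, str]],
               bitstreams: list[str]) -> None:
    """Write dublin_core.xml and a contents file for one item.

    metadata is a list of (element, qualifier, value) triples;
    bitstreams is a list of file names expected inside item_dir.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = value  # ElementTree escapes &, < and > on output

    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # One bitstream file name per line; empty if there is nothing to attach yet.
    lines = "".join(name + "\n" for name in bitstreams)
    (item_dir / "contents").write_text(lines, encoding="utf-8")


# Hypothetical record, standing in for one row read from the database.
example_metadata = [
    ("title", "none", "Example project title"),
    ("contributor", "author", "Silva, Maria"),
    ("date", "issued", "1998"),
]
```

For example, write_item(Path("import/item_000"), example_metadata,
["project.pdf"]) would prepare one item whose scanned PDF is then copied
into the same directory.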

Thanks,
Halley

2007/4/24, Tiago Ferreira <ferreira.tiago at gmail.com>:
> Hello Halley,
>
> One idea is to use the Batch importer, which will transform an XML
> metadata document with some content files into an item, as if it was
> an "in progress submission". That will save you the trouble of
> submitting one item at a time.
>
> Hope this helps
> Tiago Ferreira
_______________________________________________
Dspace-general mailing list
Dspace-general at mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general