[Dspace-general] converting non-searchable PDF's to searchable PDF's

Dave Kolka DKolka at netLibrary.com
Wed Aug 18 14:13:07 EDT 2004


William,
Acrobat Capture, as it is found in Acrobat 6.0, can OCR image-based PDFs in
a couple of different ways. One is to give you what you described as a 'true'
PDF, otherwise known as PDF Normal. Two is to maintain the raster PDF and
add a layer of searchable text within the PDF document. The problem with
the first option is that you may have to review the 'true' PDF's text for
OCR accuracy, because the recognized text replaces the scanned page image.
This generally adds a level of QA to the production process that is both
time and man-hour intensive. Most OCR software will give you both of these
options.
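
As a rough way to tell which of these cases a given file falls into before
deciding whether it needs OCR, here is a minimal Python sketch. It is not
part of Acrobat Capture or DSpace. The function names and the labels it
returns are hypothetical, and it only scans the raw bytes of the file for
font dictionaries and image XObjects. It assumes those objects are not tucked
away inside compressed object streams and that the scans are not stored as
inline images, so treat the result as a hint, not a verdict.

```python
import re

# Heuristic markers in the raw PDF bytes. Assumption: the relevant
# dictionaries are stored uncompressed, so a plain byte scan can see them.
FONT_RE = re.compile(rb"/Type\s*/Font\b")        # font dictionary
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")   # image XObject


def classify_pdf_bytes(data: bytes) -> str:
    """Guess what kind of PDF this is (labels are illustrative only)."""
    has_font = FONT_RE.search(data) is not None
    has_image = IMAGE_RE.search(data) is not None
    if has_image and not has_font:
        # Scanned pages with no text at all: not full-text searchable.
        return "image-only"
    if has_image and has_font:
        # Could be a raster PDF with an added text layer, or simply a
        # text PDF that contains pictures; this sketch cannot tell.
        return "image-plus-text"
    if has_font:
        # Text with no page images, as in a 'true' PDF.
        return "text-only"
    return "unknown"


def classify_pdf_file(path: str) -> str:
    """Read a file from disk and classify it with the heuristic above."""
    with open(path, "rb") as handle:
        return classify_pdf_bytes(handle.read())
```

Files that come back as "image-only" are the ones that would need one of
the two OCR options above before they can be searched.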

Dave
netLibrary

-----Original Message-----
From: William Simpson [mailto:wsimpson at UDel.Edu] 
Sent: Wednesday, August 18, 2004 11:53 AM
To: dspace-general at mit.edu
Subject: [Dspace-general] converting non-searchable PDF's to searchable
PDF's 

We're uploading some PDFs into DSpace here at the University of Delaware.
The PDF files were created with older software (I'm not sure which
application was used), and I found out that the PDFs are not full-text
searchable in DSpace (i.e., because they are TIFF-formatted files). So the
question is: does anyone know whether Adobe Acrobat Capture 3.0, or any
application for that matter, can convert PDF TIFFs to "True" PDFs, or is it
easier to start from scratch and rescan the original documents?


William Simpson
University of Delaware Library


_______________________________________________
Dspace-general mailing list
Dspace-general at mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general

