[Dspace-general] Filter media and deleted items

Louw Venter Louw.Venter at nwu.ac.za
Tue Nov 3 05:40:34 EST 2009


Hello all,
 
I made a bit of a mess. 
A while back I uploaded some PDF documents to DSpace and ran Filter media to extract the text. Recently the creators of the PDF files sent me a batch with updated volume numbers, etc., to replace the ones already on the server, so I simply removed the items and added new bitstreams.
Now when I run the Filter media process again, the text doesn't get extracted. Could this be because the checksums don't match, or because the original was located in one asset store and the new one in another?
 
Thank you in advance for any help in this regard,
 
 
ERROR filtering, skipping bitstream:
 
        Item Handle: 10394/1886
        Bundle Name: ORIGINAL
        File Size: 287223
        Checksum: 6de2597a7cabd6ca3a995c355d9301f1 (MD5)
        Asset Store: 1
java.lang.NullPointerException
java.lang.NullPointerException
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
        at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:141)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:668)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:570)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:520)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:488)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:427)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:359)
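
One way to test the checksum theory is to compute the MD5 of the replacement PDF on disk and compare it with the value in the error report above. The following is only a minimal sketch: the helper names and the file path in the usage note are hypothetical, and it says nothing about how DSpace itself calculates or stores checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported(path, reported_md5):
    """Compare a local file's MD5 with a checksum copied from the error report."""
    return md5_of_file(path) == reported_md5.strip().lower()
```

For example, `matches_reported("new-volume.pdf", "6de2597a7cabd6ca3a995c355d9301f1")` (with a hypothetical local file name) returns True only if the local copy is byte-for-byte the file the report refers to. If it matches, a checksum mismatch looks less likely as the cause, and since the trace ends inside PDFBox's page handling (PDPageNode.getAllKids), the PDF itself may be worth testing with another tool.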
 
 
Louw Venter
 
 
