[Dspace-general] Week 4: Bitstream types (Dorothea Salo)

Jim Ottaviani hellpop at umich.edu
Mon Sep 8 12:28:39 EDT 2008


> This week's question has to do with bitstreams. DSpace is designed
> around discrete papers contained within single bitstreams, and it also
> handles websites reasonably well. The question is: what else do you
> have, what have you done with/to DSpace to accommodate it, and what
> else do you need from DSpace?

Audio and Video

As others have said, audio and video are challenges because of file size,
preservation standards (or lack thereof), and typical user needs. We have
deposited enough video, and seen enough interest in preserving and presenting
it, to see a few patterns repeat:

(1) Deposit almost always has to be mediated, since users either don't have
the tools to convert what they have into something they can deposit, or
don't want to wait while a very large file transfers via the web form.

(2) If we didn't have a partnership with a streaming service on campus,
people wouldn't deposit audio and video in Deep Blue. Full stop.
Preservation is a secondary concern for most -- they want to be sure others
have access, and they don't have the server space to provide it. And access
via "download the whole thing to the desktop" doesn't work for two reasons:
it's too slow and, depending on the encoding, the downloaded file might not
play anyway. We're measured against the convenience and usability of
YouTube...and yes, that's not fair, and no, pointing out that we're more
reliable than YouTube doesn't matter. (See above re. preservation as a
secondary concern.)

(3) Depositors often don't have the option -- or don't know how -- to choose
how they capture video, so you get what they produce and live with it. We've
created best practices for creating high quality text, image, and audio
files, but we're stuck on defining what a best practice would be for digital
video. If others have settled on best practices I'd love to hear how they've
decided to define preservation quality video. Not that users will deliver
preservation quality even if you tell them what it is, but it would be nice
to know what we mean by it.


Websites and XML-based complex objects

In the realm of complex objects, those wrapped in HTML and XML are easy
enough to preserve and present, though I disagree that DSpace handles them
reasonably well.

Once they're in DSpace all's well, but getting them in is tedious (or more
accurately, incredibly tedious). Two features would be fantastic to have:
(a) handling nested directory structures without having to "flatten out" a
website completely by rewriting internal links and renaming files, and (b)
being able to upload directories, nested or not.
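
To make the "flattening" chore concrete, here is a minimal sketch of what it
amounts to when done by script rather than by hand. Everything in it is an
assumption for illustration: the function names, the underscore naming
scheme, and the simple regular expression for links are hypothetical, and
none of it is a DSpace feature or API.

```python
import posixpath
import re

# Matches href="..." and src="..." whose value is a plain relative path.
# Values containing ':', '#', or '?' (absolute URLs, anchors, queries) are
# deliberately skipped. This is a simplification, not a full HTML parser.
LINK = re.compile(r'(href|src)="([^"#?:]+)"')

def flat_name(path):
    """Turn a nested relative path into a single flat file name.

    Hypothetical scheme: 'css/site.css' becomes 'css_site.css'. Two
    different paths can collide (e.g. 'a_b/c' and 'a/b_c'), so a real
    tool would need to detect that.
    """
    return posixpath.normpath(path).replace("/", "_")

def rewrite_links(page_path, html, known_paths):
    """Rewrite relative links in one page so they point at flat names."""
    base = posixpath.dirname(page_path)

    def swap(match):
        # Resolve the link relative to the directory of the page it is in.
        target = posixpath.normpath(posixpath.join(base, match.group(2)))
        if target in known_paths:
            return f'{match.group(1)}="{flat_name(target)}"'
        return match.group(0)  # leave anything unrecognized untouched

    return LINK.sub(swap, html)
```

Each file would then be deposited under its flat name with its links
rewritten by `rewrite_links`, which is the tedium that native support for
nested directories would remove.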


Complex Objects

We get relatively few requests for things like lecture objects and things
that require complex interactions between files, but when we do I'm usually
able to help their producers understand how platform/operating
system/software specific objects are inherently difficult to preserve in the
long term. (A few examples of ubiquitous but now dead programs and companies
usually suffice. Heck, just reminding people of how you can lose
functionality from one version of PowerPoint to the next will usually
do.) So I'm not too worried that DSpace doesn't handle these things
seamlessly, since most of my depositors don't expect it to, yet.


Bitstreams in general

I think DSpace's biggest weakness is its trusting nature: If the depositor
says it's a PDF, DSpace believes it. We've just begun to look into how we
might go one small step further by using JHOVE to at least identify the
format. If the depositor says it's a PDF, does that appear to be true? Never
mind validating and characterizing, at least for now.
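
As a rough illustration of identification without validation, the sketch
below compares a depositor's claimed type against the file's leading bytes.
It is not JHOVE and does not use JHOVE's API; the signature table and the
function names are assumptions made for this example. It also expects the
signature at the very start of the file, so a PDF with stray bytes before
its header would be reported as a mismatch.

```python
# Leading bytes ("magic numbers") for a few common formats. This small
# table is illustrative only; it is not JHOVE's or DSpace's format registry.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
    "image/gif": b"GIF8",
}

def sniff_format(data):
    """Return the MIME type whose signature starts the data, or None."""
    for mime, magic in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None

def claim_looks_true(path, claimed_mime):
    """Check whether the file's leading bytes agree with the claimed type."""
    with open(path, "rb") as handle:
        head = handle.read(16)  # longest signature above is 8 bytes
    return sniff_format(head) == claimed_mime
```

A call such as `claim_looks_true("thesis.pdf", "application/pdf")` (the file
name is hypothetical) answers only "does that appear to be true?" and says
nothing about whether the file is well formed.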

Sticking with PDF, the next step would be to differentiate between PDF and
PDF/A...though as mentioned above, for those of us who embrace an
unmediated, self-deposit mode, we can lead our users to best practices but
we can't make them use them. So we'd have to think hard about whether we'd
want to reject a sub-optimal (but still understandable and usable) file, or
even alert depositors to it.
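
One cheap way to tell the two apart, short of validation: a PDF/A file
declares its conformance in its XMP metadata through a `pdfaid:part`
property. The sketch below looks for that declaration in the raw bytes. It
assumes the XMP packet is stored uncompressed and therefore visible to a
byte search, and it reports only what the file claims, not whether the file
actually conforms. The function name is hypothetical.

```python
import re

# Matches both common XMP serializations of the property:
#   attribute form:  pdfaid:part="1"
#   element form:    <pdfaid:part>1</pdfaid:part>
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")

def claimed_pdfa_part(data):
    """Return the PDF/A part number the file claims, or None.

    None means either "not a PDF" or "a PDF with no visible PDF/A claim".
    A returned number is only a claim; confirming it needs a validator.
    """
    if not data.startswith(b"%PDF-"):
        return None
    match = PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```

A repository could use a result of `None` to alert the depositor rather than
reject the file, which keeps the policy question above separate from the
detection step.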


____________________________________
Jim Ottaviani
+1 734-763-4835
Coordinator, Deep Blue
http://deepblue.lib.umich.edu
University of Michigan Library

       Quis custodiet ipsos custodes
          --Juvenal, Satires VI, 347




