[Dspace-general] Dspace-general Digest, Vol 55, Issue 8

Larry Stone lcs at MIT.EDU
Wed Feb 13 00:34:15 EST 2008


> The actual storage and management of bitstreams (E.g. asset store design) is
> very important for DSpace. There is a general storage
> abstraction/virtualization in the IT industry taking place at this moment.
> As a result we don't need to worry about file or directory structures
> anymore. We assign Unique Identifiers to objects and describe their
> metadata. The question is whether this object should be a single bitstream
> or a container Archival Information Package.(Based on which standard?) I
> suspect that the AIP discussion might end up with the same problems
> described by Shane about PDF/A's. My feeling is that the most important
> thing is to not carve things in stone and be very flexible and open to
> changes. This probably means managing relations and metadata like DSpace
> does at a higher collection and item levels and make sure that the
> persistence at the more technical bitstream level is guaranteed by storage
> management systems. These systems need to be regularly checked for
> integrity!

Note that the files in the DSpace assetstore carry no metadata, either
stored with them or implied by the structure of the filesystem or the
filenames (this has made it possible to adopt various virtualized storage
architectures like the SRB).  Since all of the metadata, even
Bitstream-level metadata, is stored in the RDBMS, the assetstore would
become a random pile of files if the database were lost or got out of
sync with it.  Fortunately, that has never happened that I've heard of.
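
To make "out of sync" concrete, here is a minimal sketch of a consistency
check between the two sides.  It assumes that each file in the assetstore
is named after an identifier the database records for it, and that you
have already pulled that list of identifiers out of the RDBMS; the
function name and arguments are hypothetical, not part of DSpace.

```python
import os

def compare_store(db_ids, assetstore_root):
    """Compare identifiers known to the database with files on disk.

    Assumption: every assetstore file is named after its identifier.
    Returns (missing, orphans): ids with no file, and files with no id.
    """
    on_disk = set()
    for _dirpath, _dirnames, filenames in os.walk(assetstore_root):
        on_disk.update(filenames)
    known = set(db_ids)
    missing = sorted(known - on_disk)   # in the database, not on disk
    orphans = sorted(on_disk - known)   # on disk, unknown to the database
    return missing, orphans
```

Anything in the second list is exactly the "random pile" case: a file
that nothing in the database describes.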

That risk was addressed by a prototype project that implements true AIPs in
the assetstore and allows a whole archive to be reconstructed from just
the files in the assetstore.  Briefly, it stores a description of each
Item (or Collection or Community) as a METS document in another Bitstream.
The METS structure refers to its member Bitstreams by resolvable
assetstore-URIs so it can find them again.  To rebuild the archive from
files, you just crawl over all Bitstreams looking for AIPs, and
"re-import" each AIP.

For details see
http://wiki.dspace.org/index.php/AipPrototype
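
The rebuild procedure described above (crawl every Bitstream, pick out
the AIPs, re-import each one) can be sketched roughly as follows.  This
is not the prototype's code, only an illustration under stated
assumptions: an AIP is recognized as any file whose root element is a
METS `mets` element, member Bitstreams are taken from the `xlink:href`
of each `FLocat`, and `reimport` is a hypothetical callback standing in
for whatever actually restores the object.

```python
import os
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

def find_aips(assetstore_root):
    """Yield (path, member_uris) for each file that parses as a METS document."""
    for dirpath, dirnames, filenames in os.walk(assetstore_root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # not XML, so an ordinary content Bitstream
            if root.tag != "{%s}mets" % METS_NS:
                continue  # XML, but not a METS document
            # Collect the location of every member file the METS refers to.
            uris = []
            for flocat in root.iter("{%s}FLocat" % METS_NS):
                href = flocat.get("{%s}href" % XLINK_NS)
                if href is not None:
                    uris.append(href)
            yield path, uris

def rebuild(assetstore_root, reimport):
    """Call the (hypothetical) reimport callback once per AIP found."""
    count = 0
    for path, uris in find_aips(assetstore_root):
        reimport(path, uris)
        count += 1
    return count
```

Parsing every file just to test whether it is an AIP is the brute-force
part of this; it is acceptable for a one-time disaster recovery, which
is the only time you would run it.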

> I wonder if these more policy related issues are something to put into a
> DSpace Wiki part?

See http://wiki.dspace.org/index.php/PledgePolicyPrototype

I believe there has been some more progress since that project ended,
especially on the policy-aware storage grid iRODS (see http://irods.sdsc.edu/ ).

    -- Larry



