[Dspace-general] dc.format

MacKenzie Smith kenzie at MIT.EDU
Tue Jul 17 17:06:36 EDT 2007


Just a footnote to all this: one intention of keeping the file's
technical metadata, and especially the format, was as a placeholder for
much richer metadata in the future. In particular, we wanted the ability
to link that information to something like the Global Digital Format
Registry (http://hul.harvard.edu/gdfr/) or the PRONOM database
(http://www.nationalarchives.gov.uk/pronom/) for definitive information
on file formats, as an aid to long-term digital preservation.
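
To illustrate why a registry link adds something beyond the stored
format, here is a minimal Python sketch. It assumes a hypothetical local
cache of registry entries keyed by MIME type; the FormatRecord class, the
REGISTRY_CACHE table and the "example:..." identifiers are made-up
placeholders, not real GDFR or PRONOM identifiers or APIs. The point is
only that a MIME type alone can match several format versions, so a
definitive lookup needs more information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatRecord:
    """Hypothetical registry entry; field names are illustrative only."""
    registry_id: str          # placeholder, not a real GDFR or PRONOM ID
    name: str
    version: Optional[str]


# Hypothetical local cache of registry entries keyed by MIME type.
REGISTRY_CACHE = {
    "application/pdf": [
        FormatRecord("example:pdf-1.3", "Portable Document Format", "1.3"),
        FormatRecord("example:pdf-1.4", "Portable Document Format", "1.4"),
    ],
    "text/plain": [
        FormatRecord("example:plain-text", "Plain Text", None),
    ],
}


def candidate_formats(mimetype):
    """Return every cached registry entry that shares this MIME type."""
    return list(REGISTRY_CACHE.get(mimetype, []))


def resolve_format(mimetype, version=None):
    """Return one record only when the match is unambiguous, else None."""
    candidates = candidate_formats(mimetype)
    if version is not None:
        candidates = [c for c in candidates if c.version == version]
    return candidates[0] if len(candidates) == 1 else None
```

With this table, resolve_format("application/pdf") gives None because two
versions match, while resolve_format("application/pdf", "1.4") returns a
single record.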

MIT is working on a project in that direction, to make much better use
of file-level metadata for preservation purposes and ultimately
integrate DSpace with the GDFR and other handy registries. Larry Stone
will be sending an email tomorrow to the tech list about that, so if
you're interested, stay tuned...

MacKenzie

Scott Yeadon wrote:
> Just FYI, as of DSpace 1.4.1 format.extent and format.mimetype 
> information is no longer stored at the item level but remains associated 
> with the individual bitstreams. Previously this information was 
> duplicated at bitstream and item level, and as Ingrid pointed out it's 
> pretty much useless at item level since it isn't clear which size and 
> format belongs to which bitstream.
>
> The size information can be useful for showing users the size of files
> before they download or for analysing content sizes in the repository,
> and perhaps also as a marginal check that the original bitstream hasn't
> changed or been tampered with. Whether this needs to be stored as
> metadata could be debated. Although mime-types aren't all that
> specific, they could be useful for categorising content types in the
> repository (for services such as format migration or identifying
> formats in danger of becoming obsolete) or for identifying particular
> files when rendering an object via browser or other application.
>
> I'm sure there are other cases where people are using this information 
> as well.
>
> Scott.
>
>   
>> Date: Mon, 16 Jul 2007 14:41:54 +1200
>> From: "Ingrid Mason" <Ingrid.Mason at vuw.ac.nz>
>> Subject: Re: [Dspace-general] dc.format
>> To: "Beth Tillinghast" <betht at hawaii.edu>, <dspace-general at mit.edu>
>>
>> Hi Beth,
>>
>> My knowledge of this is scrappy, but here goes.
>>
>> DSpace records in the DBMS the mimetype of the files ingested, so this
>> type of 'data' and metadata is already in the system.  
>>
>> But, one of the good reasons to record dc.format.mimetype is that you
>> can search/sort and know *exactly* what mimetypes you have in the GUI,
>> because if it's in the metadata you can find it easily, for whatever
>> reason.  
>>
>> It starts to get 'interesting' when you have more than 1 object
>> associated with your item record though:  
>>
>> e.g. a thesis with a dataset:
>>
>> dc.format.mimetype = application/pdf 
>> dc.format.mimetype = application/octet-stream 
>>
>> It seems obvious that the mimetypes are respectively the thesis and
>> dataset, but who knows?  In some way, it might be important to know
>> which is which, which means indicating this in the metadata.  Maybe add
>> an XML file outlining which file is which?  Plenty of people are fine
>> about just listing them and letting the searcher piece their way through
>> this.  
>>
>> We have chosen not to implement dc.format to avoid this.  We didn't see
>> that many users would search by file format, and the file extension
>> indicates the mimetype in the UI.  It may come back to bite us later on
>> if, for example, it is really important for searchers to know what
>> mimetypes are available (for compatibility with software applications),
>> or with a view to undertaking preservation interventions/migrations.
>> But hopefully those would be done via the 'back end' (DBMS) rather than
>> through the metadata anyway.
>>
>> Hope this helps.  
>>
>> Ingrid 
>>
>> Ingrid Mason
>> Digital Research Repository Coordinator
>> Victoria University of Wellington
>> New Zealand = Aotearoa
>>
>>
>>
>>
>> -----Original Message-----
>> From: dspace-general-bounces at mit.edu
>> [mailto:dspace-general-bounces at mit.edu] On Behalf Of Beth Tillinghast
>> Sent: Saturday, 14 July 2007 12:38 p.m.
>> To: dspace-general at mit.edu
>> Subject: [Dspace-general] dc.format
>>
>> Aloha,
>>
>> I have a question about an element's use in the metadata schema for 
>> our DSpace instance. I notice many, but not all, DSpace instances use 
>> the dc.format.extent and the dc.format.mimetype elements and 
>> qualifiers. I am curious if there is a best-practices reason for this 
>> other than a nice-to-know reason. Can this information be used to 
>> generate certain reports, for example?
>>
>> Thank you in advance for your responses,
>> Beth
>>     
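
To make the item-level versus bitstream-level point in the thread
concrete, and to show the kind of report Beth asks about, here is a
minimal Python sketch. The Bitstream class, the file names and the sizes
are all invented for illustration; this is not the DSpace data model or
API. Flattening to item-level values keeps the MIME types and sizes but
loses which file each one belongs to, while per-bitstream records keep
that link and can be summarised by format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bitstream:
    """Hypothetical stand-in for a stored file; not a DSpace class."""
    name: str
    mimetype: str
    size_bytes: int


def item_level_formats(bitstreams):
    """Flatten to item-level values, dropping the link to each file."""
    return {
        "dc.format.mimetype": [b.mimetype for b in bitstreams],
        "dc.format.extent": [f"{b.size_bytes} bytes" for b in bitstreams],
    }


def report_by_mimetype(bitstreams):
    """Count files and total bytes per MIME type."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for b in bitstreams:
        totals[b.mimetype]["files"] += 1
        totals[b.mimetype]["bytes"] += b.size_bytes
    return dict(totals)


# Invented example: a thesis with an accompanying dataset.
thesis_item = [
    Bitstream("thesis.pdf", "application/pdf", 2_400_000),
    Bitstream("dataset.dat", "application/octet-stream", 15_000_000),
]

flat = item_level_formats(thesis_item)
report = report_by_mimetype(thesis_item)
print(flat)
print(report)
```

The flat dictionary holds two MIME types and two extents with nothing
tying them to file names, which is the ambiguity described above; the
report is built from the per-bitstream records instead.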


-- 
MacKenzie Smith
Associate Director for Technology
MIT Libraries



