[Dspace-general] [Fwd: Preserving structured collections - major DSpace change for Collection object]

Tellier, Stephane stephane.tellier at cgi.com
Wed Feb 28 09:42:40 EST 2007


Hi,
 
We have considered the METS metadata representation and its structural features, which seem to me quite similar to those of EAD. We think METS could resolve almost all of our problems, but we have not found any open source engine like DSpace that is built on METS.
 
One possible solution is to build our own METS engine that would sit in "front" of DSpace and be the entry point for every search request or OAI-PMH request. Since METS is a structured representation of collections using XML files "pointing" to each other, forming a kind of hierarchical tree, the idea would be to use XPath to search the XML files, and to have pointers in those files holding handle URLs linked to items (like PDFs) stored in DSpace. I know this sounds complicated, and it's not my favorite solution.
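
To make the XPath idea a bit more concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the sample METS fragment, the use of mets:mptr elements to carry the handle URLs, the function name find_issue_handles, and the handles themselves (placeholders, not real items). It is not how DSpace or any existing METS engine works.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS and XLink.
NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical METS fragment; the handle URLs are placeholders.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="periodical" LABEL="Times magazine">
      <mets:div TYPE="year" LABEL="1990">
        <mets:div TYPE="issue" LABEL="jan">
          <mets:mptr LOCTYPE="HANDLE" xlink:href="http://hdl.handle.net/123456789/1"/>
        </mets:div>
        <mets:div TYPE="issue" LABEL="feb">
          <mets:mptr LOCTYPE="HANDLE" xlink:href="http://hdl.handle.net/123456789/2"/>
        </mets:div>
      </mets:div>
      <mets:div TYPE="year" LABEL="1991">
        <mets:div TYPE="issue" LABEL="jan">
          <mets:mptr LOCTYPE="HANDLE" xlink:href="http://hdl.handle.net/123456789/13"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def find_issue_handles(mets_xml, year):
    """Return (issue label, handle URL) pairs for every issue under one year."""
    root = ET.fromstring(mets_xml)
    # ElementTree supports a limited XPath subset, which is enough here.
    path = f".//mets:div[@LABEL='{year}']/mets:div"
    results = []
    for issue in root.findall(path, NS):
        pointer = issue.find("mets:mptr", NS)
        if pointer is not None:
            results.append((issue.get("LABEL"), pointer.get(XLINK_HREF)))
    return results


for label, handle in find_issue_handles(SAMPLE_METS, "1990"):
    print(label, handle)
```

A real engine would also need to follow pointers from one METS file to the next and keep an index, which is where most of the complexity I mentioned would come from.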
 
Another solution would consist of modifying the Item object in DSpace so that folders and sub-folders could be created inside an Item (unless that is already possible and I just haven't seen it yet!). This could be very interesting because metadata is attached to Items, so we could build a "structure" for our periodicals. It would certainly need another change, though, so that a search result could show which of the Item's documents the text was found in (including the document's folder path).
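
As a rough illustration of the folder idea, the sketch below models an Item as a tree of folders and reports the folder path of each document that matches a search term. The Folder and Document classes and the search function are hypothetical names invented for this example. They are not DSpace's actual object model, and the plain substring match only stands in for real full-text indexing.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    text: str  # stand-in for the extracted full text of a PDF


@dataclass
class Folder:
    name: str
    documents: list = field(default_factory=list)
    subfolders: list = field(default_factory=list)


def search(folder, term, parent_path=""):
    """Return the folder paths of all documents whose text contains term."""
    here = f"{parent_path}/{folder.name}"
    hits = [
        f"{here}/{doc.name}"
        for doc in folder.documents
        if term.lower() in doc.text.lower()
    ]
    # Recurse so that matches in sub-folders carry their full path.
    for sub in folder.subfolders:
        hits.extend(search(sub, term, here))
    return hits


item = Folder("Times magazine", subfolders=[
    Folder("1990", documents=[
        Document("jan.pdf", "Winter weather report"),
        Document("feb.pdf", "Annual budget review"),
    ]),
    Folder("1991", documents=[Document("jan.pdf", "New budget proposals")]),
])

for path in search(item, "budget"):
    print(path)
```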
 
We are also considering inserting all of the PDFs into the same Item, along with an additional XML file that would contain the structure information. In the DSpace user interface, that additional file would be treated as a "special link" pointing to an index HTML file displaying the structure of the documents.
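
Here is a small sketch of how such a structure file could be turned into an index page. The XML vocabulary (structure, folder, file, with name and href attributes) is made up for this example, as are the function names and the file names. How the result would be wired into the DSpace user interface is left open.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structure file stored alongside the PDFs in the Item.
STRUCTURE_XML = """<structure title="Times magazine">
  <folder name="1990">
    <file name="jan.pdf" href="jan-1990.pdf"/>
    <file name="feb.pdf" href="feb-1990.pdf"/>
  </folder>
  <folder name="1991">
    <file name="jan.pdf" href="jan-1991.pdf"/>
  </folder>
</structure>"""


def render_children(node):
    """Render the folders and files under node as a nested HTML list."""
    items = []
    for child in node:
        if child.tag == "folder":
            label = escape(child.get("name", ""))
            items.append(f"<li>{label}{render_children(child)}</li>")
        elif child.tag == "file":
            href = escape(child.get("href", ""), quote=True)
            label = escape(child.get("name", ""))
            items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"


def build_index(structure_xml):
    """Build a complete index page from the structure file."""
    root = ET.fromstring(structure_xml)
    title = escape(root.get("title", ""))
    return f"<html><body><h1>{title}</h1>{render_children(root)}</body></html>"


print(build_index(STRUCTURE_XML))
```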
 
My first email discussed another possible solution: modifying the Collection object in DSpace...
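
To recap that idea in a few lines: collections could nest, and an item would pick up the descriptive metadata of its "parent" collections instead of storing its own copy. The sketch below only illustrates that lookup. The Collection and Item classes, their methods, and the dc values are hypothetical, and none of this reflects how DSpace stores or indexes metadata today.

```python
class Collection:
    """Hypothetical collection that can have a parent collection."""

    def __init__(self, name, metadata=None, parent=None):
        self.name = name
        self.metadata = dict(metadata or {})
        self.parent = parent

    def inherited_metadata(self):
        """Merge metadata from the top of the tree down to this collection."""
        merged = self.parent.inherited_metadata() if self.parent else {}
        merged.update(self.metadata)  # closer levels override higher ones
        return merged


class Item:
    """Hypothetical item holding one PDF and no metadata of its own."""

    def __init__(self, filename, collection):
        self.filename = filename
        self.collection = collection

    def display_metadata(self):
        # Metadata is resolved at display time, so it is never duplicated.
        return self.collection.inherited_metadata()


magazine = Collection("Times magazine", {"dc.title": "Times magazine"})
year_1990 = Collection("1990", {"dc.date": "1990"}, parent=magazine)
january = Item("jan.pdf", year_1990)

print(january.display_metadata())
```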
 
That's all for now, but I'm sure our team will come up with other ways. We hope to find something that spares us from making major and complex changes to DSpace, unless it's something that could be of interest to other users.

________________________________

From: joseph greene [mailto:joseph.greene at ucd.ie]
Sent: Wed 28/02/2007 5:07 AM
To: Tellier, Stephane
Subject: Re: [Fwd: [Dspace-general] Preserving structured collections - major DSpace change for Collection object]



Hello Stephane,

Have you considered using EAD as a metadata representation of your
collections? It may be a slightly unorthodox use of EAD but it is a
metadata structure that seems to fit your sub-collections very nicely.
It allows the xlink language in most tags. It may not solve your DSpace
issues, but could help from a browsing point of view at least.

On that note, EAD can also have subject headings embedded in it at any
level within the hierarchy, top to bottom. Lucene could do a good job of
searching this, along with a parser, which could help you reduce
replication of data.

Interestingly enough, our project has tentatively decided to include
subject headings from each collection and 'sub-collection' (in our case,
'subseries') in every item belonging to that series to do our item level
searching -- it sounds a lot like your research.

I look forward to further discussion on this topic on the list.

Joseph Greene

Irish Virtual Research Library and Archive
http://www.ucd.ie/ivrla


----- Original Message -----
From: John McDonough <john.mcdonough at ucd.ie>
Date: Tuesday, February 27, 2007 2:41 pm
Subject: [Fwd: [Dspace-general] Preserving structured collections - major DSpace change for Collection object]
To: joseph greene <joseph.greene at ucd.ie>, Adele <adele.cocchiglia at ucd.ie>

> FYI!
>
> J
>
> -------- Original Message --------
> Subject:      [Dspace-general] Preserving structured collections - major DSpace change for Collection object
> Date:         Tue, 27 Feb 2007 09:11:47 -0500
> From:         Tellier, Stephane <stephane.tellier at cgi.com>
> To:   DSpace-tech at lists.sourceforge.net
> CC:   dspace-general at mit.edu
>
>
>
> Hi all,
>
> In our project, in which we have to implement a DSpace solution, we are
> currently facing a major problem that may also concern other people
> working in libraries.
> We need to submit and preserve periodicals in DSpace in a structured
> form. Example:
>
> Times magazine
>            |_____________1990
>                                       |__________jan.pdf
>                                       |__________feb.pdf
>                                       |__________...
>                                       |__________dec.pdf
>            |_____________1991
>                                       |__________...
>            |_____________...
>            |_____________2006
>                                       |__________...
>
> In our library, the main database for metadata is a catalog. An item
> can have a "note" in this catalog, and this note holds some descriptive
> metadata.
> In the example above, the Times magazine collection, while containing
> many PDF items, would have only one note in our catalog. That means
> that, after the transfer from the catalog to DSpace, the DSpace
> Collection representing the magazine should ideally be the only object
> that contains the metadata, because we don't want to repeat that
> metadata for each of the DSpace Items holding the PDF files in the
> whole Collection. This is for performance reasons, because we have some
> collections with thousands of PDFs (like a newspaper more than 100
> years old with a PDF for each day).
>
> For our team, that means we are currently considering making a big
> change to DSpace so that:
> 1) a collection can have sub-collections (same idea here as
> Communities);
> 2) a collection can be mapped to the metadata schema and therefore be
> considered as an "Item", so that its metadata would be indexed in the
> same way. The collection would then be searchable through the dc fields
> (for example). In that case, if a search returns one of the items
> holding a PDF (full-text indexed PDF), we would get the dc metadata
> from its "parent" collection instead of having it in the item's
> record.
>
> Does anyone here have the same needs, and has anyone begun work on
> this? We think this could be a very useful add-on for DSpace, handling
> almost any kind of digital collection. However, we know that this will
> not be a simple modification...
>

