[Dspace-general] Large-scale DSpace repositories (was Re: Another committer)

James Rutherford james.rutherford at hp.com
Thu Mar 1 06:49:30 EST 2007


Hi Stephane,

On Thu, Feb 22, 2007 at 08:19:52AM -0500, Tellier, Stephane wrote:
> Since you seem to have worked on the China Digital Museum project, I
> was wondering if it could be possible for you to give some
> informations about the hardware specs and the hardware architecture
> (SAN server, load balancing, multiple dspace instances, etc.) about
> that project. If you could send some documentations about it, or refer
> to a web site or wiki explaining these aspects, that would be very
> great.

I haven't been involved with the hardware specification for the data
centres that will be operating, but I could probably get some
information (the estimate is that they will eventually hold ~200TiB of
content each). As for multiple instances, load balancing, etc., Graham
Triggs and I are looking into clustering mechanisms for DSpace, both for
the database and for the servlet container. If you would like
to contribute to this effort, or read up on what we have found so far, I
suggest you review this page:

http://wiki.dspace.org/HOWTO_Clustering

This page is very much a work in progress; none of the clustering
mechanisms proposed on it has worked yet (though we are still working on
them). For your project, it may be worth
purchasing clustering services from someone like Oracle (I've not listed
that as an option because I wanted to provide information on what can be
done for free).

> Actually in our team, we're trying to implement a DSpace solution for
> a library and we could expect to have needs for a very large number of
> digital documents (over a million could be a possibility), and we are
> asking ourselves what kind of servers and architecture should we used
> for that range.

This is not an easy question to answer, which is presumably why someone
is paying you to answer it ;) Without knowing more detail about the
typical document type, size, etc, it would be difficult to give any
advice on this. What is more, as far as I know no one is running a
DSpace repository with more than ~200,000 items, so predicting
performance and designing an architecture for repositories with more
than 1,000,000 documents is naturally rather difficult.
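
To show why the document type and size matter so much, here is a rough
back-of-envelope sketch in Python. The function name, the overhead
factor and the sample sizes are all assumptions made up for
illustration; none of them comes from a real DSpace deployment.

```python
def estimate_storage_tib(item_count, avg_item_mib, overhead=1.2):
    """Return a rough raw storage estimate in TiB.

    item_count   -- number of items in the repository
    avg_item_mib -- assumed average size of one item's files, in MiB
    overhead     -- hypothetical multiplier for thumbnails, indexes, etc.
    """
    if item_count < 0 or avg_item_mib < 0 or overhead < 0:
        raise ValueError("inputs must be non-negative")
    total_mib = item_count * avg_item_mib * overhead
    return total_mib / (1024 * 1024)  # 1 TiB = 1024 * 1024 MiB


if __name__ == "__main__":
    # One million items at three made-up average sizes.
    for avg_mib in (1, 10, 100):
        tib = estimate_storage_tib(1_000_000, avg_mib)
        print(f"{avg_mib:>4} MiB/item -> {tib:.2f} TiB")
```

The same million items can need about one TiB or over a hundred
depending on the average size, and this says nothing yet about database
or indexing load, which is why the sizing question can't be answered
without those details.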

cheers,

Jim


