[Dspace-general] Wordle visualization of DSpace content

Bram Luyten bram at mire.be
Fri Jul 17 09:00:00 EDT 2009


Hello,

In the category, fun on friday, I was curious to investigate the results of
feeding DSpace item titles into Wordle ( http://www.wordle.net ), and see
what would come up.

Wordle visualizes how often words occur in any amount of text you feed it.
Basically, Wordle counts the number of times each word occurs and draws
frequent words large and rare words small, all in one resulting picture.
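
The counting idea can be sketched in a few lines of Python. This is a
minimal illustration, not Wordle's actual code: the function names, the
letters-only tokenizer and the linear mapping from counts onto font sizes
between 10 and 60 points are all assumptions made for the example.

```python
import re
from collections import Counter

def word_counts(titles):
    """Count how often each lowercase word occurs across all titles."""
    counts = Counter()
    for title in titles:
        # Treat runs of letters (including accented ones) as words.
        counts.update(re.findall(r"[^\W\d_]+", title.lower()))
    return counts

def font_sizes(counts, min_size=10, max_size=60):
    """Scale each word's count linearly onto [min_size, max_size].

    Assumption: a linear scale; Wordle's own scaling may differ.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (n - lo) * (max_size - min_size) / span)
        for word, n in counts.items()
    }

titles = [
    "Radiotherapy of tumours",
    "Hypoxia and radiotherapy response",
]
sizes = font_sizes(word_counts(titles))
print(sizes["radiotherapy"], sizes["hypoxia"])  # 60 10
```

"radiotherapy" occurs twice and gets the largest size; every word that
occurs once gets the smallest.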

As a data source, I used K.U. Leuven's LIRIAS repository (
http://lirias.kuleuven.be ), a large and rapidly growing repository. The
hierarchy of this DSpace instance is subject-oriented, because its
communities and collections follow the institution's organizational
structure. For this experiment, I took three top-level communities: the
Biomedical Sciences group, the Humanities and Social Sciences group and last
(but not least) the Science, Engineering and Technology group.

Using @mire's reporting suite (
http://atmire.com/USB/resources/reporting_suite.html ), it took me five
minutes to generate, for each of these top-level communities, a clean list of
the titles of International Publications submitted in 2009 (a small subset of
the content, but still 500+ items for each group).

These lists were used to create the following Wordles:
Humanities and Social Sciences -
http://www.wordle.net/gallery/wrdl/1003572/K.U._Leuven_Humanities_and_Social_Sciences_publications_2009
Biomedical Sciences -
http://www.wordle.net/gallery/wrdl/1003562/K.U._Leuven_Biomed_Publications_2009
Science, Engineering and Technology -
http://www.wordle.net/gallery/wrdl/1003577/K.U._Leuven_Science%2C_Engineering_and_Technology_publications_2009

It was funny to see that almost all titles in the Biomed and SE&T groups
were in English. For Humanities and Social Sciences, there was a mix of
English and Dutch titles. Wordle lets you filter out the most common words
(the, an, a, ...) for only one language at a time, so to clean both English
and Dutch stop words from the Humanities & Social Sciences Wordle, I had to
do some manual work on the list.
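
For a mixed-language list like this one, the manual step could in
principle be replaced by filtering against the union of an English and a
Dutch stop-word list before pasting the text into Wordle. A minimal Python
sketch, assuming two tiny hypothetical stop-word sets (real lists are much
longer) and a simple letters-only tokenizer:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny samples; real stop-word lists are longer.
ENGLISH_STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "on"}
DUTCH_STOP_WORDS = {"de", "het", "een", "van", "en", "in", "voor", "op"}

def count_without_stop_words(titles, stop_words):
    """Count lowercase words in the titles, skipping any stop word."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[^\W\d_]+", title.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

titles = [
    "The history of the Flemish press",
    "De geschiedenis van de Vlaamse pers",
]
# Set union: a word is dropped if it is a stop word in either language.
counts = count_without_stop_words(titles, ENGLISH_STOP_WORDS | DUTCH_STOP_WORDS)
print(sorted(counts))
# ['flemish', 'geschiedenis', 'history', 'pers', 'press', 'vlaamse']
```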

Even though each Wordle covers only one of the three groups, you still see a
lot of "generic" scientific terms and not many interesting subject keywords.
That's quite logical: although the scientists belong to the same group,
they still work on a wide variety of subjects.

To zoom in on more specific subjects, here's the Wordle for the Computer
Science department's 2009 publications (one subcommunity level below the
groups):
http://www.wordle.net/gallery/wrdl/1003647/K.U._Leuven_Computer_Science_publications_2009

And even more specific, here's the one for the Experimental Radiotherapy
research group, under the Department of Oncology in the Biomedical Sciences
group. For this one, I took all publications from 2000-2009 to get a
meaningful selection.
http://www.wordle.net/gallery/wrdl/1003638/K.U._Leuven_Experimental_Radiotherapy_Publications_2000-2009

best regards,

Bram Luyten

@mire - http://www.atmire.com

Technologielaan 9 - 3001 Heverlee - Belgium
533 2nd Street - Encinitas, CA 92024 - USA

http://www.togather.eu - Before getting together, get Tog at ther