From jfurfey at mbl.edu  Fri Aug  1 14:45:17 2008
From: jfurfey at mbl.edu (John Furfey)
Date: Fri, 1 Aug 2008 14:45:17 -0400
Subject: [Dspace-general] migrate data to 1.5
Message-ID: <F8AD54B3-1D16-4E81-BBA5-698880D9D3D4@mbl.edu>

We're in the process of upgrading from 1.4.2 to 1.5, and we're also  
moving to a new server.

We've got 1.5 up and running and we're trying to figure out the best  
way of migrating our data.  Is it possible to do a pg_dump from the  
1.4.2 server and do a pg_restore on the 1.5 server?  Or will 1.5's  
new db schema prevent this?

Thanks for any response, I have not been able to find any  
documentation for this scenario.

------------------------------------------------------------
John Furfey
Digital Systems and Services Coordinator
MBLWHOI Library
Woods Hole MA  02543 USA
PHONE:  508-289-7435
EMAIL:  jfurfey at mbl.edu
http://www.mblwhoilibrary.org



From mdiggory at MIT.EDU  Sun Aug  3 00:12:17 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Sat, 2 Aug 2008 21:12:17 -0700
Subject: [Dspace-general] [Dspace-tech] migrate data to 1.5
In-Reply-To: <F8AD54B3-1D16-4E81-BBA5-698880D9D3D4@mbl.edu>
References: <F8AD54B3-1D16-4E81-BBA5-698880D9D3D4@mbl.edu>
Message-ID: <7B27116C-C40A-41D0-9445-11190CB4D2B3@mit.edu>

John,

Please follow the upgrade instructions supplied for upgrading DSpace  
1.4.2 to 1.5 (page 50).  The upgrade process described in the  
documentation will take you through the appropriate steps to convert  
the DSpace postgres database from 1.4.2 to 1.5

http://www.dspace.org/images/onepointfivedocs/dspacemanual_15_may.zip

On Aug 1, 2008, at 11:45 AM, John Furfey wrote:

> We're in the process of upgrading from 1.4.2 to 1.5, and we're also  
> moving to a new server.
>
> We've got 1.5 up and running and we're trying to figure out the  
> best way of migrating our data.  Is it possible to do a pg_dump  
> from the 1.4.2 server and do a pg_restore on the 1.5 server?  Or  
> will 1.5's new db schema prevent this?

You do not want to attempt to do it in this order. The upgrade
process supplies a SQL script (database_schema_14-15.sql) that makes the
necessary changes to your existing database to upgrade it from 1.4.2 to
1.5; you do not need to do a fresh install of an empty DSpace
instance and migrate your data into it.  I also highly recommend
using pg_dump/psql to create a copy of your database and installing a
replica of your DSpace installation on another machine to properly
test that the upgrade process will work for your
production server before attempting it there.  This will also give you
an opportunity to become familiar with the upgrade process before
running it against a mission-critical instance.

To back up a postgres database instance on Linux, we use the
following command and options:

> pg_dump --oids -U dspace -f dspace-backup.sql [dspace-db-name]


Where [dspace-db-name] is the name of your dspace database in the  
postgres cluster (usually this is "dspace" by default).  To restore  
the backup to the same location,

> psql -U dspace -d [dspace-db-name] < dspace-backup.sql


or, to restore to the same name on another machine where you have not
already created the database or the dspace user, you would run:

> createuser -U postgres -d -A -P dspace
> createdb -U dspace -E UNICODE [dspace-db-name]
> psql -U dspace -d [dspace-db-name] < dspace-backup.sql
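
The rehearsal described above could be scripted roughly as follows. This is only a minimal sketch, not the documented upgrade procedure: the test database name, the backup file location, and the path to database_schema_14-15.sql are placeholders (assumptions) that you would adjust for your site, and the manual remains the authority on the full set of upgrade steps. By default the script only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: rehearse the 1.4.2 -> 1.5 database upgrade on a copy, not on production.
# DB_NAME, TEST_DB, BACKUP and SCHEMA_SQL are placeholders (assumptions).
# Note: --oids is accepted by the PostgreSQL releases current in 2008;
# it was removed from pg_dump in PostgreSQL 12.
set -eu

DB_NAME="${DB_NAME:-dspace}"
TEST_DB="${TEST_DB:-dspace_upgrade_test}"
BACKUP="${BACKUP:-dspace-backup.sql}"
SCHEMA_SQL="${SCHEMA_SQL:-database_schema_14-15.sql}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print each command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rehearse_upgrade() {
    # 1. Dump the existing 1.4.2 database.
    run pg_dump --oids -U dspace -f "$BACKUP" "$DB_NAME"
    # 2. Create a scratch database and load the dump into it.
    run createdb -U dspace -E UNICODE "$TEST_DB"
    run psql -U dspace -d "$TEST_DB" -f "$BACKUP"
    # 3. Apply the schema upgrade script to the scratch copy only.
    run psql -U dspace -d "$TEST_DB" -f "$SCHEMA_SQL"
}

rehearse_upgrade
```

Running it once in the default dry-run mode lets you check the four commands before anything touches a database.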


> Thanks for any response, I have not been able to find any  
> documentation for this scenario.

Please do feel free to post any questions about how to handle your
upgrade properly to the dspace-tech list. We in the community who
have worked on creating this upgrade process would like to ensure that
your switch to 1.5.0 is a success.

-Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage



From sunilgoria at yahoo.com  Mon Aug  4 06:00:51 2008
From: sunilgoria at yahoo.com (Sunil Goria)
Date: Mon, 4 Aug 2008 03:00:51 -0700 (PDT)
Subject: [Dspace-general] File upload problem
Message-ID: <701332.35218.qm@web53411.mail.re2.yahoo.com>

Dear all
We have been using DSpace 1.2 on Enterprise Linux for the last 3-4 years. Now we are unable to upload files to the DSpace server from a client through Internet Explorer. Earlier it was working fine. After reinstalling the browser, it no longer uploads files, although uploading still works from one system in our LAN. Please suggest the browser settings to check, or any other possible reason for this problem.
with regards,


Dr. Sunil Goria
Assistant Librarian
University Library,
G.B. Pant University of Agriculture & Technology,
Pantnagar-263145 (India)



From alally at u.washington.edu  Mon Aug  4 12:02:47 2008
From: alally at u.washington.edu (Ann Lally)
Date: Mon, 4 Aug 2008 09:02:47 -0700
Subject: [Dspace-general] selective searching
Message-ID: <00b701c8f64b$89ca8470$9d5f8d50$@washington.edu>

Hi all,

The UW Libraries has been storing "library centric" digital files in our
instance of DSpace, as well as some items that are locked by a particular
community.  We don't want these files to show when someone searches for
academic papers and reports.  Has anyone else had this issue?  How did you
resolve it?  

 
Thanks in advance.

 
Ann Lally

University of Washington Libraries


From mdiggory at MIT.EDU  Wed Aug  6 13:19:37 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Wed, 6 Aug 2008 10:19:37 -0700
Subject: [Dspace-general] DSpace@MIT upgrades to DSpace 1.5.x
Message-ID: <2465CAED-77ED-4B95-B8A6-8D94BC5018CA@mit.edu>

Dear DSpace Community,

Earlier this week we completed the upgrade of DSpace at MIT from DSpace  
1.4.1 (JSPUI) to the latest DSpace 1.5.x revision. Likewise, we  
switched over completely to use the Texas A&M Manakin based XMLUI  
during this upgrade process.  The service can now be explored at the
original DSpace at MIT host.

http://dspace.mit.edu

For MIT Libraries, this upgrade represents the culmination of more
than a year's worth of effort done in collaboration with other
individuals and organizations within the DSpace community, beginning
with the reorganization of the DSpace 1.4.2 code-base and the
establishment of the Maven based build process, and continuing through the
release of DSpace 1.5.0, the addition of maintenance fixes, and
the upgrade of our systems.  We at MIT feel that the DSpace 1.5.X
branch, which now contains a significant load of bug fixes, is now
prepared for a maintenance release. As release coordinator on the
1.5.1 release, I expect to begin testing the release of a 1.5.1
beta update in the coming week.

I would like to thank all the developers and community members who  
contributed to the DSpace 1.5.X codebase in the past year. We could
not have accomplished our own production goals without your
efforts within the community.

Cheers,
Mark Diggory

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage


From dsalo at library.wisc.edu  Wed Aug  6 16:07:44 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Wed, 6 Aug 2008 15:07:44 -0500
Subject: [Dspace-general] DSpace development priorities: starting a
	discussion
Message-ID: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>

Greetings, DSpace community,

For some time, I've been concerned that the DSpace development process
hasn't enjoyed as much input from the broader community as would be
desirable. The voices of less-technical repository managers and other
staff associated with DSpace repositories have been particularly
difficult to attract to the discussion. I'm hoping to gather
impressions and suggestions from this specific segment of the
community (though others are welcome as well!) to pass on to DSpace
developers. With any luck, this process will build a stronger
connection between developers and repository managers going forward.

The DSpace development-priority survey done in 2007 was valuable and
worthwhile, and if possible, I'd like to revisit some of the questions
raised there. I'd also like to start "in your own words" discussions
about what repository managers want and need from DSpace that it isn't
yet providing.

We can certainly talk here, and I welcome that! More than one DSpace
developer has agreed to monitor these discussions, and I will be
summarizing them back to the development list. But I'm completely open
to other venues as well -- IM, group chat, Web 2.0, Skype, out-of-band
email -- depending on what people tell me they want.

So. How would you like to do this? Once we've sorted out the process,
we can get down to business. Feel free to contact me off-list if you
prefer.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From val at dspace.org  Wed Aug  6 22:37:53 2008
From: val at dspace.org (Valorie Hollister)
Date: Wed, 06 Aug 2008 21:37:53 -0500
Subject: [Dspace-general] DSpace development priorities: starting
	a	discussion
Message-ID: <20080806213753.wa7jq9m9og884k80@www.dspace.org>

Dorothea -

I believe this effort would be very worthwhile and consistent with  
many of the upcoming initiatives the Foundation has been working on  
(DSpace Global Outreach Cmte, implementation of Jira for feature  
requests/tracking, DSpace repository
manager meeting at SPARC in November, and discussion forums on  
www.dspace.org, etc). I would very much like to be involved in the  
discussions you are suggesting and look forward to hearing from the  
DSpace community.

Valorie Hollister
Community Outreach Manager
DSpace Foundation



From mwood at IUPUI.Edu  Thu Aug  7 08:44:42 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Thu, 7 Aug 2008 08:44:42 -0400
Subject: [Dspace-general] DSpace development priorities: starting
	a	discussion
In-Reply-To: <20080806213753.wa7jq9m9og884k80@www.dspace.org>
References: <20080806213753.wa7jq9m9og884k80@www.dspace.org>
Message-ID: <20080807124442.GB2968@IUPUI.Edu>

On Wed, Aug 06, 2008 at 09:37:53PM -0500, Valorie Hollister wrote:
> implementation of Jira for feature requests/tracking

Well, there's a communication opportunity right there.  This is the
first I'd heard of setting up another tracker system.  We already
have trackers full of items at SourceForge.  Will those items be
migrated?

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From christophe.dupriez at destin.be  Thu Aug  7 09:02:25 2008
From: christophe.dupriez at destin.be (Christophe Dupriez)
Date: Thu, 07 Aug 2008 15:02:25 +0200
Subject: [Dspace-general] DSpace development priorities: starting
	a	discussion
In-Reply-To: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
Message-ID: <489AF261.70200@destin.be>

Dear Dorothea, (to the DSpace Community)

Thank you so much for your long-needed initiative.

I am using DSpace for different customers who are paying me to adapt
DSpace to their needs. I am very lucky to work with those institutions,
who trust me and provide me with interesting challenges. I take great
advantage of the DSpace project and return far too few contributions
to the community. I previously made suggestions to improve this
(see the conclusion of the following paper):
http://www.aepic.it/conf/viewpaper.php?id=197&cf=11

For me, big institutions, universities, research networks have the
resources (money, people, organisation) to get what they want from
DSpace source code. They can request what they need  from their
developers and may (or not) encourage their developers to take the time
to contribute back to the DSpace community. From my point of view, this
process is not very efficient and is rapidly seen as not profitable
enough by most managers.

If one wants other institutions (with less money, fewer people, and fewer
organisation skills available) to be able to publish their intellectual
production using DSpace, I believe more **coordinated** efforts should
be allocated to creating a "standard" DSpace flagship that NO developer
has to customize locally. If we stop assuming that local
institutions can "always" develop their own adaptations, we would look at
the project more cautiously and possibly put the users back where they
must be: in the driver's seat.

One cultural problem we may have: open source developers enjoy freedom
and protect it by invoking the "free software principles" against any
criticism. "Free software" means "somewhat free from the commercial
empire", not a free-for-all! Top-down processes must be in the right balance
with bottom-up ones.

Establishing generic institutional needs (a DSpace product definition)
must be a structured project to succeed. A project begins when one
identifies:
1) a global need or objective
2) project sponsors who approve important decisions
3) a knowledgeable project leader who animates, coordinates, and mandates

The proposal I would like to make:
1) Apply the 80/20 rule: create an immediately applicable DSpace package
which answers 80% of the needs of 80% of the smaller institutions,
which would be happy not to hire any developer (and to keep their money to
hire a very good application manager) and to have a result to be enthusiastic about,
**that the DSpace Foundation would guarantee to sustain in the long
term, always providing an easy upgrade path from one version to the next**
2) The sponsors should be institutions gathering to provide resources
(money, people, organisation skills) to obtain this result in a
reasonable time frame (18 months). The Foundation would coordinate this
committee and animate the process with a democratic "1 participating
institution = 1 vote" decision process
3) The project leader would be chosen using a "Call for Tender" process,
with the final decision taken by the sponsoring committee.

IMHO, this is much more important than the most radical restructurings of
the DSpace code base (like some of the ones currently envisaged). But it
may trigger some other unforeseen radical technical decisions...

Let's see how things evolve!

Have a nice day!

Christophe




From mwood at IUPUI.Edu  Thu Aug  7 10:18:45 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Thu, 7 Aug 2008 10:18:45 -0400
Subject: [Dspace-general] DSpace development priorities: starting
	a	discussion
In-Reply-To: <489AF261.70200@destin.be>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
	<489AF261.70200@destin.be>
Message-ID: <20080807141845.GD2968@IUPUI.Edu>

Some things to keep in mind:

o  There is no commercial or employment relationship between end-users
   and developers here.  Anybody who wants something done must
   accomplish it either by his own (or his organization's) labor, or
   by moral suasion -- appealing to the value of improving the
   commons, or the good feeling that comes from having done something
   well.

   The good news is that, *because* there is no mechanism of
   compulsion, moral suasion works tolerably well in such situations.

o  On the other hand, there is most definitely an employment
   relationship between the developer and his own institution.  If I
   want to work on some aspect of DSpace, I have to convince my
   supervisor that the work will benefit the institution sufficiently
   to account for the cost of my time.  It's difficult, but not
   impossible, to sell intangible benefits like building up the
   commons, but it is much easier to sell features that we need
   ourselves.  The result is that the needs of one's own institution
   *usually* come first.

o  Just expounding a need of your institution may cause someone
   elsewhere to realize, "hey, we could use that too -- and we have
   the resources to build it."  So we do all need to talk about our
   needs and wishes, even if we can't realize them ourselves.

o  Code is not all there is.  If your institution can't create code,
   could it contribute documentation or user-interface design?  Could
   you volunteer to monitor a tracker and provide short monthly
   postings on item turnover, or moderate a task force, or maintain a
   most-popular-request list?

o  One of the most effective ways to poison a community project is to
   try to manage contributors as if you have some authority over them.
   They know better.  Any community member (coder or not) who feels
   that his contributions are unappreciated has *nothing to lose* by
   ceasing to contribute, because the only reward for contribution is
   already denied him.  Because the project is held in common, he can
   still work on it for those who *do* reward him.


And a few questions:

On Thu, Aug 07, 2008 at 03:02:25PM +0200, Christophe Dupriez wrote:
> 1) Apply the 80/20 rule: Create an immediatley applicable DSpace package
> which answers to 80% of the needs of 80% of the smaller institutions
> which would be happy to not hire any developer (and keep their money to
> hire a very good application manager) to have an enthusiastic result
> **that the DSpace foundation would guarantee to sustain on the long
> term, always providing an easy upgrade path from one version to the next**

Has this not already been done?  How do we know?  What remains to be
done in order to satisfy the 80%?

> 2) The sponsors should be institutions gathering to provide resources
> (money, people, organisation skills) to obtain this result in a
> reasonable time frame (18 months). The Foundation would coordinate this
> committe, animate the process with a democratic "1 participating
> institution = 1 vote" decision process

Doesn't this just entrench the plutocracy?  Those lacking resources to
support development have no vote.  Did I misunderstand?


And some suggested reading:

  _The Cathedral and the Bazaar_, by Eric S. Raymond.

  An exploration of the economics, psychology, and sociology of
  development by community.  If you want to know how to motivate
  participants in a project like DSpace, or just understand why some
  of them behave so oddly, this is a good place to start.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From mveve at utk.edu  Thu Aug  7 16:11:27 2008
From: mveve at utk.edu (Veve, Marielle)
Date: Thu, 7 Aug 2008 16:11:27 -0400
Subject: [Dspace-general] Survey: Catalogers working with Non-MARC Metadata
Message-ID: <A0F5B662D126394895DF343A110BFB600129C42A@KFSVS2.utk.tennessee.edu>

To *all catalogers* (with or without MLS) in academic libraries:

SURVEY: Integrating Non-MARC Metadata Production into the Duties of
Traditional Catalogers

You are invited to participate in a brief national, online
survey.  The objective of this survey is to research the national trends
in the integration of non-MARC metadata work into the duties of
traditional catalogers and the perceptions and attitudes catalogers hold
towards non-MARC metadata.

For this study we would like to invite all catalogers in
academic libraries, with or without MLS, who are involved in any aspect
of non-MARC metadata work.

I am asking you to please participate by answering this
multiple choice survey. Your answers will be completely anonymous and
confidential and will only be used to summarize information.
*No* names or institution affiliation will be asked.

Responding to the survey constitutes informed consent to
participate in the research. The survey is voluntary, and you may
withdraw from it at any time.

It should take approximately 10 minutes to answer the 28 multiple choice
questions of the survey.

To complete the survey, follow this link:
http://www.surveymonkey.com/s.aspx?sm=b2XVTS5Z_2f5GV_2fXKUWTfyKw_3d_3d
The deadline to complete the survey is Sept. 1, 2008.

If you have questions at any time about the study or the procedures, you
may contact the principal researcher, Marielle Veve, at Hodges Library,
1015 Volunteer Blvd., Knoxville, TN 37996; mveve at utk.edu.  If you have
questions about your rights as a participant, contact the Compliance
Section at (423) 974-3466.

Thank you in advance for assisting in this research project by taking
the time to respond to the survey. This research project has been
approved by the University of Tennessee's Institutional Review Board.

--------
Marielle Veve
Cataloging & Metadata Librarian
Assistant Professor
Hodges Library-University of Tennessee
Knoxville, TN 37996
Phone:  (865) 974-0394
E-mail: mveve at utk.edu

 

From christophe.dupriez at destin.be  Fri Aug  8 03:50:36 2008
From: christophe.dupriez at destin.be (Christophe Dupriez)
Date: Fri, 08 Aug 2008 09:50:36 +0200
Subject: [Dspace-general] DSpace development priorities:
	starting	a	discussion
In-Reply-To: <20080807141845.GD2968@IUPUI.Edu>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>	<489AF261.70200@destin.be>
	<20080807141845.GD2968@IUPUI.Edu>
Message-ID: <489BFACC.2090908@destin.be>

Hi Mark H.,

Terminology:
* developers: IT specialists able to implement / modify DSpace
* users: Information Management specialists able to manage a repository
* end users: anybody able to understand documents contained in a repository

If I follow your thoughts, we should set up some kind of collaborative 
process to define what is (should be) DSpace and fuel developers with 
precise needs definitions / development ideas.

I would agree with you, but please keep in mind:
1) developers get very short-term validation of their work (their
local application works or it does not) that users do not get (you only know
after months or years whether your repository is successful).
2) developers have to formalize their thoughts in a very formal
language. Users must formalize their projects in the language that will
be best understood by their funding authorities or by their local
end users: cultural differences that are not always easy to share with the
DSpace community.

This is just to explain that the cost/benefit equation of collaboration is not
the same for users as for developers. Some "reward" for sharing experience
must be embedded in DSpace community management.

So the question is: how do we set up and animate a collaborative process
between DSpace users?

In each institution, the users create documents to request funding,
organise projects, and define the tasks of co-workers. Many of those documents
are readily available on the Web, some even on the DSpace site. One may want
to propose a frame to organize those documents and identify "blanks" to
be filled. For instance:
* What are the different kinds of repositories (main missions)? I see (at
  least):
   1) Institutional repositories: provide permanent storage for the
      production of an institution
   2) Subject repositories: provide reference documents on a given topic
      family, worldwide. Will need to interconnect with institutional
      repositories
   3) National repositories: provide reference storage of the information
      needed to fulfill national legal obligations. Will need to
      interconnect with other national repositories
   4) Local repositories: provide local knowledge workers with the
      reference documents they need for their daily work
   Personally, I work on repository types 2 (WindMusic), 3 (Dangerous
   Chemical Products sold in Belgium) and 4 (Documents useful for
   PoisonCentre MDs)
* Who are the audiences of those different kinds of repositories? What are
  the needs of those audiences?
* What are the needs / missions of the institutions organizing those
  repositories?
* What strategy and tactics would those institutions like to
  follow to make their repositories successful?
* What cross-institutional content initiatives could multiply the impact
  of the DSpace initiative (for instance, integration with OCLC services and
  WorldCat)?
* Priorities: What are their short-term needs? Longer-term needs?
* What are the use cases for DSpace?
* What would be the ideal path for each use case?
* How many steps does each use case involve today (if it is possible at all)?
  How many could it be if DSpace were improved?
* What needs to be developed?

Improving the "WHY" will certainly enlighten the "HOW"...

Have a nice day!

Christophe

Mark H. Wood a ?crit :
> Some things to keep in mind:
>
> o  There is no commercial or employment relationship between end-users
>    and developers here.  Anybody who wants something done must
>    accomplish it either by his own (or his organization's) labor, or
>    by moral suasion -- appealing to the value of improving the
>    commons, or the good feeling that comes from having done something
>    well.
>
>    The good news is that, *because* there is no mechanism of
>    compulsion, moral suasion works tolerably well in such situaions.
>
> o  On the other hand, there is most definitely an employment
>    relationship between the developer and his own institution.  If I
>    want to work on some aspect of DSpace, I have to convince my
>    supervisor that the work will benefit the institution sufficiently
>    to account for the cost of my time.  It's difficult, but not
>    impossible, to sell intangible benefits like building up the
>    commons, but it is much easier to sell features that we need
>    ourselves.  The result is that the needs of one's own institution
>    *usually* come first.
>
> o  Just expounding a need of your institution may cause someone
>    elsewhere to realize, "hey, we could use that too -- and we have
>    the resources to build it."  So we do all need to talk about our
>    needs and wishes, even if we can't realize them ourselves.
>
> o  Code is not all there is.  If your institution can't create code,
>    could it contribute documentation or user-interface design?  Could
>    you volunteer to monitor a tracker and provide short monthly
>    postings on item turnover, or moderate a task force, or maintain a
>    most-popular-request list?
>
> o  One of the most effective ways to poison a community project is to
>    try to manage contributors as if you have some authority over them.
>    They know better.  Any community member (coder or not) who feels
>    that his contributions are unappreciated has *nothing to lose* by
>    ceasing to contribute, because the only reward for contribution is
>    already denied him.  Because the project is held in common, he can
>    still work on it for those who *do* reward him.
>
>
> And a few questions:
>
> On Thu, Aug 07, 2008 at 03:02:25PM +0200, Christophe Dupriez wrote:
>   
>> 1) Apply the 80/20 rule: Create an immediatley applicable DSpace package
>> which answers to 80% of the needs of 80% of the smaller institutions
>> which would be happy to not hire any developer (and keep their money to
>> hire a very good application manager) to have an enthusiastic result
>> **that the DSpace foundation would guarantee to sustain on the long
>> term, always providing an easy upgrade path from one version to the next**
>>     
>
> Has this not already been done?  How do we know?  What remains to be
> done in order to satisfy the 80%?
>
>   
>> 2) The sponsors should be institutions gathering to provide resources
>> (money, people, organisation skills) to obtain this result in a
>> reasonable time frame (18 months). The Foundation would coordinate this
>> committee, animate the process with a democratic "1 participating
>> institution = 1 vote" decision process
>>     
>
> Doesn't this just entrench the plutocracy?  Those lacking resources to
> support development have no vote.  Did I misunderstand?
>
>
> And some suggested reading:
>
>   _The Cathedral and the Bazaar_, by Eric S. Raymond.
>
>   An exploration of the economics, psychology, and sociology of
>   development by community.  If you want to know how to motivate
>   participants in a project like DSpace, or just understand why some
>   of them behave so oddly, this is a good place to start.
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> Dspace-general mailing list
> Dspace-general at mit.edu
> http://mailman.mit.edu/mailman/listinfo/dspace-general
>   


From sunilgoria at yahoo.com  Mon Aug 11 02:29:38 2008
From: sunilgoria at yahoo.com (Sunil Goria)
Date: Sun, 10 Aug 2008 23:29:38 -0700 (PDT)
Subject: [Dspace-general] Big File upload problem
Message-ID: <49434.44285.qm@web53402.mail.re2.yahoo.com>

Dear all,
Earlier I asked for help with a file-uploading problem. I have now found that I can upload files smaller than 1 MB to the DSpace server over our LAN. When I try to upload files larger than 1 MB, the browser shows the error "Internet Explorer cannot display the webpage". Please suggest how to upload large files, such as theses, to the DSpace server. We are using DSpace 1.2 on a Linux Enterprise version.
With regards,

Dr. Sunil Goria
Assistant Librarian
University Library,
G.B. Pant University of Agriculture & Technology,
Pantnagar-263145 (India)



From alitra at gmail.com  Mon Aug 11 23:02:22 2008
From: alitra at gmail.com (Alice Tran)
Date: Mon, 11 Aug 2008 17:02:22 -1000
Subject: [Dspace-general] PDF files in DSpace
Message-ID: <a92696f60808112002t6905671ete5e22ef6a84275b7@mail.gmail.com>

Hi,

I've been trying to figure out a best practice for the PDF files we are
ingesting into our Dspace system.  I noticed back in February, Beth from
Ohio, had asked a similar question and I was wondering if anybody else has
since come up with another method or has a best practice to suggest.  I'd be
interested to know what other institutions are doing to prep their scanned
PDFs before ingesting them into Dspace.

Thanks!
Alice Tran
CMS/IR Specialist
University of Hawaii at Manoa
alicet at hawaii.edu
http://library.manoa.hawaii.edu

From mdorn at solinet.net  Tue Aug 12 08:34:43 2008
From: mdorn at solinet.net (Givens, Marlee Dorn)
Date: Tue, 12 Aug 2008 08:34:43 -0400
Subject: [Dspace-general] SOLINET live online class on Open Access and
	Repositories
Message-ID: <3E4ED5DFB91E1743934243083EF3578B03F765D9@emailman.soli.net>

Please excuse cross-posting.

 
SOLINET is pleased to announce that there are still seats available for
the following Live Online class:

 
Open Access, Repositories, and More.. (Live Online)

Instructor: Tyler Walters

This class covers the elements of the open access movement, scholarly
communications, and digital repositories.

9/4/08

10:00am-12:00pm Eastern Time

For more information or to register:

http://www.solinet.net/?sc_itemid={445791DD-8052-4296-AC69-0A1B0351A3E8}

 
For our complete catalog, please visit www.solinet.net and click on
Classes and Events.

 
Thank you!

 
MARLEE DORN GIVENS

Manager, Preservation Services

mdorn at solinet.net

404.892.0943 x3980

 
1438 West Peachtree Street NW

Suite 200

Atlanta, GA 30309

Toll Free: 1.800.999.8558

Fax: 404.892.7879 

www.solinet.net

 
Please consider the environment before printing this e-mail.

 

From dsalo at library.wisc.edu  Tue Aug 12 09:02:14 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Tue, 12 Aug 2008 08:02:14 -0500
Subject: [Dspace-general] DSpace development priorities: starting a
	discussion
In-Reply-To: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
Message-ID: <356cf3980808120602h308074dck7cda6cd7c8381564@mail.gmail.com>

Greetings, DSpace community,

Apologies to those for whom this message is duplicated; Mark Diggory
asked me to bring dspace-tech into the loop. I've kept the original
message to dspace-general quoted below, for those who haven't seen it.

I want to thank everyone who has participated in this discussion
already, both on- and off-list; I've gotten quite a bit of valuable
feedback. Here's what I propose to do. If I don't hear objections or
suggested refinements, I'll come up with days and times and we'll get
started.

First, I'd like to toss out a weekly question to the dspace-general
group by way of gathering raw reactions and requirements that can
later be distilled into something actionable. I have a supply of such
questions, and I will be drawing more from the 2007 user survey,
though of course I'm very open to suggestions from other community
members. Responses can be on- or off-list; I will summarize off-list
responses to the list.

I would ask that responders to the weekly question answer immediately,
BEFORE they read any other responses. This is important! Reading other
answers tends to circumscribe one's own, reducing the overall breadth
of response. In fact, I am tempted to say that the week of the weekly
question should be reserved for immediate reaction, saving discussion
for the next week... so we could have reactions to one question and
discussion of another going on in different threads simultaneously.

I would also like to do online chats or similar synchronous
interaction at least biweekly. Timezones and language barriers are
obviously a problem with that, but I'll do my best -- and I would
appreciate hearing from potential chat hosts in Europe and Asia. I've
set up a room on Meebo, faute de mieux; if there are better ideas, let
me know.

Finally, I've heard some interest in an informal birds-of-a-feather
meeting on this topic at SPARC Digital Repositories 2008. I do expect
to attend that conference, and I'm quite willing to facilitate a BOF.

Let me know if this seems good -- have at it! And again, thank you.

Dorothea

On Wed, Aug 6, 2008 at 3:07 PM, Dorothea Salo <dsalo at library.wisc.edu> wrote:
> Greetings, DSpace community,
>
> For some time, I've been concerned that the DSpace development process
> hasn't enjoyed as much input from the broader community as would be
> desirable. The voices of less-technical repository managers and other
> staff associated with DSpace repositories have been particularly
> difficult to attract to the discussion. I'm hoping to gather
> impressions and suggestions from this specific segment of the
> community (though others are welcome as well!) to pass on to DSpace
> developers. With any luck, this process will build a stronger
> connection between developers and repository managers going forward.
>
> The DSpace development-priority survey done in 2007 was valuable and
> worthwhile, and if possible, I'd like to revisit some of the questions
> raised there. I'd also like to start "in your own words" discussions
> about what repository managers want and need from DSpace that it isn't
> yet providing.
>
> We can certainly talk here, and I welcome that! More than one DSpace
> developer has agreed to monitor these discussions, and I will be
> summarizing them back to the development list. But I'm completely open
> to other venues as well -- IM, group chat, Web 2.0, Skype, out-of-band
> email -- depending on what people tell me they want.
>
> So. How would you like to do this? Once we've sorted out the process,
> we can get down to business. Feel free to contact me off-list if you
> prefer.
>
> Dorothea
>
> --
> Dorothea Salo dsalo at library.wisc.edu
> Digital Repository Librarian AIM: mindsatuw
> University of Wisconsin
> Rm 218, Memorial Library
> (608) 262-5493
>


-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From hellpop at umich.edu  Tue Aug 12 15:02:08 2008
From: hellpop at umich.edu (Jim Ottaviani)
Date: Tue, 12 Aug 2008 15:02:08 -0400
Subject: [Dspace-general] PDF files in DSpace (Alice Tran)
In-Reply-To: <mailman.247.1218556977.31028.dspace-general@mit.edu>
Message-ID: <C4C75670.18C75%hellpop@umich.edu>


> I've been trying to figure out a best practice for the PDF files we are
> ingesting into our Dspace system.  I noticed back in February, Beth from
> Ohio, had asked a similar question and I was wondering if anybody else has
> since come up with another method or has a best practice to suggest.  I'd be
> interested to know what other institutions are doing to prep their scanned
> PDFs before ingesting them into Dspace.

I'm not sure whether these are the sort of thing you had in mind, but we
recently revised our PDF best practices, and the recommendations are
available at

  http://hdl.handle.net/2027.42/58005


Jim

____________________________________
Jim Ottaviani
+1 734-763-4835
Coordinator, Deep Blue
http://deepblue.lib.umich.edu
University of Michigan Library

       Quis custodiet ipsos custodes
          --Juvenal, Satires VI, 347


From Claudia.Juergen at ub.uni-dortmund.de  Fri Aug 15 10:27:14 2008
From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Jürgen)
Date: Fri, 15 Aug 2008 16:27:14 +0200
Subject: [Dspace-general] [Dspace-tech] Location Proposal for DSUG Mtg
	Fall 2009
In-Reply-To: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org>
References: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org>
Message-ID: <48A59242.9050205@ub.uni-dortmund.de>

Hi Valorie,

as Fall is getting closer, will there be a meeting, or did this plan not 
develop any further?

Sunny greetings

Claudia Jürgen


Valorie Hollister wrote:
> DSpace Community -
> 
> As many of you are already aware, the next DSpace User Group Meeting  
> will be held in conjunction with next year's Open Repositories in May  
> 2009 in Atlanta, Georgia, USA.
> 
> DSpace Foundation would like to help organize a stand-alone DSUG  
> meeting sometime between September and November 2009 in Europe. We  
> already have a few informal offers to host the meeting, but before we  
> make a decision we would like to give the entire DSpace community a  
> chance to propose their location.
> 
> Some of the key criteria for hosting the meeting include:
> -location must be easily accessible for international participants  
> (i.e. close to an international airport)
> -meeting facilities must accommodate at least 200 people for 2 days
> -maximum charges per participant should not exceed $300
> -meeting facilities must be close to enough available, inexpensive  
> lodging for participants
> 
> If you are interested in hosting the next DSUG meeting, please contact  
> me at val at dspace.org.
> 
> Valorie Hollister
> Community Outreach Manager
> DSpace Foundation
> 
> 
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech


From Claudia.Juergen at ub.uni-dortmund.de  Fri Aug 15 10:31:28 2008
From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Jürgen)
Date: Fri, 15 Aug 2008 16:31:28 +0200
Subject: [Dspace-general] [Dspace-tech] Location Proposal for DSUG Mtg
	Fall 2009
In-Reply-To: <48A59242.9050205@ub.uni-dortmund.de>
References: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org>
	<48A59242.9050205@ub.uni-dortmund.de>
Message-ID: <48A59340.30107@ub.uni-dortmund.de>

Hi All,

Sorry, I overlooked the 2009 and thought this was about 2008.

Claudia


Claudia Jürgen wrote:
> Hi Valorie,
> 
> as Fall is getting closer, will there be a meeting, or did this plan not 
> develop any further?
> 
> Sunny greetings
> 
> Claudia Jürgen
> 
> 
> Valorie Hollister wrote:
>> DSpace Community -
>>
>> As many of you are already aware, the next DSpace User Group Meeting  
>> will be held in conjunction with next year's Open Repositories in May  
>> 2009 in Atlanta, Georgia, USA.
>>
>> DSpace Foundation would like to help organize a stand-alone DSUG  
>> meeting sometime between September and November 2009 in Europe. We  
>> already have a few informal offers to host the meeting, but before we  
>> make a decision we would like to give the entire DSpace community a  
>> chance to propose their location.
>>
>> Some of the key criteria for hosting the meeting include:
>> -location must be easily accessible for international participants  
>> (i.e. close to an international airport)
>> -meeting facilities must accommodate at least 200 people for 2 days
>> -maximum charges per participant should not exceed $300
>> -meeting facilities must be close to enough available, inexpensive  
>> lodging for participants
>>
>> If you are interested in hosting the next DSUG meeting, please contact  
>> me at val at dspace.org.
>>
>> Valorie Hollister
>> Community Outreach Manager
>> DSpace Foundation
>>
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> 
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech


From mcgeetho at shu.edu  Fri Aug 15 12:53:41 2008
From: mcgeetho at shu.edu (Thomas A McGee)
Date: Fri, 15 Aug 2008 12:53:41 -0400
Subject: [Dspace-general] Tom McGee is out of the office.
Message-ID: <OF09D5D3F9.F80FE547-ON852574A6.005CCE89-852574A6.005CCE89@shu.edu>


I will be out of the office starting  08/15/2008 and will not return until
08/25/2008.

I'm on vacation the week of August 18. I will respond to your message when
I return on Monday the 25th.


From mdiggory at MIT.EDU  Fri Aug 15 13:27:19 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Fri, 15 Aug 2008 10:27:19 -0700
Subject: [Dspace-general] DSpace 1.5.1-beta Release
Message-ID: <04657E1C-77E7-4174-9C6B-BD0B23AE4145@mit.edu>

Dear DSpace Community,

We are pleased to announce the release of DSpace 1.5.1 beta.

This beta is primarily a bug fix release incorporating numerous bug  
fixes and enhancements. Refer to the

http://wiki.dspace.org/CurrentReleaseToDo

and SVN history for details on these modifications.

http://fisheye3.atlassian.com/changelog/dspace/branches/dspace-1_5_x?todate=1218776345558

The final release of 1.5.1 should be out before the end of August.
We request that community members interested in testing the beta
release download it and verify that they can complete both an upgrade
and a fresh installation. We also request that the SVN branch be frozen
until the final release is complete. If developers have further
fixes, please request their addition through the developers list
before moving forward with SVN commits.

The documentation for this release is bundled within the package.

DSpace 1.5.1 beta can be downloaded from the files area at

http://sourceforge.net/project/showfiles.php?group_id=19984&package_id=143548&release_id=619910

or with SVN from

http://dspace.svn.sf.net/svnroot/dspace/tags/dspace-1_5_1-beta/

Please use the mailing lists to provide feedback on this release.

Those wishing to do development work with DSpace are strongly
encouraged to obtain the source code using SVN. This is very
straightforward, and a guide to doing this is available here:
http://wiki.dspace.org/ContributionGuidelines

We would also like to take this opportunity to invite you all to take  
part in the DSpace development process. Extra developer hands are  
always welcome, but there are other ways you can help:

- Test the system and report bugs
- Provide documentation (for end users and institutions, as well as
   technical)
- Provide or update language packs
- Share your deployment experiences
- Donate content and metadata for testing and research
- Share your technical experience and ideas

Please visit the DSpace Wiki to see the various resources and
collaboration tools available to the DSpace community:
http://wiki.dspace.org/DspaceResources

Sincerely,
Mark Diggory


~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage


From dsalo at library.wisc.edu  Mon Aug 18 09:24:41 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Mon, 18 Aug 2008 08:24:41 -0500
Subject: [Dspace-general] Question one: What's working and what isn't?
Message-ID: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>

Greetings, DSpace community,

I've heard enough encouragement to keep on with my plans for an
informal, qualitative information-gathering on DSpace development. So
let's get started!

This week's question (based on Q21 from the 2006 survey) is about
DSpace's existing functionality. Please offer one to three existing
DSpace features that you believe work well in your situation, then
offer one to three existing features that you believe need
improvement. Feel free to explain your answers at length! Also, please
let us know which version of DSpace you are running.

Housekeeping:

- Please respond to the dspace-general list, or to me directly.
DSpace-tech has a 1.5.1 beta to talk about, and I don't want to derail
that very important conversation!
- Please respond before reading or answering other responses!
- I will summarize off-list responses to dspace-general no later than Friday.

I have set up a Meebo Room at
<http://www.meebo.com/room/DSpaceDevelopment/> for live-chat
discussion of the weekly topic. I am currently planning to run an
hourlong chat at 9 am CT Wednesday (10 am ET, 3 pm GMT). You do not
need to sign up with Meebo to participate. You do, however, need the
room password, which is "dspace" (no quotes) -- this isn't for
security, just an anti-random-troll measure.

Thanks in advance to all participants.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From mdorn at solinet.net  Mon Aug 18 09:57:24 2008
From: mdorn at solinet.net (Givens, Marlee Dorn)
Date: Mon, 18 Aug 2008 09:57:24 -0400
Subject: [Dspace-general] SOLINET live online class Introduction to
	Institutional Repositories
Message-ID: <3E4ED5DFB91E1743934243083EF3578B03F7661D@emailman.soli.net>

SOLINET is pleased to announce that there are still seats available for
the following live online class:

 
Introduction to Institutional Repositories (Live Online)

 
This session will define and describe the characteristics and features
of institutional repositories, which can include not only scholarship of
faculty and students but also digital assets such as administrative
records, course notes, technical reports and learning objects.

 
September 16

10:00am-12:00pm Eastern Time

Instructor: David Greenebaum

 
Price: $120 SOLINET members/$170 non-members

 
For more information or to register, please visit:
http://www.solinet.net/?sc_itemid={948AD1FB-6E39-45EC-B462-142B59FED689}

 
Visit our Web site at www.solinet.net

 
Thank you!

 
MARLEE DORN GIVENS

Manager, Preservation Services

mdorn at solinet.net

404.892.0943 x3980

 
1438 West Peachtree Street NW

Suite 200

Atlanta, GA 30309

Toll Free: 1.800.999.8558

Fax: 404.892.7879 

www.solinet.net

 
Please consider the environment before printing this e-mail.

 

From dsalo at library.wisc.edu  Mon Aug 18 12:56:07 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Mon, 18 Aug 2008 11:56:07 -0500
Subject: [Dspace-general] Question one: What's working and what isn't?
In-Reply-To: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>
References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>
Message-ID: <356cf3980808180956m2b55ff9dkd601ed3c83b7498c@mail.gmail.com>

Answering my own question... We're currently running 1.4.1 in
production, and are testing and modding 1.5 for a rollout soon.

> This week's question (based on Q21 from the 2006 survey) is about
> DSpace's existing functionality. Please offer one to three existing
> DSpace features that you believe work well in your situation,

I'm very happy with Manakin theming (barring a few minor growls). As a
de-facto consortial repository, being able to theme communities and
have the theme cascade to subcommunities and collections is a major
win.

In the "little things make a big difference" category, the checksum
checker makes me happy. Accidentally losing or mangling data is a
nightmare; it would completely demolish the trust my user communities
have in the service, and my own trust in the software. That nightly
"all is well" email is a relief.
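
As a rough illustration of what a nightly fixity check of this kind does,
here is a minimal sketch. It is not the DSpace checksum checker itself: it
assumes MD5 digests recorded at deposit time, and the names md5_of and
find_mismatches and the path-to-digest mapping are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(recorded):
    """Compare stored digests against the files currently on disk.

    `recorded` maps a file path to the digest stored at deposit time
    (a hypothetical structure, not DSpace's own tables). Returns the
    paths that are missing or whose content no longer matches.
    """
    problems = []
    for path, expected in recorded.items():
        if not Path(path).is_file() or md5_of(path) != expected:
            problems.append(path)
    return problems
```

An empty list from find_mismatches corresponds to the "all is well" case;
anything else names the files that need attention.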

The HTML display engine pretty much Just Works. Some of the best and
most important work I've captured in both of the repositories I've run
has been websites. I'm very grateful for this feature.

> then
> offer one to three existing features that you believe need
> improvement. Feel free to explain your answers at length! Also, please
> let us know which version of DSpace you are running.

Repeating quietly to myself "no new features... no new features..."

The whole communities/collections model needs a rethink, I think.
Faculty I talk to find it confusing and unintuitive; they expect
communities to be able to contain items, and collections to be able to
contain other collections. (The latter is particularly important for
some kinds of scoped searching.) Perhaps following on from this, they
expect to be able to make changes to community information that only
an administrator can make, because there is no DSpace analogue to
"collection administrator" for communities. Finally, for our
consortial-repository purposes it's not good that only an
administrator can change collection/item access policies. I need to be
able to hand that work out to librarians at our member campuses, but
DSpace won't let me.

I understand that DSpace is meant to be an archival system, but the
model of "metadata and bitstreams can change before final deposit, but
not afterwards except by administrator fiat" doesn't accord with user
expectations where I am. People make metadata mistakes and don't
notice them until after approving the submission. People upload
bitstreams and want to swap them out for better bitstreams. Stuff
comes in through a variety of channels that needs editing after the
fact (authority control, anyone?). I spend a *lot* of time -- much too
much time! -- dealing with things like this, as well as talking down
irritated users who want to be able to fix these things without going
through me. I also end up editing metadata directly in the database
(yes, I know, bad bad me!) because one SQL query takes so much less
time than making the same change to forty-'leven items individually in
the UI.
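
To make the "one query versus forty edits" point concrete, here is a
minimal sketch using an in-memory SQLite database. The item_metadata table
is a simplified, hypothetical stand-in and not the real DSpace schema, and
normalize_value is an illustrative helper, not a DSpace function.

```python
import sqlite3

# Toy table standing in for repository metadata (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_metadata (item_id INTEGER, field TEXT, value TEXT)"
)
rows = [(i, "dc.contributor.author", "Smith, J") for i in range(1, 41)]
rows.append((41, "dc.contributor.author", "Jones, A."))
conn.executemany("INSERT INTO item_metadata VALUES (?, ?, ?)", rows)


def normalize_value(conn, field, old, new):
    """Replace one metadata value everywhere it occurs, in a single
    transaction, and return the number of rows changed."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE item_metadata SET value = ? WHERE field = ? AND value = ?",
            (new, field, old),
        )
    return cursor.rowcount


changed = normalize_value(conn, "dc.contributor.author", "Smith, J", "Smith, John")
print(changed, "rows updated")
```

Wrapping the update in a transaction means a mistake can be rolled back
before it is committed, which is the main safeguard available when
bypassing the UI.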

The input-forms.xml system of modifying forms needs an overhaul as
well. One problem with it is that not all repo managers have server
access in order to modify this file, but they're a lot closer to the
content/metadata than the IT professionals who *do* have access to the
file. Another problem is some really bad interactions with the
hardcoded "big three" front-page questions -- if you put date.issued
in your input-forms.xml, but your depositor doesn't check the
"previously published" box, DSpace wags a stern finger and won't let
them proceed! (This is a serious problem for theses and dissertations,
which do have a date.issued but aren't colloquially considered
previously-published in many disciplines.) Finally, this file doesn't
have any conditional logic. It can't, for example, say "okay, if
dc.type is Working Paper, show these fields; otherwise, show those."
This makes it essentially impossible to simplify the forms in a
heterogeneous collection, which is an unhappy thing for usability.
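
The kind of conditional logic described above could look roughly like the
following sketch. It is not how input-forms.xml works today; the function
fields_for and the per-type field lists are hypothetical, chosen only to
illustrate "if dc.type is X, show these fields; otherwise, show those."

```python
# Fields shown for every submission, regardless of type.
COMMON_FIELDS = ["dc.title", "dc.contributor.author", "dc.date.issued"]

# Hypothetical per-type extras; the field choices are illustrative only.
FIELDS_BY_TYPE = {
    "Working Paper": ["dc.relation.ispartofseries", "dc.description.abstract"],
    "Thesis": ["dc.contributor.advisor", "dc.description.abstract"],
}

# Shown when the type has no specific entry above.
DEFAULT_EXTRAS = ["dc.description.abstract", "dc.subject"]


def fields_for(dc_type):
    """Return the list of fields to display for a given dc.type value."""
    return COMMON_FIELDS + FIELDS_BY_TYPE.get(dc_type, DEFAULT_EXTRAS)


for item_type in ("Working Paper", "Dataset"):
    print(item_type, "->", fields_for(item_type))
```

Keeping the rules in a plain mapping like this would let a heterogeneous
collection present a short form per item type instead of one long form for
everything.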

Right, those are my three. Next?

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From mveve at utk.edu  Tue Aug 19 09:21:54 2008
From: mveve at utk.edu (Veve, Marielle)
Date: Tue, 19 Aug 2008 09:21:54 -0400
Subject: [Dspace-general] Survey: Catalogers working with Non-MARC Metadata
Message-ID: <A0F5B662D126394895DF343A110BFB60012F26A1@KFSVS2.utk.tennessee.edu>

From: Veve, Marielle 
Sent: Thursday, August 07, 2008 2:10 PM
To: Veve, Marielle
Subject: Survey: Catalogers working with Non-MARC Metadata

 
To *all catalogers* (with or without MLS) in academic libraries:

[Please excuse the cross-posting]

 
SURVEY: Integrating Non-MARC Metadata Production into the Duties of
Traditional Catalogers

 
          You are invited to participate in a brief national, online
survey.  The objective of this survey is to research the national trends
in the integration of Non-MARC metadata work into the duties of
traditional catalogers and the perceptions and attitudes catalogers hold
towards non-MARC metadata.

 
            For this study we would like to invite all catalogers in
academic libraries, with or without MLS, who are involved in any aspect
of non-MARC metadata work.  

 
            I am asking you to please participate by answering this
multiple choice survey. Your answers will be completely anonymous and
confidential and will only be used to summarize information. 

*No* names or institution affiliation will be asked.  

 
            Responding to the survey constitutes informed consent to
participate in the research. The survey is voluntary, and you may
withdraw from it at any time. 

 
It should take approximately 10 minutes to answer the 28 multiple choice
questions of the survey.  

 
            To complete the survey, follow this link
http://www.surveymonkey.com/s.aspx?sm=b2XVTS5Z_2f5GV_2fXKUWTfyKw_3d_3d.
The deadline to complete the survey is Sept. 1, 2008.  

 
If you have questions at any time about the study or the procedures, you
may contact the principal researcher, Marielle Veve, at Hodges Library,
1015 Volunteer Blvd., Knoxville, TN 37996; mveve at utk.edu.  If you have
questions about your rights as a participant, contact the Compliance
Section at (423) 974-3466.  

 
Thank you in advance for assisting in this research project by taking
the time to respond to the survey. This research project has been
approved by the University of Tennessee's Institutional Review Board.

 
--------

Marielle Veve

Cataloging & Metadata Librarian

Assistant Professor

Hodges Library-University of Tennessee

Knoxville, TN 37996
Phone:  (865) 974-0394

E-mail: mveve at utk.edu

 

From robin.taylor at ed.ac.uk  Tue Aug 19 09:50:11 2008
From: robin.taylor at ed.ac.uk (Robin Taylor)
Date: Tue, 19 Aug 2008 14:50:11 +0100
Subject: [Dspace-general] Question one: What's working and what isn't?
In-Reply-To: <356cf3980808180956m2b55ff9dkd601ed3c83b7498c@mail.gmail.com>
Message-ID: <200808191350.m7JDoBwG010829@lmtp1.ucs.ed.ac.uk>

Hi Dorothea,

Thinking out loud about input-forms.xml:- In order to provide different
metadata screens for theses we include the word theses in the collection
names. We look for presence of 'theses' in the collection name before
deciding which input-form to use. In effect we are using the collection name
as a proxy for the type of item. Really it would be better for us to ask the
submitter what type of item they are submitting and use an input-form based
on item type rather than collection name. I am interested to know how people
are currently making use of input-forms.xml.
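
The two selection strategies described above can be sketched as follows.
This is an illustration only, not the Edinburgh code or a DSpace API: the
function names, the form names "thesis" and "default", and the
case-insensitive match are all assumptions.

```python
def form_by_collection_name(collection_name):
    """Current approach: the collection name stands in for the item type."""
    return "thesis" if "theses" in collection_name.lower() else "default"


def form_by_item_type(item_type):
    """Preferred approach: ask the submitter for the type and use it directly."""
    thesis_types = {"thesis", "dissertation"}
    return "thesis" if item_type.strip().lower() in thesis_types else "default"


print(form_by_collection_name("Physics PhD Theses"))  # thesis
print(form_by_item_type("Article"))                   # default
```

The first function breaks as soon as a thesis lands in a collection whose
name lacks the keyword, which is the weakness of using the name as a proxy.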

Cheers, Robin. 


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


-----Original Message-----
From: dspace-general-bounces at mit.edu [mailto:dspace-general-bounces at mit.edu]
On Behalf Of Dorothea Salo
Sent: 18 August 2008 17:56
To: dspace
Subject: Re: [Dspace-general] Question one: What's working and what isn't?

Answering my own question... We're currently running 1.4.1 in production,
and are testing and modding 1.5 for a rollout soon.

> This week's question (based on Q21 from the 2006 survey) is about 
> DSpace's existing functionality. Please offer one to three existing 
> DSpace features that you believe work well in your situation,

I'm very happy with Manakin theming (barring a few minor growls). As a
de-facto consortial repository, being able to theme communities and have the
theme cascade to subcommunities and collections is a major win.

In the "little things make a big difference" category, the checksum checker
makes me happy. Accidentally losing or mangling data is a nightmare; it
would completely demolish the trust my user communities have in the service,
and my own trust in the software. That nightly "all is well" email is a
relief.

The HTML display engine pretty much Just Works. Some of the best and most
important work I've captured in both of the repositories I've run has been
websites. I'm very grateful for this feature.

> then offer one to three existing features that you believe need 
> improvement. Feel free to explain your answers at length! Also, please 
> let us know which version of DSpace you are running.

Repeating quietly to myself "no new features... no new features..."

The whole communities/collections model needs a rethink, I think.
Faculty I talk to find it confusing and unintuitive; they expect communities
to be able to contain items, and collections to be able to contain other
collections. (The latter is particularly important for some kinds of scoped
searching.) Perhaps following on from this, they expect to be able to make
changes to community information that only an administrator can make,
because there is no DSpace analogue to "collection administrator" for
communities. Finally, for our consortial-repository purposes it's not good
that only an administrator can change collection/item access policies. I
need to be able to hand that work out to librarians at our member campuses,
but DSpace won't let me.

I understand that DSpace is meant to be an archival system, but the model of
"metadata and bitstreams can change before final deposit, but not afterwards
except by administrator fiat" doesn't accord with user expectations where I
am. People make metadata mistakes and don't notice them until after
approving the submission. People upload bitstreams and want to swap them out
for better bitstreams. Stuff comes in through a variety of channels that
needs editing after the fact (authority control, anyone?). I spend a *lot*
of time -- much too much time! -- dealing with things like this, as well as
talking down irritated users who want to be able to fix these things without
going through me. I also end up editing metadata directly in the database
(yes, I know, bad bad me!) because one SQL query takes so much less time
than making the same change to forty-'leven items individually in the UI.

The input-forms.xml system of modifying forms needs an overhaul as well. One
problem with it is that not all repo managers have server access in order to
modify this file, but they're a lot closer to the content/metadata than the
IT professionals who *do* have access to the file. Another problem is some
really bad interactions with the hardcoded "big three" front-page questions
-- if you put date.issued in your input-forms.xml, but your depositor
doesn't check the "previously published" box, DSpace wags a stern finger and
won't let them proceed! (This is a serious problem for theses and
dissertations, which do have a date.issued but aren't colloquially
considered previously-published in many disciplines.) Finally, this file
doesn't have any conditional logic. It can't, for example, say "okay, if
dc.type is Working Paper, show these fields; otherwise, show those."
This makes it essentially impossible to simplify the forms in a
heterogeneous collection, which is an unhappy thing for usability.
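
[Illustration only, not part of the original message.] The kind of
conditional logic described above could look roughly like the sketch
below. It is not how input-forms.xml works, and the field lists and the
fields_for helper are hypothetical, chosen only to show a form being
selected by dc.type rather than by collection.

```python
# Hypothetical field lists per item type; real installations would differ.
FIELDS_BY_TYPE = {
    "Working Paper": ["dc.title", "dc.contributor.author",
                      "dc.relation.ispartofseries"],
    "Thesis": ["dc.title", "dc.contributor.author",
               "dc.date.issued", "dc.description.abstract"],
}

# Fallback shown when the item type has no dedicated form.
DEFAULT_FIELDS = ["dc.title", "dc.contributor.author", "dc.date.issued"]


def fields_for(dc_type):
    """Return the list of fields to show for the given dc.type value."""
    return FIELDS_BY_TYPE.get(dc_type, DEFAULT_FIELDS)


for item_type in ("Working Paper", "Dataset"):
    print(item_type, "->", fields_for(item_type))
```

With something like this, a heterogeneous collection could show a short
form for each type instead of one long form for everything.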

Right, those are my three. Next?

Dorothea

--
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493
_______________________________________________
Dspace-general mailing list
Dspace-general at mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general


From sdl at aber.ac.uk  Tue Aug 19 11:16:42 2008
From: sdl at aber.ac.uk (Stuart Lewis [sdl])
Date: Tue, 19 Aug 2008 16:16:42 +0100
Subject: [Dspace-general] The new RSP blog directory
In-Reply-To: <C4D0A22A.1704F%sdl@aber.ac.uk>
Message-ID: <C4D0A26A.17051%sdl@aber.ac.uk>

[apologies for cross-posting]

Directory of repository related blogs (http://rsp.ac.uk/blogs/)
---------------------------------------------------------------
The JISC funded Repositories Support Project has today launched a new
service - The RSP Blog Directory (http://rsp.ac.uk/blogs/). It provides
a list of recommended and informative blogs regarding the repository
scene from around the globe. Listed blogs include personal creations
from those with first hand experience of repository management and/or
technical development of repository software; blogs for specific
repositories, projects and software developers; as well as blogs for
groups and societies with an interest in the open access movement and
digital curation.

Each entry in the directory has a brief description of what the blog
contains, with links to view either the entire blog or just the RSS
feed.

Blogs have been arranged into categories by type, and you are able
to download an OPML file to view the RSS feeds within your blog reader
of choice for a selected category, or for all the blogs listed in the
directory.

We hope the directory is pretty comprehensive but if you think there are
any blogs missing from this list, please e-mail your suggestion to the
RSP team at support at rsp.ac.uk.



From j.proven at abertay.ac.uk  Tue Aug 19 11:37:32 2008
From: j.proven at abertay.ac.uk (Proven, Jackie)
Date: Tue, 19 Aug 2008 16:37:32 +0100
Subject: [Dspace-general] Skip file upload option
Message-ID: <B848D37385FF2E40B50AF95D7F5A99430681DF37@uadmta03.uad.ac.uk>

We are newcomers to DSpace and have just installed v1.5. I believe it is
now possible by default to disable the hard requirement to include a
full-text in the submission (so you can skip the file upload step). Can
anyone tell us how to implement this as we have been unable to find
details in any documentation.
 
Many thanks
Jackie
 

--
Jackie Proven
Senior Information Officer 
Information Services, University of Abertay Dundee 
Tel: 01382 308867 
E-mail: j.proven at abertay.ac.uk <mailto:j.proven at abertay.ac.uk>  
  
The University of Abertay Dundee is a charity registered in Scotland,
No: SC016040 

 

From Claudia.Juergen at ub.uni-dortmund.de  Tue Aug 19 12:35:49 2008
From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Juergen)
Date: Tue, 19 Aug 2008 18:35:49 +0200 (CEST)
Subject: [Dspace-general] Skip file upload option
In-Reply-To: <B848D37385FF2E40B50AF95D7F5A99430681DF37@uadmta03.uad.ac.uk>
References: <B848D37385FF2E40B50AF95D7F5A99430681DF37@uadmta03.uad.ac.uk>
Message-ID: <ccb81bc0c6d9e3db291b8b2916e0efa7.squirrel@mail.ub.uni-dortmund.de>

Hi Jackie,

you must set

webui.submit.upload.required = false

in your dspace.cfg. By default it is set to true.
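
[Illustration only, not part of the original message.] To confirm what
a given dspace.cfg actually says, a small script along these lines may
help. It is a minimal sketch that assumes the file uses simple
"key = value" lines with "#" comments; the read_property and
upload_required helpers and the sample text are hypothetical, and only
the property name comes from the message above.

```python
def read_property(cfg_text, key, default=None):
    """Return the last value assigned to `key` in properties-style text."""
    value = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


def upload_required(cfg_text):
    """True unless the setting is present and set to something other than true."""
    # Assumption: a missing key means the default of true, as noted above.
    setting = read_property(cfg_text, "webui.submit.upload.required", "true")
    return setting.lower() == "true"


# Hypothetical excerpt standing in for a real dspace.cfg.
sample = """
# submission settings
webui.submit.upload.required = false
"""
print(upload_required(sample))  # prints False
```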

Claudia

> We are newcomers to DSpace and have just installed v1.5. I believe it is
> now possible by default to disable the hard requirement to include a
> full-text in the submission (so you can skip the file upload step). Can
> anyone tell us how to implement this as we have been unable to find
> details in any documentation.
>
> Many thanks
> Jackie
>
>
> --
> Jackie Proven
> Senior Information Officer
> Information Services, University of Abertay Dundee
> Tel: 01382 308867
> E-mail: j.proven at abertay.ac.uk <mailto:j.proven at abertay.ac.uk>
>
> The University of Abertay Dundee is a charity registered in Scotland,
> No: SC016040
>
>


From sshreeve at illinois.edu  Tue Aug 19 12:57:36 2008
From: sshreeve at illinois.edu (Sarah L. Shreeves)
Date: Tue, 19 Aug 2008 11:57:36 -0500
Subject: [Dspace-general] Question one: What's working and what isn't?
Message-ID: <48AAFB80.6040302@illinois.edu>

We're running DSpace 1.4.2 (heavily customized) and are in the process 
of upgrading to 1.5 - we expect to do that later this fall.

Things that work well:

 - I appreciate the metadata template for collections. We do a fair 
number of serial type things that have very common metadata, so it's 
useful to be able to have those pre-filled out fields. I'd love to be 
able to reuse these templates from one collection to another but that 
would be a bonus.

- I agree with Dorothea that the checksum checker and the html display 
engine work well and make me and (in the case of the html display 
engine) my end users happy.

- I am pretty happy with how Manakin alongside the customizable 
submission process will simplify the upload processes for our users and 
will allow us to tailor some communities for specific user groups.

Areas that need further development:

- I'd agree with Dorothea that the community / collection structure and 
administration could be re-thought. We've actually customized our 
instance and have added community administration functionality, which 
has turned out to be crucial for us. It's allowed me to get 
departmental libraries and colleges involved in IDEALS at a level that I 
don't think would have been possible otherwise. For example, our 
Agriculture library completely takes care of the Agriculture community - 
adds sub-communities and collections as needed, adds additional 
administrators, etc - which has been a great way to distribute the work 
of running IDEALS. We're in talks with our Grad College now about ETDs, 
which will absolutely require community-level administration. This is a 
very important development area.

- Statistics and Reports - I know that there are a couple of stats 
packages out there (and we're using one currently) but none seem to be 
very satisfactory - or haven't been upgraded to work with Manakin. This 
is one of the primary selling points of IDEALS for many - and what we 
have in place is pretty basic. We really have to get stats and reports 
integrated into the core DSpace code.

- Better ways for both repo managers and collection administrators to 
edit metadata both individually and in bulk. I don't have direct access 
to the database - and honestly wouldn't know how to change things there 
if I did (I'd have to reach back pretty far in my memory) - so I either 
have to ask Tim D. to do updates or I have to do things item by item. 
The same is true of my collection administrators. The item by item 
update process for metadata is painful (I never send my collection 
administrators there if I can avoid it) - this could certainly be 
improved. I also think that a bulk update would be an extremely useful 
development - from the user interface!

These are brief thoughts but I wanted to get them out there.

Sarah

------------------------------------------
Sarah L. Shreeves
Coordinator, IDEALS
http://www.ideals.uiuc.edu/
University of Illinois at Urbana-Champaign
sshreeve at illinois.edu
217-333-4648 or 217-244-3877


From dsalo at library.wisc.edu  Wed Aug 20 08:59:04 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Wed, 20 Aug 2008 07:59:04 -0500
Subject: [Dspace-general] Chat in one hour
Message-ID: <356cf3980808200559s22ce2880q637e9a7201862cdc@mail.gmail.com>

Greetings, DSpace community,

Just a quick reminder that repository managers, support staff, and
developers are welcome to meet each other and chat informally about
the software in one hour (10 Eastern, 9 Central, 3 GMT) in the
DSpaceDevelopment Meebo room at
<http://meebo.com/room/DSpaceDevelopment>. The room password is
"dspace" (no quotes).

I'm there already if anyone cares to turn up early. If you have
trouble getting in, you can contact me by email at this address, or
via AIM at "mindsatuw".

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From mwood at IUPUI.Edu  Wed Aug 20 09:10:02 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Wed, 20 Aug 2008 09:10:02 -0400
Subject: [Dspace-general] Community admin.s; statistics
In-Reply-To: <48AAFB80.6040302@illinois.edu>
References: <48AAFB80.6040302@illinois.edu>
Message-ID: <20080820131002.GB9118@IUPUI.Edu>

On Tue, Aug 19, 2008 at 11:57:36AM -0500, Sarah L. Shreeves wrote:
> - I'd agree with Dorothea that the community / collection structure and 
> administration could be re-thought. We've actually customized our 
> instance and have added community administration functionality which  
> has turned out to be so crucial for us. It's allowed me to get 
> departmental libraries and colleges involved in IDEALS at a level that I 
> don't think would have been possible otherwise.

If you haven't yet prepared a patch to share -- I think this would be
widely appreciated.

> - Statistics and Reports - I know that there are a couple of stats 
> packages out there (and we're using one currently) but none seem to be 
> very satisfactory - or haven't been upgraded to work with Manakin.

I think that as we move forward on that problem, we need to work out
the various meanings of "statistics".  Different consumers
(organizational admin.s, system admin.s, community/collection admin.s,
contributors, users) want to know different classes of things or want
them presented in different ways.  For example, are you looking for
overall reports, or per-object statistics distributed throughout the
user interface(s)?  There's a lot of thought-work yet to be done, and
different sites will want to use different approaches.

[shameless plug] That's why, on the edges of this challenge, I've been
working to get code like patch 2025998* to a state fit for inclusion,
to make it easier for lots of people to plug into common object
instrumentation points and try out their ideas concerning the best way
to store, aggregate and present statistics.

Anyway I think that the failure of "statistics" to gain traction is in
part due to collective confusion over what it should mean to "do
statistics in DSpace".  Once we understand the consumer communities
and their separate needs, I think consensus will be more likely.

------------------------
* http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From sshreeve at illinois.edu  Wed Aug 20 09:33:35 2008
From: sshreeve at illinois.edu (Sarah L. Shreeves)
Date: Wed, 20 Aug 2008 08:33:35 -0500
Subject: [Dspace-general] Community admin.s; statistics
In-Reply-To: <20080820131002.GB9118@IUPUI.Edu>
References: <48AAFB80.6040302@illinois.edu> <20080820131002.GB9118@IUPUI.Edu>
Message-ID: <48AC1D2F.70509@illinois.edu>

Yes, I definitely agree that we need to define what this means.

Sarah

Mark H. Wood wrote:
>> - Statistics and Reports - I know that there are a couple of stats
>> packages out there (and we're using one currently) but none seem to be 
>> very satisfactory - or haven't been upgraded to work with Manakin.
>>     
>
> I think that as we move forward on that problem, we need to work out
> the various meanings of "statistics".  Different consumers
> (organizational admin.s, system admin.s, community/collection admin.s,
> contributors, users) want to know different classes of things or want
> them presented in different ways.  For example, are you looking for
> overall reports, or per-object statistics distributed throughout the
> user interface(s)?  There's a lot of thought-work yet to be done, and
> different sites will want to use different approaches.
>
> [shameless plug] That's why, on the edges of this challenge, I've been
> working to get code like patch 2025998* to a state fit for inclusion,
> to make it easier for lots of people to plug into common object
> instrumentation points and try out their ideas concerning the best way
> to store, aggregate and present statistics.
>
> Anyway I think that the failure of "statistics" to gain traction is in
> part due to collective confusion over what it should mean to "do
> statistics in DSpace".  Once we understand the consumer communities
> and their separate needs, I think consensus will be more likely.
>
> ------------------------
> * http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984
>
>   
-- 
------------------------------------------
Sarah L. Shreeves
Coordinator, IDEALS
http://www.ideals.uiuc.edu/
University of Illinois at Urbana-Champaign
sshreeve at illinois.edu
217-333-4648 or 217-244-3877


From tdonohue at illinois.edu  Wed Aug 20 10:12:24 2008
From: tdonohue at illinois.edu (Tim Donohue)
Date: Wed, 20 Aug 2008 09:12:24 -0500
Subject: [Dspace-general] Community admin.s; statistics
In-Reply-To: <20080820131002.GB9118@IUPUI.Edu>
References: <48AAFB80.6040302@illinois.edu> <20080820131002.GB9118@IUPUI.Edu>
Message-ID: <48AC2648.8040801@illinois.edu>

Mark,

Quick response to your comment about a Community Administration patch...

Mark H. Wood wrote:
> On Tue, Aug 19, 2008 at 11:57:36AM -0500, Sarah L. Shreeves wrote:
>> - I'd agree with Dorothea that the community / collection structure and 
>> administration could be re-thought. We've actually customized our 
>> instance and have added community administration functionality which  
>> has turned out to be so crucial for us. It's allowed me to get 
>> departmental libraries and colleges involved in IDEALS at a level that I 
>> don't think would have been possible otherwise.
> 
> If you haven't yet prepared a patch to share -- I think this would be
> widely appreciated.

The patch we are currently using at U of Illinois is one that has been 
available since just before DSpace 1.4.  It was originally created by 
Andrea Bollini:
http://sourceforge.net/tracker/index.php?func=detail&aid=1373613&group_id=19984&atid=319984

However, since that patch is now very *out of date*, I'm currently 
working on an updated version specifically for the DSpace 1.5 XMLUI. 
I'll be posting it as soon as it is stable/complete for others to use 
(hopefully by early-to-mid Sept).  At this time, I'm not planning on 
implementing it for the DSpace 1.5 JSPUI, as that would be additional 
work and U of Illinois isn't planning to use the JSPUI any longer.

I don't anticipate my patch making it into a DSpace out-of-the-box 
release, as I'm hoping that the DSpace 2.0 work will implement this 
functionality in a much more complete manner.

Let me know if you have any more questions on this...

- Tim


-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


From Christina.Richison at nitle.org  Tue Aug 19 11:34:05 2008
From: Christina.Richison at nitle.org (Christina Richison)
Date: Tue, 19 Aug 2008 11:34:05 -0400
Subject: [Dspace-general] What's working and what isn't?
In-Reply-To: <mailman.9607.1219153851.4453.dspace-general@mit.edu>
Message-ID: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org>

DSpace Community, 

 
To address Dorothea's excellent question, I have the following to offer:

 
What is working?

1.      I like being able to add thumbnails through the jpeg media
filter. A little something that makes life easier.

2.      I like the fuzzy search feature. More information can be found
here: 
http://lucene.apache.org/java/docs/queryparsersyntax.html#Fuzzy%20Searches

3.      I like the DSpace hierarchy and the option of creating
sub-communities within sub-communities.

 
What isn't working?

1.      Moving Communities, Collections, and Items: It would be nice to
drag and drop these components into new homes instead of going through
an export/import process. For example, it makes more sense to my
department to move Collection B from Subcommunity A into Subcommunity B.
I don't want to go through the export/import process with the
appropriate XML file structure to accomplish this task. 

2.      Default Naming of Groups: It is easy to get "lost" when working
with Authorization Groups. For example, I don't remember what collection
327 is. The following screenshot displays some groups for which I am
an authorizing member. Now what exactly are they? The default should
help clarify this, not make me work to clarify it.

 
All the best,

 
Christina Richison

NITLE, NIS Technical Services Specialist

christina.richison at nitle.org

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 20391 bytes
Desc: image002.jpg
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/cd39448e/attachment.jpg

From dsalo at library.wisc.edu  Wed Aug 20 11:49:43 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Wed, 20 Aug 2008 10:49:43 -0500
Subject: [Dspace-general] Chat summary: 20 August 2008
Message-ID: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com>

We had about twenty people (and, unfortunately, two or three trolls)
in this morning's chat! That's a much larger turnout than I expected,
and I find it very encouraging.

After a round of introductions, we talked about the following things:

BULK METADATA EDITING
Use cases included name and subject authority control and adding a new
piece of metadata to all the items in a collection at once.
One manager wanted to allow student assistants to bulk-edit metadata.
Suggestion: export/import of a collection's metadata only, for batch editing
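
[Illustration only, not part of the chat.] The metadata-only
export/import suggested above might be sketched as a CSV round trip. It
is a toy model, not DSpace code: items are plain dictionaries keyed by a
made-up ID, and export_metadata and import_metadata are hypothetical
helpers.

```python
import csv
import io


def export_metadata(items, fields):
    """Write one CSV row per item: its ID followed by the chosen fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id"] + fields)
    writer.writeheader()
    for item_id, metadata in sorted(items.items()):
        row = {"id": item_id}
        row.update({f: metadata.get(f, "") for f in fields})
        writer.writerow(row)
    return buf.getvalue()


def import_metadata(items, csv_text):
    """Apply edited CSV back to the items; return the list of changes made."""
    changes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item_id = row.pop("id")
        for field, new_value in row.items():
            old_value = items[item_id].get(field, "")
            if new_value != old_value:
                items[item_id][field] = new_value
                changes.append((item_id, field, old_value, new_value))
    return changes


# Hypothetical collection with one misspelled subject heading.
items = {
    "101": {"dc.subject": "Oceanografy"},
    "102": {"dc.subject": "Marine biology"},
}
exported = export_metadata(items, ["dc.subject"])
edited = exported.replace("Oceanografy", "Oceanography")  # the "batch edit"
changes = import_metadata(items, edited)
print(changes)
```

Returning the list of changes gives whoever runs the import something
to review before or after the update is applied.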

PERMISSIONS
One manager said that permissions were opaquely-named and difficult to
understand, making it hard to determine exactly what permissions a
given eperson has.
Desiderata included letting epeople other than administrators create
collections, automatically changing edit permissions on existing items
in a collection when a new collection administrator is added, and
letting collection administrators edit/change bitstreams (use case:
ETDs with last-minute corrections).
Suggestion: instead of recording permissions on each individual item,
check against collection administrator list for edit rights on the
item.
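
[Illustration only, not part of the chat.] The suggestion above, looking
up edit rights on the collection instead of storing them on each item,
could be sketched as follows. The Collection and Item classes and the
can_edit function are hypothetical and do not reflect DSpace's actual
authorization code.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    admins: set = field(default_factory=set)  # collection administrators


@dataclass
class Item:
    title: str
    collection: Collection  # no per-item permission list is kept


def can_edit(eperson, item, site_admins=frozenset()):
    """Edit rights come from the owning collection's admin list."""
    return eperson in site_admins or eperson in item.collection.admins


etds = Collection("ETDs", {"alice"})
thesis = Item("A thesis with last-minute corrections", etds)

print(can_edit("bob", thesis))   # bob is not yet an administrator
etds.admins.add("bob")           # add a new collection administrator
print(can_edit("bob", thesis))   # existing items pick up the change
```

Because the check is made at lookup time, adding an administrator needs
no update to the items already in the collection.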

DOCUMENTATION
Several people mentioned using the wiki, especially the how-to pages.
It was noted that the how-to pages are becoming disorganized and
unwieldy, which will only get worse as more are added.
Suggestions: organize the how-to pages by version of DSpace to which
they apply; organize the how-to pages by task ("Install" "Customize"
"Administer" "Troubleshoot" "Internationalize" etc).
The mailing lists are helpful, but good information becomes the
"needle in the haystack" -- hard to search for, especially with the
unfriendly SourceForge interface. Several managers archive useful
messages for later use.
Suggestions: Build a way to auto-forward useful messages from
dspace-tech to the wiki, for editing by one or more community members.
Reuse material from an upcoming course on administering DSpace.
Develop a "new user guide." Reorganize the DSpace feature list by
common perceived needs rather than by feature.
Dealing with problems in the underlying technology stack rather than
DSpace itself can be difficult, as can finding live help.
Suggestions: advertise the DSpace IRC room, arrange "office hours" there.

OTHER DESIDERATA
- embargoes (two managers reported using an embargo hack; one is
delaying an upgrade to 1.5 because it does not have one)
- multilingual issues: community/collection descriptions in more than
one language, metadata input in more than one language

LOGISTICS
The IRC chatroom (irc.freenode.net, #dspace) is an underused resource!
Developers and admins who are happy to help with DSpace issues watch
the room. To broaden awareness of this helpful space, chats will be held
there going forward. Next week's agenda should include discussion of
DSpace statistics.

Thanks to all participants!

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From Claudia.Juergen at ub.uni-dortmund.de  Wed Aug 20 13:06:08 2008
From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Juergen)
Date: Wed, 20 Aug 2008 19:06:08 +0200 (CEST)
Subject: [Dspace-general] What's working and what isn't?
In-Reply-To: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org>
References: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org>
Message-ID: <32cc7398780c7664581bf0d0e48fe303.squirrel@mail.ub.uni-dortmund.de>

Hi Christina,


It is now possible to move items via the DSpace UI.

There is a patch for moving collections in the patch queue, but I haven't
tried it yet. If you do, it would be great to have some feedback.

As for authorizations, in standard use cases (i.e. while creating or
editing a DSpace object) you do not need to know the default
authorization group names.

The best practice with regard to authorizations is to use your own
groups, even if they consist of just 0-1 members at the beginning (i.e.
while setting up new structures using standard groups). That way staff
changes, for example, cause little trouble.

Sunny greetings

Claudia


> DSpace Community,
>
>
>
> To address Dorothea's excellent question, I have the following to offer:
>
>
>
> What is working?
>
> 1.      I like being able to add thumbnails through the jpeg media
> filter. A little something that makes life easier.
>
> 2.      I like the fuzzy search feature. More information can be found
> here:
> http://lucene.apache.org/java/docs/queryparsersyntax.html#Fuzzy%20Search
> es
>
> 3.      I like the DSpace hierarchy and the option of creating
> sub-communities within sub-communities.
>
>
>
> What isn't working?
>
> 1.      Moving Communities, Collections, and Items: It would be nice to
> drag and drop these components into new homes instead of going through
> an export/import process. An example, it makes more sense to my
> department to move Collection B from Subcommunity A into Subcommunity B.
> I don't want to go through the export/import process with the
> appropriate XML file structure to accomplish this task.
>
> 2.      Default Naming of Groups: It is easy to get "lost" when working
> with Authorization Groups. For example, I don't remember what collection
> 327 is. The following screen shots displays some groups for which I am
> an authorizing member. Now what exactly are they? The default should
> help clarify this not make me work to clarify.
>
>
>
>
>
> All the best,
>
>
>
> Christina Richison
>
> NITLE, NIS Technical Services Specialist
>
> christina.richison at nitle.org
>
>
>


From mdiggory at MIT.EDU  Wed Aug 20 15:11:12 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Wed, 20 Aug 2008 12:11:12 -0700
Subject: [Dspace-general] [Dspace-tech] Question one: What's working and
	what isn't?
In-Reply-To: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>
References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>
Message-ID: <7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu>


On Aug 18, 2008, at 6:24 AM, Dorothea Salo wrote:

> Housekeeping:
>
> - Please respond to the dspace-general list, or to me directly.
> DSpace-tech has a 1.5.1 beta to talk about, and I don't want to derail
> that very important conversation!

Release discussions generally occur on dspace-devel and dspace-commit  
lists (though infrequently). I would recommend not separating the IR  
Manager user group out of the user community.

I've been generally dissatisfied with the breakup of the community
over the lists of dspace-general at mit, dspace-tech at sf and
dspace-devel at sf.  I've recommended in the past a consolidation or
restructuring of this list setup.  Breaking off even more avenues
for discussion creates an even greater state of chaos and localized
discussion that is difficult to keep track of.

In the past I recommended moving dspace-general to the SF site to
ensure that it is clearly identified with the DSpace foundation and
community rather than MIT Libraries.

In the past I've also recommended renaming the lists to clarify the
standard de facto OS listserv roles that they should be playing in the
community:

dspace-general at mit  -->  consolidate into below
dspace-tech at sf      -->  dspace-user at sf (or possibly dspace-community)
dspace-devel at sf     -->  dspace-devel at sf
dspace-commit at sf    -->  dspace-admin at sf

I also recommend an additional read-only list,

dspace-announce at sf

for folks interested only in official DSpace foundation/community
announcements and not the other discussions happening above.

I am concerned that the spawning off of new discussion/chat/email
lists is undermining the community's ability to maintain a
centralized and clearly transparent mechanism for communication.

-Mark


~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage


From mdiggory at MIT.EDU  Wed Aug 20 15:12:24 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Wed, 20 Aug 2008 12:12:24 -0700
Subject: [Dspace-general] [Dspace-tech] Chat summary: 20 August 2008
In-Reply-To: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com>
References: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com>
Message-ID: <3A5F663C-5468-44BD-8DC3-EBB3C8C3D476@mit.edu>

The timing of this meeting was a bit off in my time zone, 7:00 am
(which is why I usually prefer to use email for communication within
the community, as it is asynchronous).  Is there a transparent log of
the chat conversation? I've logged into Meebo, but only found a
portion of the history there...

It's pleasant to see a round table get together and talk about such
issues; I hope it will fuel an activity to get a better needs
assessment out of the IR Manager user group.  I would highly
recommend formalizing the history of the event by summarizing this in
the WIKI (as you point out is needed when such events occur in
chats and email lists). I also recommend that someone act as a
moderator/secretary right now to aggregate your further list
discussion into a more formal state in the WIKI in as close to real
time as possible.  It would be best to put the notes and chat log that
you've presented here into a section of the WIKI focused wholly on
the interests of the IR Managers group that is forming here.  This
could even simply be links to the pertinent threads of interest in
the S.F. and dspace-general email lists.

Cheers,
Mark

On Aug 20, 2008, at 8:49 AM, Dorothea Salo wrote:

> We had about twenty people (and, unfortunately, two or three trolls)
> in this morning's chat! That's a much larger turnout than I expected,
> and I find it very encouraging.
>
> After a round of introductions, we talked about the following things:
>
> BULK METADATA EDITING
> Use cases included name and subject authority control, adding a new
> piece of metadata to all the items in a collection at once.
> One manager wanted to allow student assistants to bulk-edit metadata.
> Suggestion: export/import of a collection's metadata only, for  
> batch editing
>
> PERMISSIONS
> One manager said that permissions were opaquely-named and difficult to
> understand, making it hard to determine exactly what permissions a
> given eperson has.
> Desiderata included letting epeople other than administrators create
> collections, automatically changing edit permissions on existing items
> in a collection when a new collection administrator is added, and
> letting collection administrators edit/change bitstreams (use case:
> ETDs with last-minute corrections).
> Suggestion: instead of recording permissions on each individual item,
> check against collection administrator list for edit rights on the
> item.
>
> DOCUMENTATION
> Several people mentioned using the wiki, especially the how-to pages.
> It was noted that the how-to pages are becoming disorganized and
> unwieldy, which will only get worse as more are added.
> Suggestions: organize the how-to pages by version of DSpace to which
> they apply; organize the how-to pages by task ("Install" "Customize"
> "Administer" "Troubleshoot" "Internationalize" etc).
> The mailing lists are helpful, but good information becomes the
> "needle in the haystack" -- hard to search for, especially with the
> unfriendly SourceForge interface. Several managers archive useful
> messages for later use.
> Suggestions: Build a way to auto-forward useful messages from
> dspace-tech to the wiki, for editing by one or more community members.
> Reuse material from an upcoming course on administering DSpace.
> Develop a "new user guide." Reorganize the DSpace feature list by
> common perceived needs rather than by feature.
> Dealing with problems in the underlying technology stack rather than
> DSpace itself can be difficult, as can finding live help.
> Suggestions: advertize the DSpace IRC room, arrange "office hours"  
> there.
>
> OTHER DESIDERATA
> - embargoes (two managers reported using an embargo hack; one is
> delaying an upgrade to 1.5 because it does not have one)
> - multilingual issues: community/collection descriptions in more than
> one language, metadata input in more than one language
>
> LOGISTICS
> The IRC chatroom (irc.freenode.net, #dspace) is an underused resource!
> Developers and admins watch the room who are happy to help with DSpace
> issues. To broaden awareness of this helpful space, chats will be held
> there going forward. Next week's agenda should include discussion of
> DSpace statistics.
>
> Thanks to all participants!
>
> Dorothea
>
> -- 
> Dorothea Salo dsalo at library.wisc.edu
> Digital Repository Librarian AIM: mindsatuw
> University of Wisconsin
> Rm 218, Memorial Library
> (608) 262-5493
>
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech


From mwood at IUPUI.Edu  Wed Aug 20 15:59:12 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Wed, 20 Aug 2008 15:59:12 -0400
Subject: [Dspace-general] [Dspace-tech] Question one: What's working	and
	what isn't?
In-Reply-To: <7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu>
References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com>
	<7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu>
Message-ID: <20080820195912.GB24603@IUPUI.Edu>

On Wed, Aug 20, 2008 at 12:11:12PM -0700, Mark Diggory wrote:
> In the past I recommended moving dspace-general to the SF site to
> ensure that it is clearly identified with the DSpace foundation and
> community rather than MIT Libraries.

A data point:  I had forgotten that dspace-general even existed, since
it wasn't visible at SF, until it was recently mentioned on
dspace-devel.  Housing all of the lists together sounds good to me.
(OTOH the SF list archive navigation tools are awful!)

> dspace-commit at sf    -->   dspace-admin at sf

Um, dspace-admin sounds like "for discussion of administration of
DSpace installations".  That's certainly not what a commit list is
for.  ???
 
> I am concerned that the spawning off of new discussion/chat/email
> lists is undermining the community's ability to maintain a
> centralized and clearly transparent mechanism for communication.

A particular problem with chats is that there is no record of any
progress made there unless someone is logging.  May I suggest that, in
any chat, when consensus or other significant progress is reached,
there be a call for a volunteer to write up a summary thereof and post
it to a mailing list or the wiki, so that it can be referred back to
later or discovered by those not present in the chat.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From dsalo at library.wisc.edu  Fri Aug 22 10:34:29 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Fri, 22 Aug 2008 09:34:29 -0500
Subject: [Dspace-general] Summary: Week 1 responses
Message-ID: <356cf3980808220734q63ad1fa6yeab61c1f463ec6f5@mail.gmail.com>

(The chat summary is up on the wiki at
<http://wiki.dspace.org/index.php/Community_Requirements_Gathering_Chat_20_August_2008>;
participants, feel free to edit! This summary will go up on the wiki
also, and will likewise be editable.)

I received six off-list responses, alongside three onlist ones
(including my own).

LIKES

- Simple to get running, a lot of bang for the buck (3 mentions)
- Checksum checker (3 mentions)
- HTML display engine (3 mentions)
- Search engine: "simple and fast" (2 mentions)
- Manakin theming (2 mentions)
- Legible displays, easy structuring of data and depositors into
communities/collections
- Flexible Dublin Core, easy defining and rebuilding of indexes
- Storage in SQL database
- Easy to showcase and share data
- Community spirit and visionary dedication!
- Configurable submission workflow, metadata templating
- Maven dealing with Java dependencies

NEEDS/ISSUES

- Complexity of authorization/permissions system, poor fit with
real-world workflows, too much work for DSpace admins that can't be
delegated (5 mentions)
- Communities/collections model confusing and unintuitive for
end-users (3 mentions)
- Submission process needs streamlining and simplification;
input-forms.xml needs to be end-user editable (3 mentions)
- Allow bitstream updating/addition after deposit by users and
collection administrators (me, George)
- Doesn't automatically feed people into user groups based on LDAP
group membership
- Allow depositors to withdraw their own items
- Difficult to customize; also, moving to Manakin costs functionality
- Documentation scattered and confusing
- Can only use Postgres and Oracle databases
- Too-close integration with handles

DESIRED NEW FUNCTIONALITY

I tried to exclude this from the question, but I got a lot of it anyway!

- better i18n (community/collection descriptions, metadata-input
forms) (3 mentions)
- statistics (per item, per author) (3 mentions)
- batch import of citation-only references from a single document
- web UI to prompt a reindexing
- add "also by" (author) or "see also" (subject) links to item pages
- persistent bitstream handles/URLs
- HTML in metadata (e.g. abstracts)
- embargo support
- bulk metadata editing through a web UI


The floor is open for discussion! Devs, please feel free to ask for
clarification, which I hope participants will provide. Participants,
if I have traduced your input, please do say so.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From dsalo at library.wisc.edu  Mon Aug 25 09:08:47 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Mon, 25 Aug 2008 08:08:47 -0500
Subject: [Dspace-general] Week 2: Statistics
Message-ID: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>

Greetings, DSpace community,

I want to thank everyone once again for last week's stimulating
discussion and impressive chat turnout! I have a new question for
everyone this week, pursuant to some discussion on the lists:

"Statistics" are one of the commonest requests for a new DSpace
feature. Without further specification, however, it's hard to know
what data to present, since there are no standards or even clear best
practices in this area. What statistics do the following groups of
DSpace users need to see, and in what form are the statistics best
presented to them?

Depositors
End-users (defined as "people examining items and downloading
bitstreams from a DSpace instance;" we may have to refine this further
in discussion)
DSpace repository managers (as distinct from systems administrators)

What else should developers keep in mind as they implement this feature?

Because it would be nice to reach a working consensus on this (unlike
last week's question, which was intended to pull out as broad a
selection of needs as possible), I think we should start discussing
immediately. I encourage all respondents to respond TO THE MAILING
LIST instead of to me.

I will be holding another chat to discuss the weekly question. It will
take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on
irc.freenode.net. I apologize to West Coast (USA) community members
for last week's unconscionably early hour; we'll try 10 am US Central
(11 am Eastern, 4 pm GMT) this week, and we may go even later next
week if our European community members can stand it.

For those who don't normally use IRC, there are two easy web gateways.
One is mibbit.com; the other is specific to our channel and can be
found at <http://dspace.testathon.net/cgi-bin/irc.cgi>. I encourage
all of us to become familiar with the channel; it is a source of
real-time technical information from DSpace developers, as well as a
community in its own right.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From dsalo at library.wisc.edu  Mon Aug 25 10:07:43 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Mon, 25 Aug 2008 09:07:43 -0500
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
Message-ID: <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>

My answers:

> What statistics do the following groups of
> DSpace users need to see, and in what form are the statistics best
> presented to them?
>
> Depositors

At a minimum, I would like depositors to see the number of times an
item's splash page has been visited, and the number of times each
content bitstream (as distinct from e.g. thumbnails) has been
downloaded. I would also like aggregate statistics available for each
author in the system, though I recognize that this creates
authority-control and role-evaluation issues. (For example, if Dr.
Helen Troia is the author of articles in the repository, the editor of
a journal whose backfiles are in the repository, as well as a thesis
advisor for some theses in the thesis collection, the journal and the
theses should NOT count toward her downloads.)
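
The role distinction above can be sketched roughly as follows. This is
only an illustration: the record layout (item, person, role, downloads)
and the role names are assumptions made for the example, not anything
DSpace stores today, and it presumes authority control has already
settled on one form of each name.

```python
def author_downloads(records, person):
    """Sum downloads of items on which `person` holds the author role.

    Each record is a hypothetical (item_id, person_name, role, downloads)
    tuple; editor and advisor rows are deliberately ignored.
    """
    return sum(
        downloads
        for _item, name, role, downloads in records
        if name == person and role == "author"
    )


records = [
    ("article-1", "Troia, Helen", "author", 120),
    ("journal-7", "Troia, Helen", "editor", 900),
    ("thesis-3", "Troia, Helen", "advisor", 45),
    ("article-2", "Troia, Helen", "author", 30),
]
print(author_downloads(records, "Troia, Helen"))
```

Only the two article rows contribute; the journal backfiles and the
advised thesis stay out of the author's total.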

HTML items (and similar aggregates, once we can work with them; e.g.
Flash objects) cause trouble for bitstream analysis. To cut through
the jungle, I suggest that only the primary bitstream have its
accesses counted. If possible, it would be nice to count accesses for
all HTML bitstreams, but that can be lived without if need be.

I don't believe these statistics need to be real-time; a daily or even
weekly cron-job would suffice. I do believe we need to take into
account when an item was ingested, recognizing that older items will
pile up the downloads over time. In addition to total-aggregates,
then, I would recommend "in the last week," "in the last month," and
"in the last year/since ingest" information. Per-calendar-year
information should be kept and displayed indefinitely, even if the
underlying data are eventually purged, because authors will use this
in tenure-and-promotion packages. A sense of delta would be nice as
well -- depositors would LOVE to know if suddenly an item's downloads
spike.
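
A minimal sketch of those windows, assuming the raw data is simply a
list of download dates for one item (the function names and the 7/30/365
day windows are choices made for this example, not an existing DSpace
API). A nightly cron job could run something like this and store the
results.

```python
from collections import Counter
from datetime import date, timedelta


def window_counts(download_dates, today):
    """Count downloads in the last week, month, and year, plus the total."""
    windows = {"last_week": 7, "last_month": 30, "last_year": 365}
    counts = {"total": len(download_dates)}
    for name, days in windows.items():
        cutoff = today - timedelta(days=days)
        counts[name] = sum(1 for d in download_dates if cutoff < d <= today)
    return counts


def per_calendar_year(download_dates):
    """Per-year totals; small enough to keep after raw events are purged."""
    return dict(Counter(d.year for d in download_dates))


def weekly_delta(download_dates, today):
    """Downloads in the latest 7 days minus those in the 7 days before."""
    week = timedelta(days=7)
    latest = sum(1 for d in download_dates if today - week < d <= today)
    previous = sum(
        1 for d in download_dates if today - 2 * week < d <= today - week
    )
    return latest - previous
```

A large positive weekly_delta is the "spike" a depositor would want to
hear about; the threshold for announcing one is left open here.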

Other desiderata, less important: broad-brush geographic information
(country of origin? Google Maps mashup?) for accesses, per-collection
and per-community access counts (because it NEVER hurts to get a sense
of competition going), search terms (in DSpace itself or from search
engines) that land people at a particular item.

> End-users (defined as "people examining items and downloading
> bitstreams from a DSpace instance;" we may have to refine this further
> in discussion)

I think end-users can usefully be shown the per-item and per-bitstream
information discussed above. They don't need to see per-author
information -- or at the very least, authors should be able to decide
whether to make this information public. (We do NOT want to embarrass
anyone; that's a serious turnoff for our potential depositors.)

> DSpace repository managers (as distinct from systems administrators)

I get survey after survey asking for activity information on the
repository. I can't answer them. To do so, I need download information
on the whole repository. (Current JSPUI statistics offer an
approximation to this, but I'm very leery of trusting it; I don't
understand how it's calculated, and the numbers seem incredibly off to
me.) I am sometimes asked about growth rate in accesses, so it would
be useful to break this down by year. Some algorithm for breaking it
down by amount of content in the repository ("downloads-per-item,"
where "item" would have to be some kind of average of
items-in-repository over the period examined) would be useful as well.
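
One possible reading of "downloads-per-item" is sketched below. It
assumes the repository size is sampled at intervals during the period
(say, once a month) and uses the plain mean of those samples; other
averages would be just as defensible.

```python
def downloads_per_item(downloads, item_counts):
    """Downloads in a period divided by the mean number of items held.

    item_counts is a hypothetical list of repository sizes sampled
    during the period, for example one count per month.
    """
    if not item_counts:
        raise ValueError("need at least one item-count sample")
    average_items = sum(item_counts) / len(item_counts)
    if average_items == 0:
        return 0.0
    return downloads / average_items


# 1200 downloads while the repository grew from 100 to 300 items.
print(downloads_per_item(1200, [100, 200, 300]))
```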

(And yes, I absolutely loathe those surveys too, but when they come
from ARL, I don't have the luxury of ignoring them.)

Some "wow" numbers would be useful for marketing purposes. A lot of
what I've already described would do the trick there.

I would also like to be able to track deposits per
collection/community over time; this helps me know where to focus
marketing and collection-development efforts, as well as helping me
report progress to the appropriate administrators. (I run a
system-wide repository, so I have to track deposits by campus; each
campus has its own community.)

> What else should developers keep in mind as they implement this feature?

Search-engine crawlers. Excluding them provides a much more realistic
sense of interest. We need to make clear this is happening, though, or
we will be at a perceived disadvantage relative to repositories that
don't strip out these accesses.
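
As a rough illustration, crawler accesses could be separated out by
looking at the User-Agent string and then reported alongside the human
count rather than silently dropped. The marker list below is a
hypothetical starting point, not a complete or authoritative one, and
matching on substrings will misclassify some agents.

```python
# Hypothetical substrings; a real deployment would maintain its own list.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_crawler(user_agent):
    """Guess whether a User-Agent string belongs to a search-engine crawler."""
    agent = (user_agent or "").lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def split_accesses(user_agents):
    """Return (human_count, crawler_count) so that both can be reported."""
    crawlers = sum(1 for agent in user_agents if is_crawler(agent))
    return len(user_agents) - crawlers, crawlers
```

Publishing both numbers makes it visible that crawler traffic was
stripped, which addresses the comparison problem mentioned above.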

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From mwood at IUPUI.Edu  Mon Aug 25 10:55:20 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Mon, 25 Aug 2008 10:55:20 -0400
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
Message-ID: <20080825145520.GF15124@IUPUI.Edu>

One thing to keep in mind about whole-site statistical tables is that
there are already tools to do this for web sites in general, such as
AWStats or Webalizer or whatever your favorite may be.  We probably
should not spend effort to try to duplicate those.

Another consideration is that there are stat.s which would be useful
anytime, and stat.s that you dream up once and may never use again, or
may only find interesting at irregular intervals.  So I think we
should be careful not to try to do too much ourselves.  We can have
some generally-useful stuff built in, but we also need ways to expose
the raw cases in a useful form for ad-hoc analysis with
general-purpose statistical tools (SPSS/BMD/SAS/Stata/R/whatever).

Stuff to be inserted as one component of e.g. an item page probably
needs to be built in.  Stuff that would be a page on its own should
perhaps not be part of DSpace at all, but rather something we make
easy to do with other tools.

We need to keep clearly in mind the distinction between capturing raw
cases (someone fetched a bitstream) and abstracting useful patterns
from the collected cases (frequency histogram of this collection's
fetches over time, last month's fetches broken down by nation of
origin).

What might be helpful is to provide some views or stored procedures
that stat. tools could use to classify observations.  Such tools
usually have good facilities for poking around in databases, but could
perhaps use help in getting the information they need without having to
understand (and track changes to!) the fullness of DSpace's schema.
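
To make the idea concrete, here is a small sketch using SQLite from
Python so that it runs anywhere. The fetch_event table and the
monthly_fetches view are invented for this example and do not exist in
DSpace's schema; the point is only that raw cases live in one place and
a view gives outside tools a stable shape to query.

```python
import sqlite3

# Hypothetical schema: fetch_event holds raw cases, one row per fetch.
SCHEMA = """
CREATE TABLE fetch_event (
    bitstream_id INTEGER,
    collection   TEXT,
    country      TEXT,
    fetched_on   TEXT  -- ISO date such as 2008-08-25
);
CREATE VIEW monthly_fetches AS
    SELECT collection,
           substr(fetched_on, 1, 7) AS month,
           country,
           COUNT(*) AS fetches
    FROM fetch_event
    GROUP BY collection, substr(fetched_on, 1, 7), country;
"""


def build_demo_db(events):
    """Create an in-memory database and load the raw fetch events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO fetch_event VALUES (?, ?, ?, ?)", events)
    return conn


def fetches_for(conn, collection):
    """Read the abstracted pattern through the view, not the raw table."""
    rows = conn.execute(
        "SELECT month, country, fetches FROM monthly_fetches "
        "WHERE collection = ? ORDER BY month, country",
        (collection,),
    )
    return rows.fetchall()
```

A statistics package would issue the same SELECT against the view, so
the underlying table could change without breaking anyone's analysis.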

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From l.hayes at auckland.ac.nz  Mon Aug 25 18:50:18 2008
From: l.hayes at auckland.ac.nz (Leonie Hayes)
Date: Tue, 26 Aug 2008 10:50:18 +1200
Subject: [Dspace-general] Statistics
In-Reply-To: <mailman.327.1219680198.17492.dspace-general@mit.edu>
References: <mailman.327.1219680198.17492.dspace-general@mit.edu>
Message-ID: <A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>

Dear DSpace Community

Statistics

1. From a what-works perspective, there are already beautiful statistics
implementations addressing the minimum requirements. I think the IDEALS
repository has what I would be very happy with; these guys seem to be
one step ahead: http://www.ideals.uiuc.edu. I can remember asking Tim
Donohue about their implementation a few years ago, and he said it was a
very customised solution (please correct me if I am wrong). I also find
the eprints and Fez Fedora stats are pretty good.

2. Develop a package that delivers both via the JSP and XML Manakin
interface.

3. Keep it fairly compartmentalised and simple if possible, and separate
the requirements into 3 distinct areas:
a) Item Statistics - downloads, with additional extras like authors
and collections
b) Site Trends - traffic sources, countries etc.; piggyback on tools like
Google Analytics, or the other web analyser tools that Mark Wood mentions
c) More complex reporting that meets specific requirements.

Many thanks for the opportunity to be part of the discussion. We are
very isolated in New Zealand but struggling with all the same problems
everyone else is experiencing... it helps to move forward. Time zones
don't allow any online interaction; it will be 4am here.


Leonie Hayes
Research Repository Librarian
http://www.library.auckland.ac.nz/contacts/?firstname=&lastname=hayes
http://researchspace.auckland.ac.nz  
 



From bram at mire.be  Mon Aug 25 19:23:39 2008
From: bram at mire.be (Bram Luyten)
Date: Tue, 26 Aug 2008 01:23:39 +0200
Subject: [Dspace-general] [Dspace-tech] Week 2: Statistics
In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
Message-ID: <ad9d20200808251623j704622ebg1e4eb9afa128ada7@mail.gmail.com>

Dear Dorothea,

Inspiring question! There's a huge range of interesting options to explore
in the area of statistics, measurement and repository usage tracking.

The following ideas could be relevant to end-users:

It might be interesting for authors to see which DSpace (or Google)
search queries lead to their items. This could be displayed as the
top ten most popular searches that lead to a specific item.

If downloads per bitstream and splash page visits are being tracked, it
might be useful to use them to display different rankings/lists:

Rankings by collection or community & the possibility to locate a certain
item in those rankings.
Use Case: if you can see that your item is one of the best "performing"
items in your collection, you might be interested in how it performs in
the context of the parent community.

Order by hits or downloads, for "items-by-author"
Use Case: if you found an interesting author, with a lot of papers relevant
in your context, you might want to start off with his most popular items.
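
A minimal sketch of the ranking idea above, assuming download events are
available as (collection, item handle) pairs. The event list, the handles
and the function names are hypothetical, not part of DSpace.

```python
from collections import Counter

# Hypothetical download events: (collection, item handle).
events = [
    ("physics", "123456789/1"), ("physics", "123456789/2"),
    ("physics", "123456789/1"), ("physics", "123456789/3"),
    ("physics", "123456789/1"), ("physics", "123456789/2"),
    ("biology", "123456789/9"),
]

def collection_ranking(events, collection):
    """Return [(item, downloads)] for one collection, most downloaded first."""
    counts = Counter(item for coll, item in events if coll == collection)
    # Sort by count descending, then by handle so ties come out in a fixed order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

def rank_of(ranking, item):
    """1-based position of an item in a ranking, or None if it is absent."""
    for position, (handle, _) in enumerate(ranking, start=1):
        if handle == item:
            return position
    return None

ranking = collection_ranking(events, "physics")
print(ranking)
print(rank_of(ranking, "123456789/2"))
```

The same two functions would serve a community-level ranking if the events
were keyed by community instead of collection.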

with kindest regards,

Bram Luyten

On Mon, Aug 25, 2008 at 3:08 PM, Dorothea Salo <dsalo at library.wisc.edu> wrote:

> Greetings, DSpace community,
>
> I want to thank everyone once again for last week's stimulating
> discussion and impressive chat turnout! I have a new question for
> everyone this week, pursuant to some discussion on the lists:
>
> "Statistics" are one of the commonest requests for a new DSpace
> feature. Without further specification, however, it's hard to know
> what data to present, since there are no standards or even clear best
> practices in this area. What statistics do the following groups of
> DSpace users need to see, and in what form are the statistics best
> presented to them?
>
> Depositors
> End-users (defined as "people examining items and downloading
> bitstreams from a DSpace instance;" we may have to refine this further
> in discussion)
> DSpace repository managers (as distinct from systems administrators)
>
> What else should developers keep in mind as they implement this feature?
>
> Because it would be nice to reach a working consensus on this (unlike
> last week's question, which was intended to pull out as broad a
> selection of needs as possible), I think we should start discussing
> immediately. I encourage all respondents to respond TO THE MAILING
> LIST instead of to me.
>
> I will be holding another chat to discuss the weekly question. It will
> take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on
> irc.freenode.net. I apologize to West Coast (USA) community members
> for last week's unconscionably early hour; we'll try 10 am US Central
> (11 am Eastern, 4 pm GMT) this week, and we may go even later next
> week if our European community members can stand it.
>
> For those who don't normally use IRC, there are two easy web gateways.
> One is mibbit.com; the other is specific to our channel and can be
> found at <http://dspace.testathon.net/cgi-bin/irc.cgi>. I encourage
> all of us to become familiar with the channel; it is a source of
> real-time technical information from DSpace developers, as well as a
> community in its own right.
>
> Dorothea
>
> --
> Dorothea Salo dsalo at library.wisc.edu
> Digital Repository Librarian AIM: mindsatuw
> University of Wisconsin
> Rm 218, Memorial Library
> (608) 262-5493
>


-- 
@mire NV
Romeinse Straat 18
3001 Heverlee
Belgium
+32 2 888 29 56

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get Tog@ther

From dsalo at library.wisc.edu  Tue Aug 26 10:44:45 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Tue, 26 Aug 2008 09:44:45 -0500
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <20080825145520.GF15124@IUPUI.Edu>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
Message-ID: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>

2008/8/25 Mark H. Wood <mwood at iupui.edu>:
> One thing to keep in mind about whole-site statistical tables is that
> there are already tools to do this for web sites in general, such as
> AWStats or Webalizer or whatever your favorite may be.  We probably
> should not spend effort to try to duplicate those.

Perhaps not, but if this is the direction we want people to go in, we
probably ought to document how to do it, at least informally on the
wiki. Does anybody have such a system in place?

> We can have
> some generally-useful stuff built in, but we also need ways to expose
> the raw cases in a useful form for ad-hoc analysis with
> general-purpose statistical tools (SPSS/BMD/SAS/Stata/R/whatever).

+1 Data for mashup is always good. I should have mentioned that a
desire I have is the ability to export/transclude at LEAST by-author
stats data for inclusion in other places.
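
A minimal sketch of such a by-author export, assuming per-item download
totals and author lists are already known. The record layout, the author
names and the function names are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical per-item totals; a real export would read these from the
# repository's own usage data.
items = [
    {"handle": "123456789/1", "authors": ["Doe, J."], "downloads": 40},
    {"handle": "123456789/2", "authors": ["Doe, J.", "Roe, R."], "downloads": 15},
    {"handle": "123456789/3", "authors": ["Roe, R."], "downloads": 5},
]

def stats_by_author(items):
    """Sum item counts and downloads per author (co-authors each get credit)."""
    totals = defaultdict(lambda: {"items": 0, "downloads": 0})
    for item in items:
        for author in item["authors"]:
            totals[author]["items"] += 1
            totals[author]["downloads"] += item["downloads"]
    return dict(totals)

def export_json(items):
    """Serialise the by-author totals so another site can include them."""
    return json.dumps(stats_by_author(items), indent=2, sort_keys=True)

print(export_json(items))
```

JSON is only one possible exchange format; CSV would work equally well
for spreadsheet users.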

> Stuff to be inserted as one component of e.g. an item page probably
> needs to be built in.  Stuff that would be a page on its own should
> perhaps not be part of DSpace at all, but rather something we make
> easy to do with other tools.

I'm not sure this is the distinction I would make. To me, the question
is whether a given set of statistics needs to know anything specific
about the way DSpace structures the universe. So I might well have
special pages outside DSpace containing DSpace by-author statistics,
but it's impossible (isn't it?) to tweak a Webalizer install into
capturing stats by author. I still need to rely on DSpace to carve up
the accesses correctly.

> We need to keep clearly in mind the distinction between capturing raw
> cases (someone fetched a bitstream) and abstracting useful patterns
> from the collected cases (frequency histogram of this collection's
> fetches over time, last month's fetches broken down by nation of
> origin).

Well, developers do. End-users, perhaps not so much. :)

> What might be helpful is to provide some views or stored procedures
> that stat. tools could use to classify observations.  Such tools
> usually have good facilities for poking around in databases, but could
> perhaps use help in getting the information they need without having to
> understand (and track changes to!) the fullness of DSpace's schema.

Interesting. Where would this leave the average repository manager who
isn't using Stata, but just wants some numbers to show people?
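
A minimal sketch of the "view" idea, using SQLite from Python so it runs
anywhere. The two tables are invented for illustration and do not match
DSpace's real schema; the point is only that one view can serve both a
statistics package and someone who just wants a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; DSpace's real schema differs.
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, collection TEXT);
    CREATE TABLE download (item_id INTEGER, fetched_on TEXT);

    -- The view hides the join, so reporting tools need only one name.
    CREATE VIEW downloads_per_collection AS
        SELECT i.collection AS collection, COUNT(d.item_id) AS downloads
        FROM item i LEFT JOIN download d ON d.item_id = i.item_id
        GROUP BY i.collection;
""")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "theses"), (2, "theses"), (3, "reports")])
conn.executemany("INSERT INTO download VALUES (?, ?)",
                 [(1, "2008-08-01"), (1, "2008-08-02"), (2, "2008-08-02")])

# A repository manager's "just some numbers" query is a single SELECT.
rows = conn.execute(
    "SELECT collection, downloads FROM downloads_per_collection "
    "ORDER BY collection").fetchall()
print(rows)
```

If the underlying tables change, only the view definition has to be
updated; queries written against the view keep working.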

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From tdonohue at illinois.edu  Tue Aug 26 11:07:43 2008
From: tdonohue at illinois.edu (Tim Donohue)
Date: Tue, 26 Aug 2008 10:07:43 -0500
Subject: [Dspace-general] Statistics
In-Reply-To: <A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>
References: <mailman.327.1219680198.17492.dspace-general@mit.edu>
	<A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>
Message-ID: <48B41C3F.3010709@illinois.edu>

All,

Just a comment on Leonie's praise of the Statistics we are using for 
IDEALS (www.ideals.uiuc.edu):

Leonie Hayes wrote:
> Dear DSpace Community
> 
> Statistics
> 
> 1. From a "what works" perspective there are already beautiful statistics
> implementations addressing the minimum requirements. I think the IDEALS
> repository has what I would be very happy with; these guys seem to be
> one step ahead (http://www.ideals.uiuc.edu). I can remember asking Tim
> Donohue about their implementation a few years ago; he said it was a
> very customised solution (please correct me if wrong). I also find the
> EPrints and Fez Fedora stats pretty good.

Thanks for the praise...much appreciated! :)  Though, some of the kudos 
should go to U of Rochester (http://urresearch.rochester.edu/), who 
initially created the Statistics package we use for DSpace.  We've made 
some local modifications (like the "Top 10 Downloads" list), but much of 
the original work was done at U of Rochester.

However, it's worth mentioning to all that although the statistics we 
are using for IDEALS look "pretty", there's still quite a bit of 
"ugliness" underneath.  The main problem we have is that our statistics 
package does *NOT* automatically filter out web-crawlers like 
Google/Yahoo.  Instead, it requires a person to go in and manually 
filter out downloads (via IP address) which look to be web-crawlers. 
It's definitely *not* a solution that scales well.

So, although I think it was already mentioned, I'd add as a requirement 
for a good Statistics Package:

* Must filter out web-crawlers in a semi-automated fashion!

- Tim

-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


From tdonohue at illinois.edu  Tue Aug 26 12:09:15 2008
From: tdonohue at illinois.edu (Tim Donohue)
Date: Tue, 26 Aug 2008 11:09:15 -0500
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
Message-ID: <48B42AAB.6010804@illinois.edu>

Dorothea & all,

Dorothea Salo wrote:
> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
>> One thing to keep in mind about whole-site statistical tables is that
>> there are already tools to do this for web sites in general, such as
>> AWStats or Webalizer or whatever your favorite may be.  We probably
>> should not spend effort to try to duplicate those.
> 
> Perhaps not, but if this is the direction we want people to go in, we
> probably ought to document how to do it, at least informally on the
> wiki. Does anybody have such a system in place?

For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide 
traffic information.  However, that information is *not* publicly 
accessible.  We only use it for administrative purposes, since most of 
the information AWStats generates for us is generally *not* useful to 
our users.

So, for example, AWStats can provide us with the following general 
information:
   * Which features of DSpace are being used most frequently (e.g. 
Subject Browse, Community/Collection browse, search, etc.)
   * Which web browsers our users are using
   * # of overall hits in a given month,week,day,hour
   * Approximate amount of time users spend on our site
   * What external resources people use to get to our site (e.g. Google, 
Blog posts, Library website, etc.)
   * The top searches used to get to your site (in Google, Yahoo, MSN, etc)

But, AWStats only works at a global level.  So, it *cannot* give us any 
real information at a community, collection or item level, since it 
doesn't understand DSpace's internal structure and cannot parse DSpace's 
log files (it parses the *web server* log files, rather than DSpace's 
internal logs)
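
To illustrate the gap, here is a minimal sketch of what a DSpace-aware
pass over the *web server* log might look like. It assumes Apache-style
log lines and assumes that file downloads use URLs of the form
/bitstream/<prefix>/<suffix>/... (optionally with "handle/" after
"bitstream/"). Both assumptions should be checked against a real
installation; the function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed URL shape for file downloads; verify against your own site.
REQUEST = re.compile(
    r'"GET /bitstream/(?:handle/)?(\d+/\d+)/\S* HTTP/[\d.]+" 200 ')

def downloads_per_item(log_lines):
    """Count successful bitstream requests per item handle."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    '10.0.0.1 - - [26/Aug/2008:10:00:00 -0500] '
    '"GET /bitstream/123456789/100/1/a.pdf HTTP/1.1" 200 5120',
    '10.0.0.2 - - [26/Aug/2008:10:01:00 -0500] '
    '"GET /bitstream/123456789/100/1/a.pdf HTTP/1.1" 200 5120',
    '10.0.0.2 - - [26/Aug/2008:10:02:00 -0500] '
    '"GET /handle/123456789/100 HTTP/1.1" 200 900',
    '10.0.0.3 - - [26/Aug/2008:10:03:00 -0500] '
    '"GET /bitstream/123456789/7/1/b.pdf HTTP/1.1" 404 0',
]
print(downloads_per_item(sample))
```

This only recovers item handles from URLs. Rolling counts up to
collections, communities or authors would still need information that
only DSpace holds.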

So, in the end, AWStats is a worthwhile tool to keep in mind.  However, 
without some major customizations specific to DSpace, it's really more 
of an Administrative tool to help you determine *how* users are using 
your site.  It doesn't give any real worthwhile "statistics" in terms of 
file downloads or individual community/collection access counts, which 
are more likely to be useful to your users.

- Tim

-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


From mwood at IUPUI.Edu  Tue Aug 26 15:47:20 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Tue, 26 Aug 2008 15:47:20 -0400
Subject: [Dspace-general] Statistics
In-Reply-To: <48B41C3F.3010709@illinois.edu>
References: <mailman.327.1219680198.17492.dspace-general@mit.edu>
	<A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>
	<48B41C3F.3010709@illinois.edu>
Message-ID: <20080826194720.GA20164@IUPUI.Edu>

On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
> So, although I think it was already mentioned, I'd add as a requirement 
> for a good Statistics Package:
> 
> * Must filter out web-crawlers in a semi-automated fashion!

+1!  Suggestions as to how?

The Rochester mod.s could be augmented to filter out the easiest cases
more simply.  Some well-behaved crawlers can be spotted automatically.
(No, I don't recall how.)  The filter rules could be made more
flexible than just a single type of fixed-size netblocks (if memory
serves).  I've been meaning to work on these at some point, but
haven't yet reached That Point.
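
A minimal sketch of more flexible netblock rules, using Python's standard
ipaddress module so that blocks of any prefix length (and IPv6) can be
mixed. The listed networks are reserved documentation ranges standing in
for real crawler addresses, and the function name is hypothetical; this
is not how the Rochester code works.

```python
import ipaddress

# Placeholder netblocks with different prefix lengths, IPv4 and IPv6.
CRAWLER_NETS = [ipaddress.ip_network(n) for n in
                ("192.0.2.0/24", "198.51.100.64/26", "2001:db8::/32")]

def is_crawler_ip(address):
    """True if the address falls inside any configured crawler netblock."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network and vice versa.
    return any(ip in net for net in CRAWLER_NETS)

print(is_crawler_ip("198.51.100.70"))
print(is_crawler_ip("198.51.100.10"))
```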

Crawler filtering sounds like something that might be abstracted from
the various existing stat. patches and provided as a common service.
We all should invent this wheel only once.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From dsalo at library.wisc.edu  Tue Aug 26 16:09:16 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Tue, 26 Aug 2008 15:09:16 -0500
Subject: [Dspace-general] Statistics
In-Reply-To: <20080826194720.GA20164@IUPUI.Edu>
References: <mailman.327.1219680198.17492.dspace-general@mit.edu>
	<A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>
	<48B41C3F.3010709@illinois.edu> <20080826194720.GA20164@IUPUI.Edu>
Message-ID: <356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com>

2008/8/26 Mark H. Wood <mwood at iupui.edu>:
> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>> So, although I think it was already mentioned, I'd add as a requirement
>> for a good Statistics Package:
>>
>> * Must filter out web-crawlers in a semi-automated fashion!
>
> +1!  Suggestions as to how?

The site <http://www.user-agents.org/> maintains a list of
user-agents, classified by type. They have an XML-downloadable version
at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
updater. Perhaps polling this would be a useful starting point?
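
A minimal sketch of how such a list might be turned into a robot filter.
The XML below is an invented sample: the element names and the use of
"R" as the robot type code are assumptions, not the verified layout of
the real download, so the parsing would need adjusting after inspecting
the actual file.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element names and type codes are assumptions.
SAMPLE = """<user-agents>
  <user-agent><String>ExampleBot/1.0</String><Type>R</Type></user-agent>
  <user-agent><String>ExampleBrowser/3.0</String><Type>B</Type></user-agent>
  <user-agent><String>OtherCrawler/2.1</String><Type>R S</Type></user-agent>
</user-agents>"""

def robot_agents(xml_text, robot_code="R"):
    """Collect the user-agent strings whose type list includes robot_code."""
    root = ET.fromstring(xml_text)
    robots = set()
    for entry in root.iter("user-agent"):
        types = (entry.findtext("Type") or "").split()
        agent = entry.findtext("String")
        if agent and robot_code in types:
            robots.add(agent)
    return robots

print(sorted(robot_agents(SAMPLE)))
```

A periodic job could refresh the set from the published list and the
statistics code could then drop any hit whose user-agent is in it.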

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From tdonohue at illinois.edu  Tue Aug 26 16:29:23 2008
From: tdonohue at illinois.edu (Tim Donohue)
Date: Tue, 26 Aug 2008 15:29:23 -0500
Subject: [Dspace-general] Statistics
In-Reply-To: <356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com>
References: <mailman.327.1219680198.17492.dspace-general@mit.edu>	<A30994AC067CB646AAD4698C727A1A0204210E1B@libex1.lbr.auckland.ac.nz>	<48B41C3F.3010709@illinois.edu>
	<20080826194720.GA20164@IUPUI.Edu>
	<356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com>
Message-ID: <48B467A3.7080100@illinois.edu>


Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>>> So, although I think it was already mentioned, I'd add as a requirement
>>> for a good Statistics Package:
>>>
>>> * Must filter out web-crawlers in a semi-automated fashion!
>> +1!  Suggestions as to how?
> 
> The site <http://www.user-agents.org/> maintains a list of
> user-agents, classified by type. They have an XML-downloadable version
> at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
> updater. Perhaps polling this would be a useful starting point?
> 
> Dorothea
> 

Universidade do Minho's Statistics Add-On for DSpace can do some basic 
automated filtering of web crawlers:

See its list of main features on the DSpace Wiki:

http://wiki.dspace.org/index.php//StatisticsAddOn

(It looks like they determine spiders by how spiders tend to identify 
themselves.  Most "nice" spiders, like Google, will identify themselves 
in a common fashion, e.g. "Googlebot")
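
A minimal sketch of filtering on self-identification, assuming the
User-Agent header of each request is available. The token list is
illustrative only, the function name is hypothetical, and this is not a
description of how the Minho add-on is implemented.

```python
import re

# Illustrative tokens; a real deployment would maintain a longer list.
BOT_PATTERN = re.compile(r"googlebot|slurp|msnbot|crawler|spider|bot\b",
                         re.IGNORECASE)

def looks_like_spider(user_agent):
    """True if the user-agent string announces itself as a crawler."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None

print(looks_like_spider(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(looks_like_spider(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0.1"))
```

This catches only crawlers that identify themselves honestly; anything
that sends a browser-like string passes through.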

Frankly, although our statistics for IDEALS are nice looking...Minho's 
work is much more extensive and offers a greater variety of features 
(from what I've seen/heard of it).  It's just missing our "Top 10 
Downloads" list :)

- Tim


-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


From mwood at IUPUI.Edu  Tue Aug 26 16:34:33 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Tue, 26 Aug 2008 16:34:33 -0400
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
Message-ID: <20080826203433.GB20164@IUPUI.Edu>

On Tue, Aug 26, 2008 at 09:44:45AM -0500, Dorothea Salo wrote:
> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
> > What might be helpful is to provide some views or stored procedures
> > that stat. tools could use to classify observations.  Such tools
> > usually have good facilities for poking around in databases, but could
> > perhaps use help in getting the information they need without having to
> > understand (and track changes to!) the fullness of DSpace's schema.
> 
> Interesting. Where would this leave the average repository manager who
> isn't using Stata, but just wants some numbers to show people?

Well, it depends on which numbers are wanted.  I do think there will
be some reports that are popular enough, and easy enough to get right,
that they should be built in.  The support for external tools would be
aimed at people who do want to use them.  What sort of data would be
useful to the manager who isn't into heavy statistical analysis, which
aren't likely to be provided as built-ins?

Where I'm going is:

o  The realm of reasonable possibilities for statistical analysis and
   presentation of DSpace activity is rather huge;

o  people who understand statistical processing have already figured
   out the hard parts of analysis and presentation;

o  the tail should not be allowed to wag the dog -- we want
   statistics, but that's subordinate to building excellent document
   repository software.  Part of, important, but in a supporting role.

So I am hoping that we can mostly satisfy most people with relatively
modest built-in statistical support, and take care of the other cases
with modest support for the development of external reporting
mechanisms.  This being a community, I imagine that some will develop
external solutions that they can share.

This is one reason why I think that it should be as easy as possible
for multiple stat. projects to tap into built-in streams of
observations.  Different sites have different needs, and I think we
need to be able to easily play with various ways of doing stat.s.  I'm
not convinced that we are going to understand the need sufficiently
without getting into the field a selection of solutions that can be
easily snapped in and tried by a sizable number of sites.  There are a
number of good attempts now, but it's not easy to install them and
that limits the amount of experience we can gather.
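
A minimal sketch of what "tapping into a built-in stream of observations"
could mean: a small publish/subscribe hub to which independent statistics
projects attach listeners. The class, the event fields and the listeners
are all hypothetical and do not describe any existing DSpace API.

```python
from collections import Counter

class UsageEvents:
    """Minimal publish/subscribe hub for usage observations."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, event):
        # Every subscriber sees every raw observation, in subscription order.
        for listener in self._listeners:
            listener(event)

download_counts = Counter()   # one project keeps aggregated counts
raw_cases = []                # another just stores the raw cases

def count_downloads(event):
    if event["kind"] == "download":
        download_counts[event["item"]] += 1

hub = UsageEvents()
hub.subscribe(count_downloads)
hub.subscribe(raw_cases.append)

hub.publish({"kind": "download", "item": "123456789/1"})
hub.publish({"kind": "view", "item": "123456789/1"})
hub.publish({"kind": "download", "item": "123456789/1"})
print(download_counts, len(raw_cases))
```

Because listeners are independent, a site could try a new statistics
module by subscribing it alongside the existing one rather than
replacing it.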

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From dsalo at library.wisc.edu  Tue Aug 26 19:13:14 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Tue, 26 Aug 2008 18:13:14 -0500
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <20080826203433.GB20164@IUPUI.Edu>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
	<20080826203433.GB20164@IUPUI.Edu>
Message-ID: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>

2008/8/26 Mark H. Wood <mwood at iupui.edu>:

> Well, it depends on which numbers are wanted.  I do think there will
> be some reports that are popular enough, and easy enough to get right,
> that they should be built in.  The support for external tools would be
> aimed at people who do want to use them.  What sort of data would be
> useful to the manager who isn't into heavy statistical analysis, which
> aren't likely to be provided as built-ins?

Well, I hope that's where the discussion this week has been pointing.
If not, we'll have to find a different way to gather that information.
Looking at existing implementations of statistics (e.g. EPrints, SSRN)
might be a start.

> o  the tail should not be allowed to wag the dog -- we want
>   statistics, but that's subordinate to building excellent document
>   repository software.  Part of, important, but in a supporting role.

This is such an interesting statement that I think I will make it next
week's topic! What *is* excellent document repository software? I have
a feeling that the non-developer community may have a rather different
take on it from most developers... we'll see if I'm right.

> So I am hoping that we can mostly satisfy most people with relatively
> modest built-in statistical support, and take care of the other cases
> with modest support for the development of external reporting
> mechanisms.

I'd be interested to know how the proposals that have been put forward
this week place on a modesty scale. Developers?

> This is one reason why I think that it should be as easy as possible
> for multiple stat. projects to tap into built-in streams of
> observations.  Different sites have different needs, and I think we
> need to be able to easily play with various ways of doing stat.s.

Agreed, but just to toss this out: I foresee a countervailing pressure
in future toward standardized and aggregated statistics across
repositories. I have heard a number of statements to the effect that
faculty are using download counts from disciplinary repositories in
tenure-and-promotion packages. As their work becomes scattered and/or
duplicated across various repositories, they're going to want to
aggregate that information.
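
A minimal sketch of that aggregation, assuming each repository can report
downloads keyed by some shared work identifier. The repository names, the
identifiers and the report layout are hypothetical; no such standard
exists yet.

```python
from collections import defaultdict

# Hypothetical reports from two repositories holding overlapping works.
reports = {
    "repo-a": [{"work": "doi:10.9999/x1", "downloads": 30}],
    "repo-b": [{"work": "doi:10.9999/x1", "downloads": 12},
               {"work": "doi:10.9999/x2", "downloads": 7}],
}

def aggregate(reports):
    """Sum downloads per work across all reporting repositories."""
    totals = defaultdict(int)
    for rows in reports.values():
        for row in rows:
            totals[row["work"]] += row["downloads"]
    return dict(totals)

print(aggregate(reports))
```

The hard part in practice is the shared identifier and agreeing on what
counts as a download, not the arithmetic.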

>  There are a
> number of good attempts now, but it's not easy to install them and
> that limits the amount of experience we can gather.

+1. This is a problem for more than just statistics!

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From christophe.dupriez at destin.be  Wed Aug 27 04:37:12 2008
From: christophe.dupriez at destin.be (Christophe Dupriez)
Date: Wed, 27 Aug 2008 10:37:12 +0200
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
Message-ID: <48B51238.4010008@destin.be>

Hi Dorothea and participants in this discussion!

I would like to say that statistics serve different purposes:
1) detect errors (why did nobody look at my site last Sunday?)
2) provide KPIs (Key Performance Indicators), measures that a manager
follows over the medium term to make organisational decisions
3) investigate new hypotheses before investing in changing the organisation.

For purpose (3), by its very nature, you need to "open" to analysis the
detailed logs of the events and the data stored in DSpace. Generic programs
like SAS or report generators are the best for digging into data and
answering new, unforeseen questions. Everybody in the community will be
happy to have this "back door" available.

For purpose (2), we need to know what KPIs are needed by IR managers. I
will go further: new IRs and their managers would be very happy not to
reinvent KPIs and to have good ones already proposed to sustain a
documented IR development process. A very big part of DSpace's
attractiveness is that it provides (and this really should be implemented!)
"best practices" for IR management (and not only computing).

Also for purpose (2), use cases, practices and measures must be designed
up front. This will contribute strongly to the overall specifications of
DSpace.

For purpose (1), a more formal, bottom up, data driven approach may be 
sufficient to install validation tools (like the checksum checker) to 
ensure that DSpace operations are "in line".
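
A minimal sketch of such a data-driven check for purpose (1): list the
days in a period on which no hits were recorded. The daily totals and the
function name are hypothetical.

```python
from datetime import date, timedelta

def silent_days(hits_per_day, start, end):
    """Dates in [start, end] with no recorded hits."""
    days = (end - start).days + 1
    return [start + timedelta(days=n) for n in range(days)
            if hits_per_day.get(start + timedelta(days=n), 0) == 0]

# Hypothetical daily totals; 24 August 2008 (a Sunday) is missing.
hits = {
    date(2008, 8, 22): 120,
    date(2008, 8, 23): 95,
    date(2008, 8, 25): 110,
}
print(silent_days(hits, date(2008, 8, 22), date(2008, 8, 25)))
```

A silent day may mean an outage or a logging failure; the check only
flags it for a person to investigate.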

So we have no choice: we have to listen to IR managers (please come by!) to
know the good practices DSpace must support...

Have a nice day!

Christophe
(peeking on the list when I should not during my holidays!)




From eloy at sdum.uminho.pt  Wed Aug 27 05:12:04 2008
From: eloy at sdum.uminho.pt (Eloy Rodrigues)
Date: Wed, 27 Aug 2008 10:12:04 +0100
Subject: [Dspace-general] Dspace-general Digest, Vol 61, Issue 22
In-Reply-To: <mailman.87902.1219825858.4453.dspace-general@mit.edu>
References: <mailman.87902.1219825858.4453.dspace-general@mit.edu>
Message-ID: <00a701c90824$f8d80000$ea880000$@uminho.pt>

Dear All,

A detailed description of the functionality and architecture of the
statistics Add-on we have developed can be found in the docs folder of the
downloadable file:
http://wiki.dspace.org/static_files/6/68/Stats-addon-2.0.tar.gz

In our production implementation of the Add-on on RepositóriUM, we have
developed some more tools/functionality for automated and semi-automated
detection and exclusion of crawlers (based not only on "well behaved"
robots, but also on the patterns and behavior of IP addresses, etc.), which
are not available in version 2.0 of the Add-on.
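
As a rough illustration of behavior-based detection (not the RepositóriUM
implementation, whose details are not given here), the sketch below flags
IP addresses that make more than a threshold number of requests within a
sliding time window. The window, the threshold and the function name are
arbitrary, hypothetical choices.

```python
from collections import defaultdict

def suspicious_ips(requests, window=60, threshold=30):
    """requests: iterable of (ip, unix_timestamp).

    Flag IPs that make more than `threshold` requests within any
    `window`-second span.
    """
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, stamps in by_ip.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Slide the window start forward until it is within `window` seconds.
            while ts - stamps[start] >= window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(ip)
                break
    return flagged

# One address hammering the site every second, one browsing slowly.
traffic = [("192.0.2.1", t) for t in range(40)]
traffic += [("203.0.113.9", t * 100) for t in range(5)]
print(suspicious_ips(traffic))
```

Thresholds like these need tuning per site, and flagged addresses are
probably best reviewed by a person before being excluded, which matches
the "semi-automated" approach described above.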

As we are currently upgrading RepositóriUM to DSpace 1.5, we hope to
release a Stats Add-on 2.1, compatible with DSpace 1.5 and including the
new functionality/tools, in late September or October.

Best Regards,

Eloy Rodrigues
Universidade do Minho - Serviços de Documentação
Campus de Gualtar - 4710 - 057 Braga
Telefone: + 351 253604150; Fax: + 351 253604159
Campus de Azurém - 4800 - 058 Guimarães
Telefone: + 351 253510168; Fax: + 351 253510117

 
-----Original Message-----
From: dspace-general-bounces at mit.edu [mailto:dspace-general-bounces at mit.edu]
On Behalf Of dspace-general-request at mit.edu
Sent: quarta-feira, 27 de Agosto de 2008 09:31
To: dspace-general at mit.edu
Subject: Dspace-general Digest, Vol 61, Issue 22

Send Dspace-general mailing list submissions to
	dspace-general at mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.mit.edu/mailman/listinfo/dspace-general
or, via email, send a message with subject or body 'help' to
	dspace-general-request at mit.edu

You can reach the person managing the list at
	dspace-general-owner at mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Dspace-general digest..."


Today's Topics:

   1. Re: Week 2: Statistics (Tim Donohue)
   2. Re: Statistics (Mark H. Wood)
   3. Re: Statistics (Dorothea Salo)
   4. Re: Statistics (Tim Donohue)
   5. Re: Week 2: Statistics (Mark H. Wood)
   6. Re: Week 2: Statistics (Dorothea Salo)
   7. Re: Week 2: Statistics (Christophe Dupriez)


----------------------------------------------------------------------

Message: 1
Date: Tue, 26 Aug 2008 11:09:15 -0500
From: Tim Donohue <tdonohue at illinois.edu>
Subject: Re: [Dspace-general] Week 2: Statistics
To: Dorothea Salo <dsalo at library.wisc.edu>
Cc: dspace-general at mit.edu
Message-ID: <48B42AAB.6010804 at illinois.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dorothea & all,

Dorothea Salo wrote:
> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
>> One thing to keep in mind about whole-site statistical tables is that
>> there are already tools to do this for web sites in general, such as
>> AWStats or Webalizer or whatever your favorite may be.  We probably
>> should not spend effort to try to duplicate those.
> 
> Perhaps not, but if this is the direction we want people to go in, we
> probably ought to document how to do it, at least informally on the
> wiki. Does anybody have such a system in place?

For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide 
traffic information.  However, that information is *not* publicly 
accessible.  We only use it for administrative purposes, since most of 
the information AWStats generates for us is generally *not* useful to 
our users.

So, for example, AWStats can provide us with the following general 
information:
   * Which features of DSpace are being used most frequently (e.g. 
Subject Browse, Community/Collection browse, search, etc.)
   * Which web browsers our users are using
   * # of overall hits in a given month,week,day,hour
   * Approximate amount of time users spend on our site
   * What external resources people use to get to our site (e.g. Google, 
Blog posts, Library website, etc.)
   * The top searches used to get to your site (in Google, Yahoo, MSN, etc)

But, AWStats only works at a global level.  So, it *cannot* give us any 
real information at a community, collection or item level, since it 
doesn't understand DSpace's internal structure and cannot parse DSpace's 
log files (it parses the *web server* log files, rather than DSpace's 
internal logs)

So, in the end, AWStats is a worthwhile tool to keep in mind.  However, 
without some major customizations specific to DSpace, it's really more 
of an Administrative tool to help you determine *how* users are using 
your site.  It doesn't give any real worthwhile "statistics" in terms of 
file downloads or individual community/collection access counts, which 
are more likely to be useful to your users.

- Tim

-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


------------------------------

Message: 2
Date: Tue, 26 Aug 2008 15:47:20 -0400
From: "Mark H. Wood" <mwood at IUPUI.Edu>
Subject: Re: [Dspace-general] Statistics
To: dspace-general at mit.edu
Message-ID: <20080826194720.GA20164 at IUPUI.Edu>
Content-Type: text/plain; charset="us-ascii"

On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
> So, although I think it was already mentioned, I'd add as a requirement 
> for a good Statistics Package:
> 
> * Must filter out web-crawlers in a semi-automated fashion!

+1!  Suggestions as to how?

The Rochester mod.s could be augmented to filter out the easiest cases
more simply.  Some well-behaved crawlers can be spotted automatically.
(No, I don't recall how.)  The filter rules could be made more
flexible than just a single type of fixed-size netblocks (if memory
serves).  I've been meaning to work on these at some point, but
haven't yet reached That Point.
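
As an illustration of more flexible netblock rules, here is a minimal
sketch using Python's standard ipaddress module, which accepts blocks of
any prefix length, IPv4 or IPv6.  This is not the Rochester code; the
function name and the netblocks (reserved documentation ranges, not real
crawlers) are assumptions for the example:

```python
import ipaddress

# Hypothetical crawler netblocks of mixed sizes (documentation ranges only).
CRAWLER_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.64/26"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_crawler_netblock(address, netblocks=CRAWLER_NETBLOCKS):
    """Return True if the address falls inside any configured netblock."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to test against.
    return any(ip in block for block in netblocks)
```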

Crawler filtering sounds like something that might be abstracted from
the various existing stat. patches and provided as a common service.
We all should invent this wheel only once.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.

------------------------------

Message: 3
Date: Tue, 26 Aug 2008 15:09:16 -0500
From: "Dorothea Salo" <dsalo at library.wisc.edu>
Subject: Re: [Dspace-general] Statistics
To: dspace-general at mit.edu
Message-ID:
	<356cf3980808261309j1a9964adif49b5ecefe5b98fe at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

2008/8/26 Mark H. Wood <mwood at iupui.edu>:
> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>> So, although I think it was already mentioned, I'd add as a requirement
>> for a good Statistics Package:
>>
>> * Must filter out web-crawlers in a semi-automated fashion!
>
> +1!  Suggestions as to how?

The site <http://www.user-agents.org/> maintains a list of
user-agents, classified by type. They have an XML-downloadable version
at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
updater. Perhaps polling this would be a useful starting point?
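
A rough sketch of what consuming such a list might look like.  The
element names (user-agent, String, Type) and the "R" code for robots are
assumptions about the file's layout, and the sample entries are made up;
verify both against the actual download before relying on this:

```python
import xml.etree.ElementTree as ET

# Made-up sample in the ASSUMED layout of the downloadable list.
SAMPLE_XML = """<user-agents>
  <user-agent><String>ExampleBot/1.0</String><Type>R</Type></user-agent>
  <user-agent><String>ExampleBrowser/2.0</String><Type>B</Type></user-agent>
  <user-agent><String>ExampleSpider/0.9</String><Type>R S</Type></user-agent>
</user-agents>"""


def robot_agent_strings(xml_text, robot_type="R"):
    """Collect the agent strings whose type codes include robot_type."""
    root = ET.fromstring(xml_text)
    robots = set()
    for entry in root.iter("user-agent"):
        agent = (entry.findtext("String") or "").strip()
        type_codes = (entry.findtext("Type") or "").split()
        if agent and robot_type in type_codes:
            robots.add(agent)
    return robots
```

The resulting set could be refreshed on a schedule and used to discard
matching requests before they are counted.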

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


------------------------------

Message: 4
Date: Tue, 26 Aug 2008 15:29:23 -0500
From: Tim Donohue <tdonohue at illinois.edu>
Subject: Re: [Dspace-general] Statistics
To: Dorothea Salo <dsalo at library.wisc.edu>
Cc: dspace-general at mit.edu
Message-ID: <48B467A3.7080100 at illinois.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed


Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>>> So, although I think it was already mentioned, I'd add as a requirement
>>> for a good Statistics Package:
>>>
>>> * Must filter out web-crawlers in a semi-automated fashion!
>> +1!  Suggestions as to how?
> 
> The site <http://www.user-agents.org/> maintains a list of
> user-agents, classified by type. They have an XML-downloadable version
> at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
> updater. Perhaps polling this would be a useful starting point?
> 
> Dorothea
> 

The University of Minho's Statistics Add-On for DSpace can do some basic 
automated filtering of web crawlers:

See its list of main features on the DSpace Wiki:

http://wiki.dspace.org/index.php//StatisticsAddOn

(It looks like they determine spiders by how spiders tend to identify 
themselves.  Most "nice" spiders, like Google, will identify themselves 
in a common fashion, e.g. "Googlebot")
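
To show the general idea (this is a guess at the approach, not Minho's 
actual code), a minimal sketch that flags a request as coming from a 
spider when its User-Agent header contains a self-identifying token.  
The token list and function name are illustrative assumptions:

```python
import re

# Illustrative tokens only; a real list would need ongoing maintenance.
SPIDER_RE = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_spider(user_agent):
    """Return True if the User-Agent string self-identifies as a spider."""
    return bool(SPIDER_RE.search(user_agent or ""))
```

This only catches "nice" spiders.  A crawler that presents itself as an 
ordinary browser passes straight through, which is why it is sensible 
to combine it with other rules.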

Frankly, although our statistics for IDEALS are nice looking...Minho's 
work is much more extensive and offers a greater variety of features 
(from what I've seen/heard of it).  It's just missing our "Top 10 
Downloads" list :)

- Tim


-- 
Tim Donohue
Research Programmer, Illinois Digital Environment for
Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
tdonohue at illinois.edu | (217) 333-4648


------------------------------

Message: 5
Date: Tue, 26 Aug 2008 16:34:33 -0400
From: "Mark H. Wood" <mwood at IUPUI.Edu>
Subject: Re: [Dspace-general] Week 2: Statistics
To: dspace-general at mit.edu
Message-ID: <20080826203433.GB20164 at IUPUI.Edu>
Content-Type: text/plain; charset="us-ascii"

On Tue, Aug 26, 2008 at 09:44:45AM -0500, Dorothea Salo wrote:
> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
> > What might be helpful is to provide some views or stored procedures
> > that stat. tools could use to classify observations.  Such tools
> > usually have good facilities for poking around in databases, but could
> > perhaps use help in getting the information they need without having to
> > understand (and track changes to!) the fullness of DSpace's schema.
> 
> Interesting. Where would this leave the average repository manager who
> isn't using Stata, but just wants some numbers to show people?

Well, it depends on which numbers are wanted.  I do think there will
be some reports that are popular enough, and easy enough to get right,
that they should be built in.  The support for external tools would be
aimed at people who do want to use them.  What sort of data would be
useful to the manager who isn't into heavy statistical analysis, which
aren't likely to be provided as built-ins?

Where I'm going is:

o  The realm of reasonable possibilities for statistical analysis and
   presentation of DSpace activity is rather huge;

o  people who understand statistical processing have already figured
   out the hard parts of analysis and presentation;

o  the tail should not be allowed to wag the dog -- we want
   statistics, but that's subordinate to building excellent document
   repository software.  Part of, important, but in a supporting role.

So I am hoping that we can mostly satisfy most people with relatively
modest built-in statistical support, and take care of the other cases
with modest support for the development of external reporting
mechanisms.  This being a community, I imagine that some will develop
external solutions that they can share.

This is one reason why I think that it should be as easy as possible
for multiple stat. projects to tap into built-in streams of
observations.  Different sites have different needs, and I think we
need to be able to easily play with various ways of doing stat.s.  I'm
not convinced that we are going to understand the need sufficiently
without getting into the field a selection of solutions that can be
easily snapped in and tried by a sizable number of sites.  There are a
number of good attempts now, but it's not easy to install them and
that limits the amount of experience we can gather.
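
One way to picture "tapping into a built-in stream of observations" is
a small publish/subscribe hub: the core publishes each observation once
and any number of stat. projects subscribe.  A minimal sketch; all the
class names and the shape of an observation are hypothetical, not an
existing DSpace interface:

```python
from collections import Counter


class ObservationStream:
    """Hypothetical hub that hands each observation to every subscriber."""

    def __init__(self):
        self._consumers = []

    def subscribe(self, consumer):
        # A consumer is any callable taking one observation (a dict here).
        self._consumers.append(consumer)

    def publish(self, observation):
        for consumer in self._consumers:
            consumer(observation)


class DownloadCounter:
    """Example consumer: counts download observations per item."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, observation):
        if observation["action"] == "download":
            self.counts[observation["item"]] += 1
```

A site could then snap in a second consumer (say, one that writes every
observation to a file for an external tool) without touching the first.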

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.

------------------------------

Message: 6
Date: Tue, 26 Aug 2008 18:13:14 -0500
From: "Dorothea Salo" <dsalo at library.wisc.edu>
Subject: Re: [Dspace-general] Week 2: Statistics
To: dspace-general at mit.edu
Message-ID:
	<356cf3980808261613n27ea9a5x917b98b833df37dc at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

2008/8/26 Mark H. Wood <mwood at iupui.edu>:

> Well, it depends on which numbers are wanted.  I do think there will
> be some reports that are popular enough, and easy enough to get right,
> that they should be built in.  The support for external tools would be
> aimed at people who do want to use them.  What sort of data would be
> useful to the manager who isn't into heavy statistical analysis, which
> aren't likely to be provided as built-ins?

Well, I hope that's where the discussion this week has been pointing.
If not, we'll have to find a different way to gather that information.
Looking at existing implementations of statistics (e.g. EPrints, SSRN)
might be a start.

> o  the tail should not be allowed to wag the dog -- we want
>   statistics, but that's subordinate to building excellent document
>   repository software.  Part of, important, but in a supporting role.

This is such an interesting statement that I think I will make it next
week's topic! What *is* excellent document repository software? I have
a feeling that the non-developer community may have a rather different
take on it from most developers... we'll see if I'm right.

> So I am hoping that we can mostly satisfy most people with relatively
> modest built-in statistical support, and take care of the other cases
> with modest support for the development of external reporting
> mechanisms.

I'd be interested to know how the proposals that have been put forward
this week place on a modesty scale. Developers?

> This is one reason why I think that it should be as easy as possible
> for multiple stat. projects to tap into built-in streams of
> observations.  Different sites have different needs, and I think we
> need to be able to easily play with various ways of doing stat.s.

Agreed, but just to toss this out: I foresee a countervailing pressure
in future toward standardized and aggregated statistics across
repositories. I have heard a number of statements to the effect that
faculty are using download counts from disciplinary repositories in
tenure-and-promotion packages. As their work becomes scattered and/or
duplicated across various repositories, they're going to want to
aggregate that information.

>  There are a
> number of good attempts now, but it's not easy to install them and
> that limits the amount of experience we can gather.

+1. This is a problem for more than just statistics!

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


------------------------------

Message: 7
Date: Wed, 27 Aug 2008 10:37:12 +0200
From: Christophe Dupriez <christophe.dupriez at destin.be>
Subject: Re: [Dspace-general] Week 2: Statistics
To: Dorothea Salo <dsalo at library.wisc.edu>
Cc: dspace <dspace-general at mit.edu>
Message-ID: <48B51238.4010008 at destin.be>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dorothea and participants in this discussion!

I would like to say that statistics are there for different purposes:
1) detect errors (why did nobody look at my site last Sunday?)
2) provide KPIs (Key Performance Indicators), measures that a manager 
follows over the medium term to take organisational decisions
3) investigate new hypotheses before investing to change the organisation.

For purpose (3), by its very nature, you need to open up the detailed 
event logs and the data stored in DSpace to analysis. Generic programs 
like SAS or report generators are best for digging into the data and 
answering new, unforeseen questions. Everybody in the community will be 
happy to have this "back door" available.

For purpose (2), we need to know which KPIs IR managers need. I 
will go further: new IRs and their managers would be very happy not to 
reinvent KPIs, and to have good ones already proposed to sustain a 
documented IR development process. A very big part of DSpace's 
attractiveness is that it provides (and this really should be 
implemented!) "best practices" for IR management (and not only computing).

Also for purpose (2), use cases, practices and measures must be designed 
up front. This will contribute strongly to the overall specification of 
DSpace.

For purpose (1), a more formal, bottom up, data driven approach may be 
sufficient to install validation tools (like the checksum checker) to 
ensure that DSpace operations are "in line".

So we have no choice: we have to listen to IR managers (please come by!) to 
know the good practices DSpace must support...

Have a nice day!

Christophe
(peeking on the list when I should not during my holidays!)


Dorothea Salo wrote:
> Greetings, DSpace community,
>
> I want to thank everyone once again for last week's stimulating
> discussion and impressive chat turnout! I have a new question for
> everyone this week, pursuant to some discussion on the lists:
>
> "Statistics" are one of the commonest requests for a new DSpace
> feature. Without further specification, however, it's hard to know
> what data to present, since there are no standards or even clear best
> practices in this area. What statistics do the following groups of
> DSpace users need to see, and in what form are the statistics best
> presented to them?
>
> Depositors
> End-users (defined as "people examining items and downloading
> bitstreams from a DSpace instance;" we may have to refine this further
> in discussion)
> DSpace repository managers (as distinct from systems administrators)
>
> What else should developers keep in mind as they implement this feature?
>
> Because it would be nice to reach a working consensus on this (unlike
> last week's question, which was intended to pull out as broad a
> selection of needs as possible), I think we should start discussing
> immediately. I encourage all respondents to respond TO THE MAILING
> LIST instead of to me.
>
> I will be holding another chat to discuss the weekly question. It will
> take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on
> irc.freenode.net. I apologize to West Coast (USA) community members
> for last week's unconscionably early hour; we'll try 10 am US Central
> (11 am Eastern, 4 pm GMT) this week, and we may go even later next
> week if our European community members can stand it.
>
> For those who don't normally use IRC, there are two easy web gateways.
> One is mibbit.com; the other is specific to our channel and can be
> found at <http://dspace.testathon.net/cgi-bin/irc.cgi>. I encourage
> all of us to become familiar with the channel; it is a source of
> real-time technical information from DSpace developers, as well as a
> community in its own right.
>
> Dorothea
>
>   

------------------------------

_______________________________________________
Dspace-general mailing list
Dspace-general at mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general


End of Dspace-general Digest, Vol 61, Issue 22
**********************************************


From paul.needham11 at btinternet.com  Wed Aug 27 07:38:04 2008
From: paul.needham11 at btinternet.com (Paul Needham)
Date: Wed, 27 Aug 2008 12:38:04 +0100
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
Message-ID: <010501c90839$5e9bd9c0$4101a8c0@EISDESKTOP>

Hi Dorothea

From my perspective, this week's topic is timely as I've just started work
on the JISC-funded PIRUS (Publisher and Institutional Repository Usage
Statistics) Project, which runs until the end of this year.

The aim of the project is to develop COUNTER-compliant usage reports at the
individual article level that can be implemented by any entity  (publisher,
aggregator, IR, etc.,) that hosts online journal articles and will enable
the usage of research outputs to be recorded, reported and consolidated at a
global level in a standard way.

We have identified the relevant statistics stakeholders as:

* STM publishing community
* IR managers
* Individual researchers
* Research library directors
* HE/FE research funding agencies
* Board of COUNTER

We are only in the early stages of our research at the moment, but, by the
end of the year, hope to be in a position to propose a format for
COUNTER-compliant usage reports, together with supporting protocols, and
submit this to COUNTER for approval as a new standard, to be adopted and
maintained by COUNTER.

Of course, this represents only one part of the wider IR statistics
landscape but may be something useful to throw into the mix!

Wearing another hat, as someone helping to run Cranfield University's IR
(Cranfield CERES), I would echo other comments that have been made on the
need for stats on a per-author and per-school/department basis, as well as
various 'Top Ten' lists.

Regards
Paul

____________________________
Paul A S Needham
Research & Innovation Specialist
Kings Norton Library
Cranfield University
Cranfield
MK43 0AL
  

-----Original Message-----
From: dspace-general-bounces at mit.edu [mailto:dspace-general-bounces at mit.edu]
On Behalf Of Dorothea Salo
Sent: 25 August 2008 14:09
To: dspace; DSpace Tech-List
Subject: [Dspace-general] Week 2: Statistics

Greetings, DSpace community,

I want to thank everyone once again for last week's stimulating
discussion and impressive chat turnout! I have a new question for
everyone this week, pursuant to some discussion on the lists:

"Statistics" are one of the commonest requests for a new DSpace
feature. Without further specification, however, it's hard to know
what data to present, since there are no standards or even clear best
practices in this area. What statistics do the following groups of
DSpace users need to see, and in what form are the statistics best
presented to them?

Depositors
End-users (defined as "people examining items and downloading
bitstreams from a DSpace instance;" we may have to refine this further
in discussion)
DSpace repository managers (as distinct from systems administrators)

What else should developers keep in mind as they implement this feature?

Because it would be nice to reach a working consensus on this (unlike
last week's question, which was intended to pull out as broad a
selection of needs as possible), I think we should start discussing
immediately. I encourage all respondents to respond TO THE MAILING
LIST instead of to me.

I will be holding another chat to discuss the weekly question. It will
take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on
irc.freenode.net. I apologize to West Coast (USA) community members
for last week's unconscionably early hour; we'll try 10 am US Central
(11 am Eastern, 4 pm GMT) this week, and we may go even later next
week if our European community members can stand it.

For those who don't normally use IRC, there are two easy web gateways.
One is mibbit.com; the other is specific to our channel and can be
found at <http://dspace.testathon.net/cgi-bin/irc.cgi>. I encourage
all of us to become familiar with the channel; it is a source of
real-time technical information from DSpace developers, as well as a
community in its own right.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From dsalo at library.wisc.edu  Wed Aug 27 09:28:47 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Wed, 27 Aug 2008 08:28:47 -0500
Subject: [Dspace-general] Chat in 90 minutes
Message-ID: <356cf3980808270628m11c280a6ybe04ad61e1d6915a@mail.gmail.com>

Good day everyone,

We'll be holding our second DSpace development chat in the #dspace IRC
channel on irc.freenode.net approximately 90 minutes from now (10 am
Central, 11 Eastern, 4 pm GMT). I will be turning up about half an
hour beforehand, after a morning meeting.

The topic of the day is statistics! The goal is to reach rough
consensus on a baseline set of end-user-facing statistics we believe
DSpace should offer.

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From mwood at IUPUI.Edu  Wed Aug 27 09:46:54 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Wed, 27 Aug 2008 09:46:54 -0400
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
	<20080826203433.GB20164@IUPUI.Edu>
	<356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
Message-ID: <20080827134654.GA24195@IUPUI.Edu>

On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
[snip]
> This is such an interesting statement that I think I will make it next
> week's topic! What *is* excellent document repository software? I have
> a feeling that the non-developer community may have a rather different
> take on it from most developers... we'll see if I'm right.

I think you are, and I look forward to that discussion!
 
> > This is one reason why I think that it should be as easy as possible
> > for multiple stat. projects to tap into built-in streams of
> > observations.  Different sites have different needs, and I think we
> > need to be able to easily play with various ways of doing stat.s.
> 
> Agreed, but just to toss this out: I foresee a countervailing pressure
> in future toward standardized and aggregated statistics across
> repositories. I have heard a number of statements to the effect that
> faculty are using download counts from disciplinary repositories in
> tenure-and-promotion packages. As their work becomes scattered and/or
> duplicated across various repositories, they're going to want to
> aggregate that information.

Quite so.  I just don't feel that we've yet got to the point at which
we understand how to do that well.  A lot of good solutions come about
in this way: an abstract and somewhat indistinct common need is
recognized; a number of people all go off in different directions and
try things; solutions are compared, borrow from each other, coalesce;
finally a now well-understood need finds itself fulfilled with one or
two mature implementations.  I feel that we're still deep in the "try
things" phase.

The degree to which statistics are desired and used suggests that, in
addition to traditional reports, we should be thinking in terms of
exposing statistical products in machine-readable form.  I have been
thinking for some time that we might, with reasonable effort, help to
work out a lingua franca for exchanging usage statistics among
repositories of various "brands" so that the utility of various ideas,
and the behavior of repository users, might be studied more
effectively.  But again, what we can all agree on will very likely be
a small subset of what we can individually envision.

This really ought to be considered early-on, because if we can come up
with a common theme in the abstract, then machine- and human-readable
reporting become side-by-side layers on top of the pool of statistical
data products, and both will be easier to think about if they are
merely formatting something already produced.  Likewise the production
of those stat.s will be easier to think about if presentation issues
can be separated from the task.
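
A minimal sketch of that layering, with one function producing the
statistical data product and two thin formatters on top of it.  The
function names and the row layout are illustrative assumptions, not a
proposed exchange format:

```python
import json


def monthly_downloads(events):
    """Data product: download totals per (item, month) from raw events."""
    totals = {}
    for item, month in events:
        totals[(item, month)] = totals.get((item, month), 0) + 1
    return [
        {"item": item, "month": month, "downloads": count}
        for (item, month), count in sorted(totals.items())
    ]


def as_json(rows):
    """Machine-readable layer: serialize the rows unchanged."""
    return json.dumps(rows, sort_keys=True)


def as_text(rows):
    """Human-readable layer: one aligned line per row."""
    return "\n".join(
        f"{row['month']}  {row['item']:<12}{row['downloads']:>6}" for row in rows
    )
```

Neither formatter computes anything; both merely present what
monthly_downloads has already produced.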

I do *not* mean to say here that the statistics that people want now
should have to wait indefinitely on some Grand Scheme to do it all.
It would be better to organize the development in successive
approximations if it looks like taking too long to do it all in one
push.  It's probably going to take several years to fully realize
satisfactory monitoring and reporting of DSpace usage, but that
doesn't mean that we can't provide better and better approximations
much sooner.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.

From sdl at aber.ac.uk  Wed Aug 27 10:39:48 2008
From: sdl at aber.ac.uk (Stuart Lewis)
Date: Wed, 27 Aug 2008 15:39:48 +0100
Subject: [Dspace-general] The RSP launch 'The DSpace Course' - a suite of
 DSpace training modules
In-Reply-To: <C4DB2561.1753C%sdl@aber.ac.uk>
Message-ID: <C4DB25C4.17540%sdl@aber.ac.uk>

[Apologies for cross-posting...]


Today the JISC-funded Repositories Support Project (http://rsp.ac.uk/) have
formally launched a modular training course for DSpace - "The DSpace
Course". The course materials have been published with a Creative Commons
licence in order to facilitate their re-use.

The course is suitable for DSpace administrators and developers, with the
choice of modules being dependent on the people taking the course. The
course tutor can mix-and-match the modules to create a custom course. Each
module comes with a set of PowerPoint slides, and an associated student
workbook. The course has been successfully taught in the UK and Italy.

There are 20 modules in the course, with more modules due to be added soon.
The modules include:

 - An Introduction to DSpace
 - How to Get Help
 - Repository Structure
 - Identifiers
 - DSpace Configuration
 - User management and authentication options
 - Metadata Input Customisation
 - Look and Feel Customisation
 - Language Customisation
 - Item Submission Workflows
 - Import and Export
 - Configuring LDAP
 - Upgrading from 1.4 to 1.5

In addition to the course materials the RSP has released a DSpace 'Live CD'.

The CD allows any PC to be used as a training machine with a copy of DSpace
pre-installed, along with all of the files required to perform a new
installation. 

The CD is inserted into a computer upon boot, and will load a live version
of the DSpace software without installation to the hard drive. Upon
completion of the training course, remove the CD and the normal operating
system will be loaded upon restart of the PC.

The course materials can be downloaded from:

 - http://hdl.handle.net/2160/615

The Live CD can be downloaded from:

 - http://hdl.handle.net/2160/563

The course has been written by Stuart Lewis (DSpace committer, developer and
trainer), Chris Yates (DSpace developer, support provider and trainer) and
has benefited from input by Claudia Jürgen (DSpace committer, developer and
trainer).

For help and support, please direct all enquiries related to the course to
support at rsp.ac.uk. 

In addition, the support team may be able to put you in touch with suitable
trainers who could teach the course in your area.


From randy_stern at harvard.edu  Wed Aug 27 13:57:58 2008
From: randy_stern at harvard.edu (Randy Stern)
Date: Wed, 27 Aug 2008 13:57:58 -0400
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <20080827134654.GA24195@IUPUI.Edu>
References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
	<20080826203433.GB20164@IUPUI.Edu>
	<356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
Message-ID: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>

One useful distinction is to separate to some degree the statistics that we 
may want to calculate from the events/raw data that needs to be recorded by 
the DSpace system as it operates. As long as the events are recorded in the 
database (preferably *not* logged in files), various computations, 
aggregations, reports, and APIs for exposing that data can be generated 
later. So we may want to focus initially on what data to record and plan 
for a statistics data model, database tables, and recording to be built 
into DSpace 2.0.
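
To make the record-now, compute-later split concrete, here is a minimal 
sketch of an event table plus one report derived from it afterwards.  
The table and column names are hypothetical, and SQLite is used only so 
the example is self-contained; it is not a proposal for the actual 
DSpace schema:

```python
import sqlite3

# Hypothetical event table: one row per recorded usage event.
SCHEMA = """
CREATE TABLE usage_event (
    event_id    INTEGER PRIMARY KEY,
    occurred_at TEXT NOT NULL,   -- ISO 8601 timestamp
    action      TEXT NOT NULL,   -- e.g. 'view' or 'download'
    handle      TEXT NOT NULL,   -- identifier of the item involved
    user_agent  TEXT             -- kept so crawlers can be filtered later
);
"""


def record_event(conn, occurred_at, action, handle, user_agent=None):
    """Store the raw event; no statistics are computed at this point."""
    conn.execute(
        "INSERT INTO usage_event (occurred_at, action, handle, user_agent) "
        "VALUES (?, ?, ?, ?)",
        (occurred_at, action, handle, user_agent),
    )


def top_downloads(conn, limit=10):
    """One possible report, computed later from the recorded events."""
    return conn.execute(
        "SELECT handle, COUNT(*) AS n FROM usage_event "
        "WHERE action = 'download' "
        "GROUP BY handle ORDER BY n DESC, handle LIMIT ?",
        (limit,),
    ).fetchall()
```

Because only raw events are stored, new reports can be added later 
without changing what is recorded.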

At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote:
>On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
> > 2008/8/26 Mark H. Wood <mwood at iupui.edu>:
>[snip]
> > This is such an interesting statement that I think I will make it next
> > week's topic! What *is* excellent document repository software? I have
> > a feeling that the non-developer community may have a rather different
> > take on it from most developers... we'll see if I'm right.
>
>I think you are, and I look forward to that discussion!
>
> > > This is one reason why I think that it should be as easy as possible
> > > for multiple stat. projects to tap into built-in streams of
> > > observations.  Different sites have different needs, and I think we
> > > need to be able to easily play with various ways of doing stat.s.
> >
> > Agreed, but just to toss this out: I foresee a countervailing pressure
> > in future toward standardized and aggregated statistics across
> > repositories. I have heard a number of statements to the effect that
> > faculty are using download counts from disciplinary repositories in
> > tenure-and-promotion packages. As their work becomes scattered and/or
> > duplicated across various repositories, they're going to want to
> > aggregate that information.
>
>Quite so.  I just don't feel that we've yet got to the point at which
>we understand how to do that well.  A lot of good solutions come about
>in this way: an abstract and somewhat indistinct common need is
>recognized; a number of people all go off in different directions and
>try things; solutions are compared, borrow from each other, coalesce;
>finally a now well-understood need finds itself fulfilled with one or
>two mature implementations.  I feel that we're still deep in the "try
>things" phase.
>
>The degree to which statistics are desired and used suggests that, in
>addition to traditional reports, we should be thinking in terms of
>exposing statistical products in machine-readable form.  I have been
>thinking for some time that we might, with reasonable effort, help to
>work out a lingua franca for exchanging usage statistics among
>repositories of various "brands" so that the utility of various ideas,
>and the behavior of repository users, might be studied more
>effectively.  But again, what we can all agree on will very likely be
>a small subset of what we can individually envision.
>
>This really ought to be considered early-on, because if we can come up
>with a common theme in the abstract, then machine- and human-readable
>reporting become side-by-side layers on top of the pool of statistical
>data products, and both will be easier to think about if they are
>merely formatting something already produced.  Likewise the production
>of those stat.s will be easier to think about if presentation issues
>can be separated from the task.
>
>I do *not* mean to say here that the statistics that people want now
>should have to wait indefinitely on some Grand Scheme to do it all.
>It would be better to organize the development in successive
>approximations if it looks like taking too long to do it all in one
>push.  It's probably going to take several years to fully realize
>satisfactory monitoring and reporting of DSpace usage, but that
>doesn't mean that we can't provide better and better approximations
>much sooner.
>
>--
>Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
>Typically when a software vendor says that a product is "intuitive" he
>means the exact opposite.
>
>
>_______________________________________________
>Dspace-general mailing list
>Dspace-general at mit.edu
>http://mailman.mit.edu/mailman/listinfo/dspace-general


Randy Stern
Manager of Systems Development
Harvard University Library Office for Information Systems
90 Mount Auburn Street
Cambridge, MA 02138
Tel. +1 (617) 495-3724
Email <randy_stern at harvard.edu>


From peter.kennedy at canterbury.ac.nz  Wed Aug 27 16:55:39 2008
From: peter.kennedy at canterbury.ac.nz (Peter Kennedy)
Date: Thu, 28 Aug 2008 08:55:39 +1200
Subject: [Dspace-general] Statistics
In-Reply-To: <48B51238.4010008@destin.be>
References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<48B51238.4010008@destin.be>
Message-ID: <297620FB8039FE4B9BD97F690F4F306002CB2868@ucexchange4.canterbury.ac.nz>

> I would like to say that statistics are there for different purposes:
> 1) detect errors (why nobody looked at my site last sunday?)
> 2) provide KPI (Key Performance Indicators), measures that a manager
> follows on the medium term to take organisational decisions
> 3) investigate new hypothesis before investing to change the
> organisation.

And we can add to that the use of statistics as a marketing tool - in
particular to show academic staff how much use is being made of their
contributions and, perhaps, also to encourage others to contribute.

Regards, Peter Kennedy


From scott.yeadon at anu.edu.au  Wed Aug 27 18:40:11 2008
From: scott.yeadon at anu.edu.au (Scott Yeadon)
Date: Thu, 28 Aug 2008 08:40:11 +1000
Subject: [Dspace-general] Statistics
In-Reply-To: <mailman.87902.1219825858.4453.dspace-general@mit.edu>
References: <mailman.87902.1219825858.4453.dspace-general@mit.edu>
Message-ID: <0K6A00LNO6YZKG90@messaging1.anu.edu.au>

Hi,

While jumping ahead a bit and not completely relevant to the context of 
this discussion, it's important in any solution to separate out event 
capture and statistics. Web server level statistics will only get you so 
far. Having recently been through an exercise in building a prototype 
statistics aggregator, we found that the key to producing "good" 
statistics (i.e. the reported information) is the *targeted capture* of 
events (i.e. the raw event data), typically by the application (i.e. in 
the DSpace code). The majority of reports which people want (or 
rather the accuracy and granularity thereof) can only be provided where 
the application has captured the event information rather than the more 
general-level web container app. If you couple the DSpace 1.5.x event 
producer/consumer feature with something like the Minho front-end or 
a Manakin stats aspect, that would make a pretty neat default stats package.
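
To make the separation concrete, here is a minimal sketch in Python
(chosen for brevity; DSpace itself is Java). The UsageEvent record, its
fields and the handles are hypothetical illustrations, not the DSpace
1.5.x event API. Capture only appends raw events; the report is derived
later from whatever was captured.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageEvent:
    """One raw event, recorded by the application when it happens."""
    when: datetime
    action: str       # e.g. "view" or "download" (illustrative values)
    handle: str       # item handle, known only inside the application
    collection: str   # owning collection, likewise application knowledge


def downloads_per_collection(events):
    """A report computed afterwards from stored events."""
    return Counter(e.collection for e in events if e.action == "download")


# Capture side: the application appends events as they occur.
captured = [
    UsageEvent(datetime(2008, 8, 27, 9, 0), "view", "1234/1", "theses"),
    UsageEvent(datetime(2008, 8, 27, 9, 1), "download", "1234/1", "theses"),
    UsageEvent(datetime(2008, 8, 27, 9, 5), "download", "1234/7", "reports"),
    UsageEvent(datetime(2008, 8, 28, 8, 30), "download", "1234/2", "theses"),
]

# Reporting side: any number of reports can be added without
# changing how events are captured.
report = downloads_per_collection(captured)
```

A web server log would not carry the collection field at all, which is
the granularity argument made above.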

Scott.

dspace-general-request at mit.edu wrote:
> Message: 1
> Date: Tue, 26 Aug 2008 11:09:15 -0500
> From: Tim Donohue <tdonohue at illinois.edu>
> Subject: Re: [Dspace-general] Week 2: Statistics
> To: Dorothea Salo <dsalo at library.wisc.edu>
> Cc: dspace-general at mit.edu
> Message-ID: <48B42AAB.6010804 at illinois.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dorothea & all,
>
> Dorothea Salo wrote:
>   
>> 2008/8/25 Mark H. Wood <mwood at iupui.edu>:
>>     
>>> One thing to keep in mind about whole-site statistical tables is that
>>> there are already tools to do this for web sites in general, such as
>>> AWStats or Webalizer or whatever your favorite may be.  We probably
>>> should not spend effort to try to duplicate those.
>>>       
>> Perhaps not, but if this is the direction we want people to go in, we
>> probably ought to document how to do it, at least informally on the
>> wiki. Does anybody have such a system in place?
>>     
>
> For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide 
> traffic information.  However, that information is *not* publicly 
> accessible.  We only use it for administrative purposes, since most of 
> the information AWStats generates for us is generally *not* useful to 
> our users.
>
> So, for example, AWStats can provide us with the following general 
> information:
>    * Which features of DSpace are being used most frequently (e.g. 
> Subject Browse, Community/Collection browse, search, etc.)
>    * Which web browsers our users are using
>    * # of overall hits in a given month,week,day,hour
>    * Approximate amount of time users spend on our site
>    * What external resources people use to get to our site (e.g. Google, 
> Blog posts, Library website, etc.)
>    * The top searches used to get to your site (in Google, Yahoo, MSN, etc)
>
> But, AWStats only works at a global level.  So, it *cannot* give us any 
> real information at a community, collection or item level, since it 
> doesn't understand DSpace's internal structure and cannot parse DSpace's 
> log files (it parses the *web server* log files, rather than DSpace's 
> internal logs)
>
> So, in the end, AWStats is a worthwhile tool to keep in mind.  However, 
> without some major customizations specific to DSpace, it's really more 
> of an Administrative tool to help you determine *how* users are using 
> your site.  It doesn't give any real worthwhile "statistics" in terms of 
> file downloads or individual community/collection access counts, which 
> are more likely to be useful to your users.
>
> - Tim
>
>   


From mdiggory at MIT.EDU  Wed Aug 27 20:08:01 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Wed, 27 Aug 2008 17:08:01 -0700
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>
References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
	<20080826203433.GB20164@IUPUI.Edu>
	<356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>
Message-ID: <C1C8FEEE-3A07-47BD-8B26-28F9E994301E@mit.edu>

There is a degree of cleanup you would want to do to the incoming  
data prior to storing it.

1.) host name and geoip resolution on the ip
2.) elimination of bots and automated download tools
3.) elimination of duplicate requests (double-clicks)
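
Steps 2 and 3 could look roughly like the following Python sketch. The
marker list, the ten-second window and the tuple layout are all
assumptions made for illustration; step 1 is left out because it needs
an external resolver or GeoIP database.

```python
from datetime import datetime, timedelta

# Illustrative substrings only; a real list would be much longer.
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl")
# Assumed threshold for treating a repeat request as a double-click.
DOUBLE_CLICK_WINDOW = timedelta(seconds=10)


def is_bot(user_agent):
    """Crude check: does the user agent contain a known marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def clean(requests):
    """Drop automated clients and rapid repeat requests.

    Each request is a (timestamp, ip, user_agent, path) tuple, and the
    input is assumed to be in timestamp order.
    """
    last_seen = {}  # (ip, path) -> timestamp of the most recent request
    kept = []
    for when, ip, user_agent, path in requests:
        if is_bot(user_agent):
            continue
        key = (ip, path)
        previous = last_seen.get(key)
        last_seen[key] = when
        if previous is not None and when - previous <= DOUBLE_CLICK_WINDOW:
            continue  # same client, same file, within the window
        kept.append((when, ip, user_agent, path))
    return kept


t0 = datetime(2008, 8, 27, 12, 0, 0)
pdf = "/bitstream/1234/1/a.pdf"
sample = [
    (t0, "192.0.2.10", "Mozilla/5.0", pdf),
    (t0 + timedelta(seconds=2), "192.0.2.10", "Mozilla/5.0", pdf),
    (t0 + timedelta(seconds=3), "192.0.2.99", "Googlebot/2.1", pdf),
    (t0 + timedelta(seconds=60), "192.0.2.10", "Mozilla/5.0", pdf),
]
kept = clean(sample)
```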

Likewise, it would be very important to obfuscate/clean the IP data  
that gets stored to eliminate privacy concerns when governments come  
knocking at your door.
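
One common way to do this, sketched here in Python as an assumption
rather than a description of what any DSpace site does, is to keep only
the network part of each address before storing it. The prefix lengths
are illustrative defaults.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host bits so the stored value names a network, not a host."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


masked_v4 = anonymize_ip("192.0.2.123")
masked_v6 = anonymize_ip("2001:db8:abcd:1234::1")
```

Country-level geolocation usually still works on the truncated value,
so the cleanup steps above can run before or after masking.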

I would recommend a different db instance to store such data that can  
have a connection pool configured to optimize writing over reading.

I would recommend evaluating Reporting Engines and Frameworks to  
arrive at an optimal database configuration independent of DSpace.

We've worked hard on an internal Statistics/Reporting solution for
DSpace at MIT that uses the DSpace DB, another storage database, and
processing across Apache logs.  Eventually I'd like to see us move to
a usage-event-driven update process rather than our Apache log
trolling. The solution currently doesn't sport a UI and generates
spreadsheet reports.

I think it's important to separate the DSpace database, the statistics
database, the reporting tools and the user interface needs into
separate but related projects so that they may evolve and be
supported as addons to DSpace.  Ultimately this means that those of us
working on the DSpace core need to make sure hooks like "Usage Events"
get into place and are available for addons to attach listeners to.
What this means for the group:

1.) Endorsing and Shepherding those changes to the core code-base  
into a near future release.
2.) Evaluation of the need for a common notification framework for  
both Usage and Modification events.
3.) Establishing a Roadmap that is inclusive, allowing new projects  
and team members to participate within the development/release process.

---

As well, I have the following concerns with the Minho statistics addon.

A.) Usage of procedural PostgreSQL excludes Oracle users, restricts
portability, and introduces a layer of complexity that requires
maintainers to be able to debug within a layer that is not
traditionally customized by DSpace. I feel that this logic needs to be
in the Java implementation rather than in a storage-specific language
and execution environment. This is a major factor in our not using
the Minho solution at DSpace at MIT.

B.) Overlays may be used to deploy on top of JSPUI/XMLUI, but we
should work toward better pluggability of this functionality.
Specifically, we've seen that the JSPUI's usage of JSP tag libraries
isn't ideal or well designed in DSpace. The usage of tag libraries
should ideally be replaced with Beans/Collections and JSTL iterator
tags. The JSPUI should be looking at templates and portlets for
solutions to allow pluggability rather than direct customization of
JSPs, tag libraries and Servlets by the community.

Tangent: This is why the XMLUI was created, to get away from this bad  
design.

C.) I commend the usage of a separate SQL namespace, but suggest
further that it might be better to use a completely separate DB,
allowing optimized write connections independent of the dspace db,
whose connections are better optimized for reading and transactional
security.

D.) The usage of the JDBC Log4j appender, while creative, introduces
another layer of complexity that isn't explicit. A pluggable
UsageEvent API may better manage the generation of events in the UI
to be directed to the Statistics addon. This may be of lesser concern
because it could be adapted to work as a UsageEvent consumer, rather
than consuming logging events destined for dspace.log directly.
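
As a rough illustration of what "attach listeners" could mean, here is
a minimal dispatcher sketch in Python (DSpace itself is Java). Every
name in it, including UsageEventDispatcher and DownloadCounter, is
hypothetical; it shows the shape of the idea, not a proposed DSpace API.

```python
class UsageEventDispatcher:
    """Core-side hook: the UI fires events, addons register listeners."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def fire(self, event):
        # The core neither knows nor cares what each listener does.
        for listener in self._listeners:
            listener(event)


class DownloadCounter:
    """Addon-side listener: counts downloads per handle."""

    def __init__(self):
        self.counts = {}

    def __call__(self, event):
        if event["action"] == "download":
            handle = event["handle"]
            self.counts[handle] = self.counts.get(handle, 0) + 1


dispatcher = UsageEventDispatcher()
counter = DownloadCounter()
audit_trail = []  # a second, independent listener

dispatcher.add_listener(counter)
dispatcher.add_listener(audit_trail.append)

dispatcher.fire({"action": "view", "handle": "1234/1"})
dispatcher.fire({"action": "download", "handle": "1234/1"})
dispatcher.fire({"action": "download", "handle": "1234/1"})
```

The point of the design is that a statistics addon registers itself
explicitly instead of reading events out of the logging configuration.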

These comments are meant to be constructive. Speaking for the
community, I think we don't want to see this work fall by the wayside,
and we should work to eliminate barriers to its uptake by the community.

I strongly encourage those working on projects within the community
(such as the Minho statistics addon) to take advantage of the tools and
services we are maintaining to enable your work in an open
environment where you can seek support and advice directly from the
community of DSpace developers.

We are working on a new Contribution WIKI page section to outline  
these Services and the policies and procedures around working with them.

http://wiki.dspace.org/index.php/DSpaceResources#DSpace_Community_Sandbox

-Mark

On Aug 27, 2008, at 10:57 AM, Randy Stern wrote:

> One useful distinction is to separate to some degree the statistics  
> that we
> may want to calculate from the events/raw data that needs to be  
> recorded by
> the DSpace system as it operates. As long as the events are  
> recorded in the
> database (preferably *not* logged in files), various computations,
> aggregations, reports, and APIs for exposing that data can be  
> generated
> later. So we may want to focus initially on what data to record and  
> plan
> for a statistics data model, database tables, and recording to be  
> built
> into DSpace 2.0.


From hussein at cs.uct.ac.za  Thu Aug 28 04:39:21 2008
From: hussein at cs.uct.ac.za (Hussein Suleman)
Date: Thu, 28 Aug 2008 10:39:21 +0200
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>
References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>	<356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>	<20080825145520.GF15124@IUPUI.Edu>	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>	<20080826203433.GB20164@IUPUI.Edu>	<356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>
Message-ID: <48B66439.2010008@cs.uct.ac.za>

without getting into whether event streams should be logged to file or 
database, this is probably in general the way to go. though i would 
recommend that this is done on a broader scale so analysis tools are 
interoperable among the major repository software systems.

(there was some research on an XML log file format a while back but it 
did not go far)

ttfn,
----hussein

=====================================================================
hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================


Randy Stern wrote:
> One useful distinction is to separate to some degree the statistics that we 
> may want to calculate from the events/raw data that needs to be recorded by 
> the DSpace system as it operates. As long as the events are recorded in the 
> database (preferably *not* logged in files), various computations, 
> aggregations, reports, and APIs for exposing that data can be generated 
> later. So we may want to focus initially on what data to record and plan 
> for a statistics data model, database tables, and recording to be built 
> into DSpace 2.0.


From mwood at IUPUI.Edu  Thu Aug 28 09:04:54 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Thu, 28 Aug 2008 09:04:54 -0400
Subject: [Dspace-general] Statistics
In-Reply-To: <0K6A00LNO6YZKG90@messaging1.anu.edu.au>
References: <mailman.87902.1219825858.4453.dspace-general@mit.edu>
	<0K6A00LNO6YZKG90@messaging1.anu.edu.au>
Message-ID: <20080828130454.GA11845@IUPUI.Edu>

On Thu, Aug 28, 2008 at 08:40:11AM +1000, Scott Yeadon wrote:
> While jumping ahead a bit and not completely relevant to the context of 
> this discussion, it's important in any solution to separate out event 
> capture and statistics. Web server level statistics will only get you so 
> far. Having recently been through an exercise in building a prototype 
> statistics aggregator, we found that the key to producing "good" 
> statistics (i.e. the reported information) is the *targeted capture* of 
> events (i.e. the raw event data), typically by the application (i.e. in 
> the DSpace code). The majority of reports which people want (or 
> rather the accuracy and granularity thereof) can only be provided where 
> the application has captured the event information rather than the more 
> general-level web container app. If you couple the DSpace 1.5.x event 
> producer/consumer feature with something like the Minho front-end or 
> a Manakin stats aspect, that would make a pretty neat default stats package.

I agree. :-)  A start on that:

  http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984

The Event System seems focused on changes to the repository, and I
recall that there was some resistance to expanding it to cover
references that don't change the model.  The above is a separate event
mechanism focused on reference events.

I've made considerable progress on adapting the University of
Rochester statistics package to take cases from this UsageEvent stream
instead of custom patching, and an XMLUI Aspect to make the resulting
per-object stat.s available for theming, but it's not quite ready for
daylight yet.

It's my understanding that the Minho package is one of those which
take cases from periodic digestion of log files.  Once an event stream
is available, it should be simple to create an adapter which appends
event records to a file in a suitable format, without clutter and with
the data you need.  The above patch demonstrates this with a plugin
which appends to a simple XML-like file.
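
The plugin in the patch is not reproduced here. As a hedged
illustration of the idea only, a Python sketch of such an adapter might
append one self-contained element per event; the element and attribute
names are invented for the example.

```python
import os
import tempfile
from xml.sax.saxutils import quoteattr


def append_event(path, action, handle, when):
    """Append one event as a single self-closing element on its own line."""
    line = "<event when=%s action=%s handle=%s/>\n" % (
        quoteattr(when),    # quoteattr escapes and adds the quotes
        quoteattr(action),
        quoteattr(handle),
    )
    # Mode "a" only ever adds to the end, so earlier records are untouched.
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)


log_path = os.path.join(tempfile.mkdtemp(), "usage-events.log")
append_event(log_path, "view", "1234/1", "2008-08-28T09:04:54")
append_event(log_path, "download", "1234/1", "2008-08-28T09:05:10")

with open(log_path, encoding="utf-8") as log:
    lines = log.read().splitlines()
```

One record per line keeps the file usable with line-oriented tools as
well as with an XML parser once it has been wrapped in a root element.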

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From mwood at IUPUI.Edu  Thu Aug 28 09:16:06 2008
From: mwood at IUPUI.Edu (Mark H. Wood)
Date: Thu, 28 Aug 2008 09:16:06 -0400
Subject: [Dspace-general] Week 2: Statistics
In-Reply-To: <48B66439.2010008@cs.uct.ac.za>
References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com>
	<356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com>
	<20080825145520.GF15124@IUPUI.Edu>
	<356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com>
	<20080826203433.GB20164@IUPUI.Edu>
	<356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com>
	<5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu>
	<48B66439.2010008@cs.uct.ac.za>
Message-ID: <20080828131606.GB11845@IUPUI.Edu>

On Thu, Aug 28, 2008 at 10:39:21AM +0200, Hussein Suleman wrote:
> without getting into whether event streams should be logged to file or 
> database, this is probably in general the way to go. though i would 
> recommend that this is done on a broader scale so analysis tools are 
> interoperable among the major repository software systems.
> 
> (there was some research on an XML log file format a while back but it 
> did not go far)

One difficulty with logging to XML is that, strictly speaking, it's
not possible.  The document element cannot reliably be closed by the
logging application.

Practically, it should be simple to close the document element after
log cutoff by just pasting the closing tag onto the end of the file
before ingestion, but it's a minor weakness of the XML approach.
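
A small Python sketch of that workaround, using the standard library
parser; the <events> root and the sample records are made up for the
example.

```python
import xml.etree.ElementTree as ET

# What the logging application leaves behind: an opened root element
# followed by complete records, but no closing tag.
open_log = (
    "<events>\n"
    '<event action="view" handle="1234/1"/>\n'
    '<event action="download" handle="1234/1"/>\n'
)


def is_well_formed(text):
    """True if the text parses as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def close_and_parse(text, root_tag="events"):
    """Paste the closing tag onto the end, then parse for ingestion."""
    return ET.fromstring(text + "</%s>\n" % root_tag)


root = close_and_parse(open_log)
actions = [event.get("action") for event in root.findall("event")]
```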

I agree that flat-file representations of usage event data should be
designed for general usability by a variety of tools.

-- 
Mark H. Wood, Lead System Programmer   mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


From dsalo at library.wisc.edu  Thu Aug 28 13:06:06 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Thu, 28 Aug 2008 12:06:06 -0500
Subject: [Dspace-general] Chat summary: 27 August 2008
Message-ID: <356cf3980808281006m22ebf18ch53ac9ec90a09252a@mail.gmail.com>

We had some new voices this time! Good to see.

LINKS AND DEMOS

* ePrints item report sample: <http://eprints.rclis.org/stat/6766.html>
* Minho report sample:
<http://repositorium.sdum.uminho.pt/stats?level=item&type=access&page=downviews-series&object-id=1822/6177>
* IRStats sample: <http://irstats.eprints.org/irstats-cadair>
* AWStats over an entire DSpace repository:
<http://researchspace.csir.co.za/awstats/awstats.pl?config=researchspace.csir.co.za&configdir=/etc/awstats/>
(thanks to Ina Smith of the University of Pretoria, who could not be
present on the chat, for emailing this link)
* "Top downloads" and all-repository statistics on the home page:
<http://www.ideals.uiuc.edu/>

APPLAUSE

The chat took a moment to applaud the Repository Support Project's new
DSpace Course (<http://hdl.handle.net/2160/615>). Contributors Stuart
Lewis, Chris Yates, and Claudia Jurgen were all present on the chat.
The Course is looking for new contributors, particularly with regard
to Manakin/XMLUI; if you can help, please contact Stuart Lewis.

AN IMPORTANT DISTINCTION

Mark Wood pointed out (as have several emails to the list during this
week's discussion) that two sharply differing concepts lurk behind the
word "statistics": the capture of repository events as they occur, and
the distillation of raw event data into useful reports. "Statistics
pull patterns out of collections of individual cases," said Mark.
Moreover, not all reports are statistical in nature; some (such as
"what's been deposited recently" lists) just regurgitate part of the
event stream.

Given accessible event-stream data, many statistical analyses can be
done wholly outside of DSpace, and it is unrealistic to expect DSpace
to create analyses for every imaginable use-case. Some common
use-cases, however, may need to become part of DSpace proper; the
trouble is defining them.

COMMON REPORTING NEEDS

All access-related reports (accesses/downloads) should filter out as
many crawlers as feasible.

* item accesses, total as well as by month and year
* bitstream downloads, as above
* accesses and downloads by author, as above; authors also want to
know what their most popular items are
* incoming links from other websites (via referrers; note that
referrer spam may become a problem)

Other possibilities mentioned included:

* alerts for download "spikes" over a short period of time
* on item pages, time of last download
* "popular items in this repository" (recent, total, and monthly,
though it was noted that displaying this information to end-users
tends to feed unjust power-law distribution of downloads)
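
As a rough illustration of the first item in this list (this was not
worked out on the chat), a spike alert could compare the latest daily
count with the trailing average. The factor and the minimum count in
this Python sketch are arbitrary assumptions.

```python
from statistics import mean


def is_spike(daily_counts, factor=5, min_downloads=20):
    """True when the latest day is far above the average of earlier days.

    daily_counts is a list of per-day download counts, oldest first.
    """
    if len(daily_counts) < 2:
        return False  # no history to compare against
    history, latest = daily_counts[:-1], daily_counts[-1]
    baseline = mean(history)
    # The minimum keeps very quiet items from alerting on tiny changes.
    return latest >= min_downloads and latest > factor * baseline


quiet_week = [3, 4, 2, 5, 3, 6]
busy_day = [3, 4, 2, 5, 3, 60]
```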

Geolocating accesses was not perceived as vital.

PRIVACY ISSUES

Claudia Jurgen noted that the EU has very strict privacy laws that may
prevent the collection or retention of information that could identify
individual persons. DSpace may therefore not be able to track
individuals' site behavior (to put toward "more like this" links or
the like).

OTHER DESIDERATA

Technical issues: The widely-praised Minho stats engine does not yet
work with XMLUI, and no one on the chat knew of plans to adapt it.
Mark Diggory noted that event-capture should be separated from log4j's
error capturing.

Shane Beers pointed out that DSpace does not currently offer
repository managers much information about the contents of their
repositories, which is a significant worry vis-a-vis bitstream
preservation. A list of bitstreams by MIME type would be a start.

DSpace also does not help managers investigate deposit patterns and
growth. A readily-accessible list of recent deposits as well as a list
of deposits per time period (separable by community/collection, so
that different communities can be usefully compared) would be useful
to repository administrators, and should be relatively easy to build
via dc.date.available (or for research-tracking use-cases,
dc.date.published) metadata.
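
A sketch of the deposits-per-period idea in Python, assuming the date
values are ISO 8601 strings and that each item's community has already
been looked up; the community names and dates are invented.

```python
from collections import Counter


def deposits_per_month(items):
    """Count deposits per (community, "YYYY-MM").

    items is an iterable of (community, date_available) pairs, where
    date_available is an ISO 8601 string such as "2008-08-19T16:45:12Z".
    """
    # The first seven characters of an ISO 8601 date are "YYYY-MM".
    return Counter((community, date[:7]) for community, date in items)


items = [
    ("Physics", "2008-07-14T10:22:31Z"),
    ("Physics", "2008-08-02T08:00:00Z"),
    ("Physics", "2008-08-19T16:45:12Z"),
    ("History", "2008-08-05T11:30:00Z"),
]
counts = deposits_per_month(items)
```

Slicing to four characters instead of seven gives per-year totals from
the same data.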

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From cwbailey at digital-scholarship.com  Fri Aug 29 11:21:33 2008
From: cwbailey at digital-scholarship.com (Charles W. Bailey, Jr.)
Date: Fri, 29 Aug 2008 10:21:33 -0500
Subject: [Dspace-general] How Many University/Health Library Institutional
 Repositories Are There in Texas?
Message-ID: <48B813FD.3080402@digital-scholarship.com>

Texas is the second largest state in the U.S., both in terms
of population (about 23.9 million) and square miles (268,820
square miles).  It has 74 universities and 10 health-related
academic institutions.

How many library institutional repositories serve these
universities and health-related academic institutions? Based
on major repository directories and key vendor lists, it
appears that the answer is eight, with Digital Commons and
DSpace being the software of choice. (Institutional
repositories being those repositories that serve one or more
entire institutions.)

Texas also has the Texas Digital Library.

Read more about it at "Institutional Repositories at Texas
University and Health Science Libraries":

http://tinyurl.com/6fanuq

-- 

Best Regards,
Charles

Charles W. Bailey, Jr.
Publisher, Digital Scholarship
http://www.digital-scholarship.org/

A Look Back at Nineteen Years
as an Internet Digital Publisher
http://www.digital-scholarship.org/cwb/nineteenyears.htm


From mcgeetho at shu.edu  Fri Aug 29 13:01:28 2008
From: mcgeetho at shu.edu (Thomas A McGee)
Date: Fri, 29 Aug 2008 13:01:28 -0400
Subject: [Dspace-general] Statistics
In-Reply-To: <mailman.339.1219939517.27180.dspace-general@mit.edu>
Message-ID: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>

I missed the chat the other day, so some of this may have been covered and 
dismissed already. 

Tomcat has the capacity to output Apache-style "combined" log files for 
all requests, including bitstreams. There's a whole host of commercial, 
shareware and freeware products out there designed to slice-and-dice these 
Apache log files and pull out all the kinds of reports people seem to be 
talking about here. 

The programs range from the very simple, like Analog, to the extremely 
complex and expensive, like WebTrends Enterprise. They can be configured 
to download the log files automatically and run reports on a schedule, so 
that they're there when you come in in the morning. They can incorporate 
various filters, resolve user IP addresses, analyze request URL paths 
(which can be translated into collection and community names), referers, 
logged-in users, user agents, etc. etc.

Rather than reinvent the wheel (and this is an extremely complex wheel), I 
think for most users it would pay to look at this approach unless there is 
something really esoteric about your traffic that you are trying to get 
at. 
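
To make the log-file approach concrete, here is a minimal sketch of
parsing one Apache-style "combined" log line. The regular expression
follows the usual combined layout (host, identity, user, time,
request, status, size, referer, user agent); the sample line, its IP
address, and its URL path are invented for illustration.

```python
import re
from datetime import datetime

# One "combined" log entry: host ident user [time] "request" status size
# "referer" "user-agent".
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse a combined-format log line into a dict, or return None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


# Invented sample line; the bitstream path is a hypothetical example.
sample = (
    '192.0.2.10 - - [29/Aug/2008:13:01:28 -0400] '
    '"GET /bitstream/1721.1/41920/1/report.pdf HTTP/1.1" 200 524288 '
    '"http://example.org/links.html" "Mozilla/5.0 (X11; U; Linux i686)"'
)
record = parse_line(sample)
```

The referer and user agent fields parsed here are the ones the
off-the-shelf analyzers use for incoming-link and crawler reports.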

_____________________
Tom McGee
Seton Hall University TLTC
973 761 9000 x5021

From mdiggory at MIT.EDU  Fri Aug 29 13:32:33 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Fri, 29 Aug 2008 10:32:33 -0700
Subject: [Dspace-general] Statistics
In-Reply-To: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>
References: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>
Message-ID: <DDD15191-9D12-4895-8BEC-237FBCC81BC4@mit.edu>

Thomas,

Thanks for what is also a sensible recommendation.

On Aug 29, 2008, at 10:01 AM, Thomas A McGee wrote:

>
> I missed the chat the other day, so some of this may have been  
> covered and dismissed already.
>
> Tomcat has the capacity to output Apache-style "combined" log files  
> for all requests, including bitstreams. There's a whole host of  
> commercial, shareware and freeware products out there designed to  
> slice-and-dice these Apache log files and pull out all the kinds of  
> reports people seem to be talking about here.
>
> The programs range from the very simple, like Analog, to the  
> extremely complex and expensive, like WebTrends Enterprise. They  
> can be configured to download the log files automatically and run  
> reports on a schedule, so that they're there when you come in in  
> the morning. They can incorporate various filters, resolve user IP  
> addresses, analyze request URL paths (which can be translated into  
> collection and community names), referers, logged-in users, user  
> agents, etc. etc.
>
> Rather than reinvent the wheel (and this is an extremely complex  
> wheel),I think for most users it would pay to look at this approach  
> unless there is something really esoteric about your traffic that  
> you are trying to get at.


It's an inherent issue in the "address space" of DSpace resources
made available in the web application. For instance, I may have the
following Community, Collection and Item:

Computer Science and Artificial Intelligence Lab (CSAIL)
http://dspace.mit.edu/handle/1721.1/5458

CSAIL Technical Reports (July 1, 2003 - present)
http://dspace.mit.edu/handle/1721.1/29807

Adaptive Envelope MDPs for Relational Equivalence-based Planning
http://dspace.mit.edu/handle/1721.1/41920

From the perspective of the Apache/Tomcat logs, requests are made to
these resources, but based on those logs alone it's quite difficult to
ascertain that there is a hierarchy here:

/1721.1/5458 <-- Community
       /1721.1/29807 <-- Collection
               /1721.1/41920 <-- Item

The challenge is that most logging packages, because this structure is
absent from the path of the resource, cannot roll up the statistics to
represent the aggregations at the collection and item level that
managers want to see for a DSpace Community/Collection.
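
The roll-up that a path-based log analyzer cannot do becomes simple
once the parent relationships are known. A sketch, assuming the
repository can supply a child-to-parent map for the three handles
above; the hit counts are made up.

```python
from collections import Counter

# Child -> parent links, as the repository itself would know them.
# The access log alone cannot supply this information.
PARENT = {
    "1721.1/41920": "1721.1/29807",  # item -> collection
    "1721.1/29807": "1721.1/5458",   # collection -> community
}


def roll_up(hits):
    """Add each handle's hits to itself and to every ancestor."""
    totals = Counter()
    for handle, count in hits.items():
        node = handle
        while node is not None:
            totals[node] += count
            node = PARENT.get(node)
    return totals


# Hypothetical direct hit counts per handle.
hits = {"1721.1/41920": 12, "1721.1/29807": 3, "1721.1/5458": 1}
totals = roll_up(hits)
```

With these numbers the collection total includes the item's hits, and
the community total includes both.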

Likewise, we are trying to maintain the following:

1.) We do not introduce a rigid expectation that the "paths" by which
resources are represented cannot change over time as DSpace evolves.
2.) We may have more than one path by which a resource is accessed,
and may want to treat those accesses either as "the same" or as
"uniquely different" statistically.
3.) We want to allow hooks so that these stats can be collected off
the "logical event" in DSpace rather than the "physical event" in
the application server.

By configuring a stats solution like analog/awstats/webtrends, we are
restricted to gathering statistics about the physical event of
requesting that address in the web service. Likewise, if the address
representing that resource changes in the UI (via either development
decisions or administrative decisions), then the configuration of that
external software will be out of sync and need to be adjusted. By
having the application report "logical events" we can step away from
this issue. By internalizing the statistics gathering and generation,
we have an opportunity to create a solution that allows DSpace to
evolve freely and that meets the requirements requested by the
community (or, more explicitly, exhibited by the Minho addon).
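
As an illustration of the "logical event" idea, the sketch below maps
several physical URL paths onto one (event kind, handle) pair, so that
the statistics are keyed on the resource rather than on its address.
The URL layouts in the patterns are assumptions made for the example
and are not a statement of what any DSpace UI actually serves.

```python
import re

# Hypothetical URL layouts; if the UI changes its paths, only this
# table changes, not the statistics keyed on the handle.
PATTERNS = [
    ("view", re.compile(r"^/handle/(\d[\d.]*/\d+)/?$")),
    ("download", re.compile(r"^/bitstream/(?:handle/)?(\d[\d.]*/\d+)/")),
]


def to_logical_event(path):
    """Translate a request path into (kind, handle), or None."""
    for kind, pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            return (kind, match.group(1))
    return None


# Two different physical paths yield the same logical download event.
first = to_logical_event("/bitstream/1721.1/41920/1/report.pdf")
second = to_logical_event("/bitstream/handle/1721.1/41920/report.pdf")
```

An internal event hook would skip the path parsing entirely and report
the handle directly, which is the advantage being argued for above.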

Cheers,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage



From gsk5 at cornell.edu  Fri Aug 29 15:31:20 2008
From: gsk5 at cornell.edu (George Stanley Kozak)
Date: Fri, 29 Aug 2008 15:31:20 -0400 (EDT)
Subject: [Dspace-general] Statistics
In-Reply-To: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>
References: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>
Message-ID: <3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu>

Tom:

I missed the chat, too, but I have been one of the ones asking for a more
integrated statistics package.  I used to use NetTracker for my DSpace
instance and now am using a combination of the Edinburgh software and some
locally grown software.  I then have a page that lists the top 10 hits and
other stats.

At one time we had a counter on the item page that kept track of views,
but this local code of ours broke when we went to DSpace 1.4.2.

My users are asking for specific information concerning their particular
item(s) or collection(s) and they'd like to see it on the item or
collection page.  I tried to use the Minho package, but have had problems
getting it to work in my instance.

So, my thinking has been that if DSpace had an integrated package (maybe
something that acts like the Minho software), then I would be able to give
the users what they want.  So, in my case, the free and commercial
packages, while giving me useful information, don't give my users what
they want to see.  To fix that I would have to do some programming, and my
experience in changing DSpace software these past several years is that "I
really don't want to do that!" ;-)

So, that's my logic behind asking for an integrated stats package for
DSpace (Yes, I know it's selfish!).

> I missed the chat the other day, so some of this may have been covered and
> dismissed already.
>
> Tomcat has the capacity to output Apache-style "combined" log files for
> all requests, including bitstreams. There's a whole host of commercial,
> shareware and freeware products out there designed to slice-and-dice these
> Apache log files and pull out all the kinds of reports people seem to be
> talking about here.
>
> The programs range from the very simple, like Analog, to the extremely
> complex and expensive, like WebTrends Enterprise. They can be configured
> to download the log files automatically and run reports on a schedule, so
> that they're there when you come in in the morning. They can incorporate
> various filters, resolve user IP addresses, analyze request URL paths
> (which can be translated into collection and community names), referers,
> logged-in users, user agents, etc. etc.
>
> Rather than reinvent the wheel (and this is an extremely complex wheel),I
> think for most users it would pay to look at this approach unless there is
> something really esoteric about your traffic that you are trying to get
> at.
>
> _____________________
> Tom McGee
> Seton Hall University TLTC
> 973 761 9000 x5021
>


****************************************
George Kozak
Coordinator
Web Development and Management
Digital Media Group
501 Olin Library
Cornell University 14853
gsk5 at cornell.edu
607-255-8924


From dsalo at library.wisc.edu  Fri Aug 29 15:35:37 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Fri, 29 Aug 2008 14:35:37 -0500
Subject: [Dspace-general] Statistics
In-Reply-To: <3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu>
References: <OFB298D123.2860D9C2-ON852574B4.005CD073-852574B4.005D84A2@shu.edu>
	<3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu>
Message-ID: <356cf3980808291235k2d4e595fv1d91ffb296227cfb@mail.gmail.com>

On Fri, Aug 29, 2008 at 2:31 PM, George Stanley Kozak <gsk5 at cornell.edu> wrote:
> Tom:

> So, that's my logic behind asking for an integrated stats package for
> DSpace (Yes, I know it's selfish!).

What is selfish about doing your best to give the people you are
serving what they are asking your service to provide?

All right, aside from actually being able to keep your job, and all... ;)

Dorothea

-- 
Dorothea Salo dsalo at library.wisc.edu
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493


From sil.linguist at gmail.com  Fri Aug 29 17:25:59 2008
From: sil.linguist at gmail.com (Hugh Paterson III)
Date: Fri, 29 Aug 2008 16:25:59 -0500
Subject: [Dspace-general] Dspace Mysql
Message-ID: <967E9449-58DB-446C-8E82-60B329DB93D8@gmail.com>

I am new to DSpace and was wondering if anyone has implemented DSpace
with a MySQL back end?  I was looking over the documentation to see if
there were any suggestions, but there appears to be no official
recommendation for MySQL.

Any help out there?