From jfurfey at mbl.edu Fri Aug 1 14:45:17 2008 From: jfurfey at mbl.edu (John Furfey) Date: Fri, 1 Aug 2008 14:45:17 -0400 Subject: [Dspace-general] migrate data to 1.5 Message-ID: We're in the process of upgrading from 1.4.2 to 1.5, and we're also moving to a new server. We've got 1.5 up and running and we're trying to figure out the best way of migrating our data. Is it possible to do a pg_dump from the 1.4.2 server and do a pg_restore on the 1.5 server? Or will 1.5's new db schema prevent this? Thanks for any response, I have not been able to find any documentation for this scenario. ------------------------------------------------------------ John Furfey Digital Systems and Services Coordinator MBLWHOI Library Woods Hole MA 02543 USA PHONE: 508-289-7435 EMAIL: jfurfey at mbl.edu http://www.mblwhoilibrary.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080801/60efe5ad/attachment.htm From mdiggory at MIT.EDU Sun Aug 3 00:12:17 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Sat, 2 Aug 2008 21:12:17 -0700 Subject: [Dspace-general] [Dspace-tech] migrate data to 1.5 In-Reply-To: References: Message-ID: <7B27116C-C40A-41D0-9445-11190CB4D2B3@mit.edu> John, Please follow the upgrade instructions supplied for upgrading DSpace 1.4.2 to 1.5 (page 50). The upgrade process described in the documentation will take you through the appropriate steps to convert the DSpace postgres database from 1.4.2 to 1.5 http://www.dspace.org/images/onepointfivedocs/dspacemanual_15_may.zip On Aug 1, 2008, at 11:45 AM, John Furfey wrote: > We're in the process of upgrading from 1.4.2 to 1.5, and we're also > moving to a new server. > > We've got 1.5 up and running and we're trying to figure out the > best way of migrating our data. Is it possible to do a pg_dump > from the 1.4.2 server and do a pg_restore on the 1.5 server? Or > will 1.5's new db schema prevent this? 
You do not want to attempt it in this order. The upgrade process supplies a SQL script (database_schema_14-15.sql) that makes the necessary changes to your existing database to upgrade it from 1.4.2 to 1.5. You do not need to do a fresh install of an empty DSpace instance and migrate your data into it.

I also highly recommend using pg_dump/psql to create a copy of your database, and installing a replica of your DSpace installation on another machine, to properly "test" that the upgrade process will work for your production server before attempting it there. This will also give you an opportunity to become familiar with the upgrade process before doing it against a mission-critical instance.

To back up a Postgres database on Linux, we use the following command and options:

> pg_dump --oids -U dspace -f dspace-backup.sql [dspace-db-name]

where [dspace-db-name] is the name of your DSpace database in the Postgres cluster (usually this is "dspace" by default).

To restore the backup to the same location:

> psql -U dspace -d [dspace-db-name] < dspace-backup.sql

or, to restore it under the same name on another machine where you have not yet created the database or the dspace user:

> createuser -U postgres -d -A -P dspace
> createdb -U dspace -E UNICODE [dspace-db-name]
> psql -U dspace -d [dspace-db-name] < dspace-backup.sql

> Thanks for any response, I have not been able to find any
> documentation for this scenario.

Please do feel free to post any questions about how to handle your upgrade properly to the dspace-tech list. We in the community who have worked on creating this upgrade process would like to ensure that your switch to 1.5.0 is a success.

-Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080802/2b4f5aab/attachment.htm

From sunilgoria at yahoo.com Mon Aug 4 06:00:51 2008
From: sunilgoria at yahoo.com (Sunil Goria)
Date: Mon, 4 Aug 2008 03:00:51 -0700 (PDT)
Subject: [Dspace-general] File upload problem
Message-ID: <701332.35218.qm@web53411.mail.re2.yahoo.com>

Dear all,

We have been using DSpace 1.2 on Linux Enterprise for the last 3-4 years. We are now unable to upload files to the DSpace server from a client through Internet Explorer. Earlier it was working fine; since we reinstalled the browser, it no longer uploads files, although uploading still works from one system on our LAN. Please suggest a browser setting, or any other possible cause of this problem.

With regards,
Dr. Sunil Goria
Assistant Librarian
University Library, G.B. Pant University of Agriculture & Technology, Pantnagar-263145 (India)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080804/c87a92ea/attachment.htm

From alally at u.washington.edu Mon Aug 4 12:02:47 2008
From: alally at u.washington.edu (Ann Lally)
Date: Mon, 4 Aug 2008 09:02:47 -0700
Subject: [Dspace-general] selective searching
Message-ID: <00b701c8f64b$89ca8470$9d5f8d50$@washington.edu>

Hi all,

The UW Libraries has been storing "library centric" digital files in our instance of DSpace, as well as some items that are locked by a particular community. We don't want these files to show when someone searches for academic papers and reports. Has anyone else had this issue? How did you resolve it?

Thanks in advance.

Ann Lally
University of Washington Libraries
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080804/d3c0c971/attachment.htm

From mdiggory at MIT.EDU Wed Aug 6 13:19:37 2008
From: mdiggory at MIT.EDU (Mark Diggory)
Date: Wed, 6 Aug 2008 10:19:37 -0700
Subject: [Dspace-general] DSpace@MIT upgrades to DSpace 1.5.x
Message-ID: <2465CAED-77ED-4B95-B8A6-8D94BC5018CA@mit.edu>

Dear DSpace Community,

Earlier this week we completed the upgrade of DSpace@MIT from DSpace 1.4.1 (JSPUI) to the latest DSpace 1.5.x revision. We also switched over completely to the Texas A&M Manakin-based XMLUI during this upgrade. The service can now be explored at the original DSpace@MIT host:

http://dspace.mit.edu

For MIT Libraries, this upgrade represents the culmination of more than a year's worth of effort in collaboration with other individuals and organizations within the DSpace community, beginning with the reorganization of the DSpace 1.4.2 code base and the establishment of the Maven-based build process, and culminating in the release of DSpace 1.5.0, the addition of maintenance fixes and the upgrade of our systems.

We at MIT feel that the DSpace 1.5.X branch, which now contains a significant number of bug fixes, is ready for a maintenance release. As release coordinator for the 1.5.1 release, I expect to begin testing a 1.5.1 beta update in the coming week.

I would like to thank all the developers and community members who contributed to the DSpace 1.5.X codebase in the past year. We could not have accomplished our own production goals without your efforts within the community.

Cheers,
Mark Diggory

~~~~~~~~~~~~~
Mark R.
Diggory - DSpace Developer and Systems Manager MIT Libraries, Systems and Technology Services Massachusetts Institute of Technology Home Page: http://purl.org/net/mdiggory/homepage From dsalo at library.wisc.edu Wed Aug 6 16:07:44 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Wed, 6 Aug 2008 15:07:44 -0500 Subject: [Dspace-general] DSpace development priorities: starting a discussion Message-ID: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com> Greetings, DSpace community, For some time, I've been concerned that the DSpace development process hasn't enjoyed as much input from the broader community as would be desirable. The voices of less-technical repository managers and other staff associated with DSpace repositories have been particularly difficult to attract to the discussion. I'm hoping to gather impressions and suggestions from this specific segment of the community (though others are welcome as well!) to pass on to DSpace developers. With any luck, this process will build a stronger connection between developers and repository managers going forward. The DSpace development-priority survey done in 2007 was valuable and worthwhile, and if possible, I'd like to revisit some of the questions raised there. I'd also like to start "in your own words" discussions about what repository managers want and need from DSpace that it isn't yet providing. We can certainly talk here, and I welcome that! More than one DSpace developer has agreed to monitor these discussions, and I will be summarizing them back to the development list. But I'm completely open to other venues as well -- IM, group chat, Web 2.0, Skype, out-of-band email -- depending on what people tell me they want. So. How would you like to do this? Once we've sorted out the process, we can get down to business. Feel free to contact me off-list if you prefer. 
Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From val at dspace.org Wed Aug 6 22:37:53 2008 From: val at dspace.org (Valorie Hollister) Date: Wed, 06 Aug 2008 21:37:53 -0500 Subject: [Dspace-general] DSpace development priorities: starting a discussion Message-ID: <20080806213753.wa7jq9m9og884k80@www.dspace.org> Dorothea - I believe this effort would be very worthwhile and consistent with many of the upcoming initiatives the Foundation has been working on (DSpace Global Outreach Cmte, implementation of Jira for feature requests/tracking, DSpace repository manager meeting at SPARC in November, and discussion forums on www.dspace.org, etc). I would very much like to be involved in the discussions you are suggesting and look forward to hearing from the DSpace community. Valorie Hollister Community Outreach Manager DSpace Foundation Greetings, DSpace community, For some time, I've been concerned that the DSpace development process hasn't enjoyed as much input from the broader community as would be desirable. The voices of less-technical repository managers and other staff associated with DSpace repositories have been particularly difficult to attract to the discussion. I'm hoping to gather impressions and suggestions from this specific segment of the community (though others are welcome as well!) to pass on to DSpace developers. With any luck, this process will build a stronger connection between developers and repository managers going forward. The DSpace development-priority survey done in 2007 was valuable and worthwhile, and if possible, I'd like to revisit some of the questions raised there. I'd also like to start "in your own words" discussions about what repository managers want and need from DSpace that it isn't yet providing. We can certainly talk here, and I welcome that! 
More than one DSpace developer has agreed to monitor these discussions, and I will be summarizing them back to the development list. But I'm completely open to other venues as well -- IM, group chat, Web 2.0, Skype, out-of-band email -- depending on what people tell me they want. So. How would you like to do this? Once we've sorted out the process, we can get down to business. Feel free to contact me off-list if you prefer. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 _______________________________________________ Dspace-general mailing list Dspace-general at mit.edu http://mailman.mit.edu/mailman/listinfo/dspace-general ----- End forwarded message ----- Valorie Hollister Community Outreach Manager DSpace Foundation From mwood at IUPUI.Edu Thu Aug 7 08:44:42 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Thu, 7 Aug 2008 08:44:42 -0400 Subject: [Dspace-general] DSpace development priorities: starting a discussion In-Reply-To: <20080806213753.wa7jq9m9og884k80@www.dspace.org> References: <20080806213753.wa7jq9m9og884k80@www.dspace.org> Message-ID: <20080807124442.GB2968@IUPUI.Edu> On Wed, Aug 06, 2008 at 09:37:53PM -0500, Valorie Hollister wrote: > implementation of Jira for feature requests/tracking Well, there's a communication opportunity right there. This is the first I'd heard of setting up another tracker system. We already have trackers full of items at SourceForge. Will those items be migrated? -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080807/0b671921/attachment.bin

From christophe.dupriez at destin.be Thu Aug 7 09:02:25 2008
From: christophe.dupriez at destin.be (Christophe Dupriez)
Date: Thu, 07 Aug 2008 15:02:25 +0200
Subject: [Dspace-general] DSpace development priorities: starting a discussion
In-Reply-To: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com>
Message-ID: <489AF261.70200@destin.be>

Dear Dorothea, (to the DSpace Community)

Thank you so much for your long-needed initiative.

I use DSpace for several customers who pay me to adapt DSpace to their needs. I am very lucky to work with institutions that trust me and provide me with interesting challenges. I take great advantage of the DSpace project and return far too few contributions to the community. I previously made suggestions to improve this (see the conclusion of the following paper):

http://www.aepic.it/conf/viewpaper.php?id=197&cf=11

As I see it, big institutions, universities and research networks have the resources (money, people, organisation) to get what they want from the DSpace source code. They can request what they need from their developers and may (or may not) encourage their developers to take the time to contribute back to the DSpace community. From my point of view, this process is not very efficient and is rapidly seen as not profitable enough by most managers.

If one wants other institutions (with less money, fewer people and fewer organisational skills available) to be able to publish their intellectual production using DSpace, I believe more **coordinated** effort should be allocated to creating a "standard" DSpace flagship that NO developer has to customize locally.
If we set aside the idea that local institutions can "always" develop their own adaptations, we would look at the project more cautiously and possibly put the users back where they belong: in the driver's seat.

One cultural problem we may have: open source developers enjoy freedom and protect it by invoking "free software principles" against any criticism. "Free software" means "somewhat free from the commercial empire", not a free-for-all! Top-down processes must be in the right balance with bottom-up ones.

Establishing generic institutional needs (a DSpace product definition) must be a structured project to succeed. A project begins when one identifies:
1) a global need or objective
2) project sponsors who approve important decisions
3) a knowledgeable project leader who animates, coordinates and mandates

The proposal I would like to make:

1) Apply the 80/20 rule: create an immediately applicable DSpace package which answers 80% of the needs of 80% of the smaller institutions, which would be happy not to hire any developer (and to keep their money to hire a very good application manager) and still get an enthusiastic result **that the DSpace Foundation would guarantee to sustain in the long term, always providing an easy upgrade path from one version to the next**

2) The sponsors should be institutions gathering to provide resources (money, people, organisation skills) to obtain this result in a reasonable time frame (18 months). The Foundation would coordinate this committee and animate the process with a democratic "1 participating institution = 1 vote" decision process

3) The project leader would be chosen using a "Call for Tender" process, with the final decision taken by the sponsoring committee.

IMHO, this is much more important than most radical restructurings of the DSpace code base (like some of the ones currently envisaged). But it may trigger some other unforeseen radical technical decisions...

Let's see how things evolve!

Have a nice day!
Christophe

-------------- next part --------------
A non-text attachment was scrubbed...
Name: christophe_dupriez.vcf
Type: text/x-vcard
Size: 454 bytes
Desc: not available
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080807/0537d2ea/attachment.vcf

From mwood at IUPUI.Edu Thu Aug 7 10:18:45 2008
From: mwood at IUPUI.Edu (Mark H.
Wood)
Date: Thu, 7 Aug 2008 10:18:45 -0400
Subject: [Dspace-general] DSpace development priorities: starting a discussion
In-Reply-To: <489AF261.70200@destin.be>
References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com> <489AF261.70200@destin.be>
Message-ID: <20080807141845.GD2968@IUPUI.Edu>

Some things to keep in mind:

o There is no commercial or employment relationship between end-users and developers here. Anybody who wants something done must accomplish it either by his own (or his organization's) labor, or by moral suasion -- appealing to the value of improving the commons, or the good feeling that comes from having done something well.

  The good news is that, *because* there is no mechanism of compulsion, moral suasion works tolerably well in such situations.

o On the other hand, there is most definitely an employment relationship between the developer and his own institution. If I want to work on some aspect of DSpace, I have to convince my supervisor that the work will benefit the institution sufficiently to account for the cost of my time. It's difficult, but not impossible, to sell intangible benefits like building up the commons, but it is much easier to sell features that we need ourselves. The result is that the needs of one's own institution *usually* come first.

o Just expounding a need of your institution may cause someone elsewhere to realize, "hey, we could use that too -- and we have the resources to build it." So we do all need to talk about our needs and wishes, even if we can't realize them ourselves.

o Code is not all there is. If your institution can't create code, could it contribute documentation or user-interface design? Could you volunteer to monitor a tracker and provide short monthly postings on item turnover, or moderate a task force, or maintain a most-popular-request list?

o One of the most effective ways to poison a community project is to try to manage contributors as if you have some authority over them.
They know better. Any community member (coder or not) who feels that his contributions are unappreciated has *nothing to lose* by ceasing to contribute, because the only reward for contribution is already denied him. Because the project is held in common, he can still work on it for those who *do* reward him. And a few questions: On Thu, Aug 07, 2008 at 03:02:25PM +0200, Christophe Dupriez wrote: > 1) Apply the 80/20 rule: Create an immediatley applicable DSpace package > which answers to 80% of the needs of 80% of the smaller institutions > which would be happy to not hire any developer (and keep their money to > hire a very good application manager) to have an enthusiastic result > **that the DSpace foundation would guarantee to sustain on the long > term, always providing an easy upgrade path from one version to the next** Has this not already been done? How do we know? What remains to be done in order to satisfy the 80%? > 2) The sponsors should be institutions gathering to provide resources > (money, people, organisation skills) to obtain this result in a > reasonable time frame (18 months). The Foundation would coordinate this > committe, animate the process with a democratic "1 participating > institution = 1 vote" decision process Doesn't this just entrench the plutocracy? Those lacking resources to support development have no vote. Did I misunderstand? And some suggested reading: _The Cathedral and the Bazaar_, by Eric S. Raymond. An exploration of the economics, psychology, and sociology of development by community. If you want to know how to motivate participants in a project like DSpace, or just understand why some of them behave so oddly, this is a good place to start. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080807/6c31023b/attachment.bin From mveve at utk.edu Thu Aug 7 16:11:27 2008 From: mveve at utk.edu (Veve, Marielle) Date: Thu, 7 Aug 2008 16:11:27 -0400 Subject: [Dspace-general] Survey: Catalogers working with Non-MARC Metadata Message-ID: From: Veve, Marielle Sent: Thursday, August 07, 2008 2:10 PM To: Veve, Marielle Subject: Survey: Catalogers working with Non-MARC Metadata To *all catalogers* (with or without MLS) in academic libraries: SURVEY: Integrating Non-MARC Metadata Production into the Duties of Traditional Catalogers You are invited to participate in a brief national, online survey. The objective of this survey is to research the national trends in the integration of Non-MARC metadata work into the duties of traditional catalogers and the perceptions and attitudes catalogers hold towards non-MARC metadata. For this study we would like to invite all catalogers in academic libraries, with or without MLS, who are involved in any aspect of non-MARC metadata work. I am asking you to please participate by answering this multiple choice survey. Your answers will be completely anonymous and confidential and will only be used to summarize information. *No* names or institution affiliation will be asked. Responding to the survey constitutes informed consent to participate in the research. The survey is voluntary, and you may withdraw from it at any time. It should take approximately 10 minutes to answer the 28 multiple choice questions of the survey. To complete the survey, follow this link http://www.surveymonkey.com/s.aspx?sm=b2XVTS5Z_2f5GV_2fXKUWTfyKw_3d_3d. The deadline to complete the survey is Sept.1, 2008. 
If you have questions at any time about the study or the procedures, you may contact the principal researcher, Marielle Veve; at Hodges Library, 1015 Volunteer Blvd., Knoxville, TN 37996; mveve at utk.edu. If you have questions about your rights as a participant, contact the Compliance Section at (423) 974-3466. Thank you in advance for assisting in this research project by taking the time to respond to the survey. This research project has been approved by the University of Tennessee's Institutional Review Board. -------- Marielle Veve Cataloging & Metadata Librarian Assistant Professor Hodges Library-University of Tennessee Knoxville, TN 37996 Phone: (865) 974-0394 E-mail: mveve at utk.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080807/bce3a1a6/attachment.htm From christophe.dupriez at destin.be Fri Aug 8 03:50:36 2008 From: christophe.dupriez at destin.be (Christophe Dupriez) Date: Fri, 08 Aug 2008 09:50:36 +0200 Subject: [Dspace-general] DSpace development priorities: starting a discussion In-Reply-To: <20080807141845.GD2968@IUPUI.Edu> References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com> <489AF261.70200@destin.be> <20080807141845.GD2968@IUPUI.Edu> Message-ID: <489BFACC.2090908@destin.be> Hi Mark H., Terminology: * developers: IT specialists able to implement / modify DSpace * users: Information Management specialists able to manage a repository * end users: anybody able to understand documents contained in a repository If I follow your thoughts, we should set up some kind of collaborative process to define what is (should be) DSpace and fuel developers with precise needs definitions / development ideas. 
I would agree with you, but please keep in mind:

1) Developers get very short-term validation of their work (their local application works or it does not) that users do not get (you only know after months or years whether your repository is successful).

2) Developers have to formalize their thoughts in a very formal language. Users must formalize their projects in the language that will be best understood by their funding authorities or by their local end users. These are cultural differences that are not always easy to share with the DSpace community.

This is just to explain that the cost/benefit equation of collaboration is not the same for users as for developers... Some "reward" for experience sharing must be embedded in DSpace community management.

So the question is: how do we set up and animate a collaborative process between DSpace users?

In each institution, the users create documents to request funding, organise projects and define the tasks of co-workers. Many of those documents are readily available on the Web, some even on the DSpace site. One may want to propose a frame to organize those documents and identify "blanks" to be filled. For instance:

* What are the different kinds of repositories (main missions)? I see (at least):
  1) Institutional repositories: provide permanent storage for the production of an institution
  2) Subject repositories: provide reference documents on a given topic family, worldwide. Will need to interconnect with institutional repositories
  3) National repositories: provide reference storage of the information needed to fulfill national legal obligations. Will need to interconnect with other national repositories
  4) Local repositories: provide local knowledge workers with the reference documents they need for their daily work

  Personally, I work on repository types 2 (WindMusic), 3 (Dangerous Chemical Products sold in Belgium) and 4 (Documents useful for PoisonCentre MDs)

* Who is the public for each of those kinds of repositories? What are the needs of that public?
* What are the needs / missions of the institutions organizing those repositories?
* What strategy and tactics would those institutions like to follow to make their repositories successful?
* What cross-institutional content initiatives could multiply the impact of the DSpace initiative (for instance, integration with OCLC services and WorldCat)?
* Priorities: What are their short-term needs? Longer-term needs?
* What are the use cases for DSpace?
* What would be the ideal path for each use case?
* How many steps does each use case involve today (if it is possible at all)? How many could it be if DSpace were improved?
* What needs to be developed?

Improving the "WHY" will certainly enlighten the "HOW"...

Have a nice day!

Christophe

Mark H. Wood wrote:
> Some things to keep in mind:
>
> o There is no commercial or employment relationship between end-users
>   and developers here. Anybody who wants something done must
>   accomplish it either by his own (or his organization's) labor, or
>   by moral suasion -- appealing to the value of improving the
>   commons, or the good feeling that comes from having done something
>   well.
>
>   The good news is that, *because* there is no mechanism of
>   compulsion, moral suasion works tolerably well in such situations.
>
> o On the other hand, there is most definitely an employment
>   relationship between the developer and his own institution. If I
>   want to work on some aspect of DSpace, I have to convince my
>   supervisor that the work will benefit the institution sufficiently
>   to account for the cost of my time. It's difficult, but not
>   impossible, to sell intangible benefits like building up the
>   commons, but it is much easier to sell features that we need
>   ourselves. The result is that the needs of one's own institution
>   *usually* come first.
>
> o Just expounding a need of your institution may cause someone
>   elsewhere to realize, "hey, we could use that too -- and we have
>   the resources to build it."
So we do all need to talk about our > needs and wishes, even if we can't realize them ourselves. > > o Code is not all there is. If your institution can't create code, > could it contribute documentation or user-interface design? Could > you volunteer to monitor a tracker and provide short monthly > postings on item turnover, or moderate a task force, or maintain a > most-popular-request list? > > o One of the most effective ways to poison a community project is to > try to manage contributors as if you have some authority over them. > They know better. Any community member (coder or not) who feels > that his contributions are unappreciated has *nothing to lose* by > ceasing to contribute, because the only reward for contribution is > already denied him. Because the project is held in common, he can > still work on it for those who *do* reward him. > > > And a few questions: > > On Thu, Aug 07, 2008 at 03:02:25PM +0200, Christophe Dupriez wrote: > >> 1) Apply the 80/20 rule: Create an immediatley applicable DSpace package >> which answers to 80% of the needs of 80% of the smaller institutions >> which would be happy to not hire any developer (and keep their money to >> hire a very good application manager) to have an enthusiastic result >> **that the DSpace foundation would guarantee to sustain on the long >> term, always providing an easy upgrade path from one version to the next** >> > > Has this not already been done? How do we know? What remains to be > done in order to satisfy the 80%? > > >> 2) The sponsors should be institutions gathering to provide resources >> (money, people, organisation skills) to obtain this result in a >> reasonable time frame (18 months). The Foundation would coordinate this >> committe, animate the process with a democratic "1 participating >> institution = 1 vote" decision process >> > > Doesn't this just entrench the plutocracy? Those lacking resources to > support development have no vote. Did I misunderstand? 
> > > And some suggested reading: > > _The Cathedral and the Bazaar_, by Eric S. Raymond. > > An exploration of the economics, psychology, and sociology of > development by community. If you want to know how to motivate > participants in a project like DSpace, or just understand why some > of them behave so oddly, this is a good place to start. > > > ------------------------------------------------------------------------ > > _______________________________________________ > Dspace-general mailing list > Dspace-general at mit.edu > http://mailman.mit.edu/mailman/listinfo/dspace-general > -------------- next part -------------- A non-text attachment was scrubbed... Name: christophe_dupriez.vcf Type: text/x-vcard Size: 454 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080808/a2693bf2/attachment.vcf From sunilgoria at yahoo.com Mon Aug 11 02:29:38 2008 From: sunilgoria at yahoo.com (Sunil Goria) Date: Sun, 10 Aug 2008 23:29:38 -0700 (PDT) Subject: [Dspace-general] Big File upload problem Message-ID: <49434.44285.qm@web53402.mail.re2.yahoo.com> Dear all, Earlier I asked for help with my file upload problem. I have now found that I can upload files smaller than 1 MB to the DSpace server through our LAN. When I try to upload files larger than 1 MB, it gives the error "Internet Explorer cannot display the webpage". Please suggest how to upload big files (theses, etc.) to the DSpace server. We are using DSpace 1.2 on a Linux Enterprise version. With regards, Dr. Sunil Goria Assistant Librarian University Library, G.B. Pant University of Agriculture & Technology, Pantnagar-263145 (India) -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080810/f449891d/attachment.htm From alitra at gmail.com Mon Aug 11 23:02:22 2008 From: alitra at gmail.com (Alice Tran) Date: Mon, 11 Aug 2008 17:02:22 -1000 Subject: [Dspace-general] PDF files in DSpace Message-ID: Hi, I've been trying to figure out a best practice for the PDF files we are ingesting into our Dspace system. I noticed back in February, Beth from Ohio, had asked a similar question and I was wondering if anybody else has since come up with another method or has a best practice to suggest. I'd be interested to know what other institutions are doing to prep their scanned PDFs before ingesting it into Dspace. Thanks! Alice Tran CMS/IR Specialist University of Hawaii at Manoa alicet at hawaii.edu http://library.manoa.hawaii.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080811/1e518b45/attachment.htm From mdorn at solinet.net Tue Aug 12 08:34:43 2008 From: mdorn at solinet.net (Givens, Marlee Dorn) Date: Tue, 12 Aug 2008 08:34:43 -0400 Subject: [Dspace-general] SOLINET live online class on Open Access and Repositories Message-ID: <3E4ED5DFB91E1743934243083EF3578B03F765D9@emailman.soli.net> Please excuse cross-posting. SOLINET is pleased to announce that there are still seats available for the following Live Online class: Open Access, Repositories, and More.. (Live Online) Instructor: Tyler Walters This class covers the elements of the open access movement, scholarly communications, and digital repositories. 9/4/08 10:00am-12:00pm Eastern Time For more information or to register: http://www.solinet.net/?sc_itemid={445791DD-8052-4296-AC69-0A1B0351A3E8} For our complete catalog, please visit www.solinet.net and click on Classes and Events. Thank you! 
MARLEE DORN GIVENS Manager, Preservation Services mdorn at solinet.net 404.892.0943 x3980 1438 West Peachtree Street NW Suite 200 Atlanta, GA 30309 Toll Free: 1.800.999.8558 Fax: 404.892.7879 www.solinet.net Please consider the environment before printing this e-mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080812/d8909f0e/attachment.htm From dsalo at library.wisc.edu Tue Aug 12 09:02:14 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Tue, 12 Aug 2008 08:02:14 -0500 Subject: [Dspace-general] DSpace development priorities: starting a discussion In-Reply-To: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com> References: <356cf3980808061307o47621d5cv7ac06f1d94b19b78@mail.gmail.com> Message-ID: <356cf3980808120602h308074dck7cda6cd7c8381564@mail.gmail.com> Greetings, DSpace community, Apologies to those for whom this message is duplicated; Mark Diggory asked me to bring dspace-tech into the loop. I've kept the original message to dspace-general quoted below, for those who haven't seen it. I want to thank everyone who has participated in this discussion already, both on- and off-list; I've gotten quite a bit of valuable feedback. Here's what I propose to do. If I don't hear objections or suggested refinements, I'll come up with days and times and we'll get started. First, I'd like to toss out a weekly question to the dspace-general group by way of gathering raw reactions and requirements that can later be distilled into something actionable. I have a supply of such questions, and I will be drawing more from the 2007 user survey, though of course I'm very open to suggestions from other community members. Responses can be on- or off-list; I will summarize off-list responses to the list. I would ask that responders to the weekly question answer immediately, BEFORE they read any other responses. This is important! 
Reading other answers tends to circumscribe one's own, reducing the overall breadth of response. In fact, I am tempted to say that the week of the weekly question should be reserved for immediate reaction, saving discussion for the next week... so we could have reactions to one question and discussion of another going on in different threads simultaneously. I would also like to do online chats or similar synchronous interaction at least biweekly. Timezones and language barriers are obviously a problem with that, but I'll do my best -- and I would appreciate hearing from potential chat hosts in Europe and Asia. I've set up a room on Meebo, faute de mieux; if there are better ideas, let me know. Finally, I've heard some interest in an informal birds-of-a-feather meeting on this topic at SPARC Digital Repositories 2008. I do expect to attend that conference, and I'm quite willing to facilitate a BOF. Let me know if this seems good -- have at it! And again, thank you. Dorothea On Wed, Aug 6, 2008 at 3:07 PM, Dorothea Salo wrote: > Greetings, DSpace community, > > For some time, I've been concerned that the DSpace development process > hasn't enjoyed as much input from the broader community as would be > desirable. The voices of less-technical repository managers and other > staff associated with DSpace repositories have been particularly > difficult to attract to the discussion. I'm hoping to gather > impressions and suggestions from this specific segment of the > community (though others are welcome as well!) to pass on to DSpace > developers. With any luck, this process will build a stronger > connection between developers and repository managers going forward. > > The DSpace development-priority survey done in 2007 was valuable and > worthwhile, and if possible, I'd like to revisit some of the questions > raised there. I'd also like to start "in your own words" discussions > about what repository managers want and need from DSpace that it isn't > yet providing. 
> > We can certainly talk here, and I welcome that! More than one DSpace > developer has agreed to monitor these discussions, and I will be > summarizing them back to the development list. But I'm completely open > to other venues as well -- IM, group chat, Web 2.0, Skype, out-of-band > email -- depending on what people tell me they want. > > So. How would you like to do this? Once we've sorted out the process, > we can get down to business. Feel free to contact me off-list if you > prefer. > > Dorothea > > -- > Dorothea Salo dsalo at library.wisc.edu > Digital Repository Librarian AIM: mindsatuw > University of Wisconsin > Rm 218, Memorial Library > (608) 262-5493 > -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From hellpop at umich.edu Tue Aug 12 15:02:08 2008 From: hellpop at umich.edu (Jim Ottaviani) Date: Tue, 12 Aug 2008 15:02:08 -0400 Subject: [Dspace-general] PDF files in DSpace (Alice Tran) In-Reply-To: Message-ID: > I've been trying to figure out a best practice for the PDF files we are > ingesting into our Dspace system. I noticed back in February, Beth from > Ohio, had asked a similar question and I was wondering if anybody else has > since come up with another method or has a best practice to suggest. I'd be > interested to know what other institutions are doing to prep their scanned > PDFs before ingesting it into Dspace. 
I'm not sure whether these are the sort of thing you had in mind, but we recently revised our PDF best practices, and the recommendations are available at http://hdl.handle.net/2027.42/58005 Jim ____________________________________ Jim Ottaviani +1 734-763-4835 Coordinator, Deep Blue http://deepblue.lib.umich.edu University of Michigan Library Quis custodiet ipsos custodes --Juvenal, Satires VI, 347 From Claudia.Juergen at ub.uni-dortmund.de Fri Aug 15 10:27:14 2008 From: Claudia.Juergen at ub.uni-dortmund.de (=?ISO-8859-1?Q?Claudia_J=FCrgen?=) Date: Fri, 15 Aug 2008 16:27:14 +0200 Subject: [Dspace-general] [Dspace-tech] Location Proposal for DSUG Mtg Fall 2009 In-Reply-To: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org> References: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org> Message-ID: <48A59242.9050205@ub.uni-dortmund.de> Hi Valorie, as fall is getting closer, will there be a meeting, or did this plan not develop any further? Sunny greetings Claudia Jürgen Valorie Hollister wrote: > DSpace Community - > > As many of you are already aware, the next DSpace User Group Meeting > will be held in conjunction with next year's Open Repositories in May > 2009 in Atlanta, Georgia, USA. > > DSpace Foundation would like to help organize a stand-alone DSUG > meeting sometime between September - November 2009 in Europe. We > already have a few informal offers to host the meeting, but before we > make a decision we would like to give the entire DSpace community a > chance to propose their location. > > Some of the key criteria for hosting the meeting include: > -location must be easily accessible for international participants > (i.e.
close to an international airport) > -meeting facilities must accommodate at least 200 people for 2 days > -maximum charges per participant should not exceed $300 > -meeting facilities must be close to enough available, inexpensive > lodging for participants > > If you are interested in hosting the next DSUG meeting, please contact > me at val at dspace.org. > > Valorie Hollister > Community Outreach Manager > DSpace Foundation From Claudia.Juergen at ub.uni-dortmund.de Fri Aug 15 10:31:28 2008 From: Claudia.Juergen at ub.uni-dortmund.de (=?ISO-8859-1?Q?Claudia_J=FCrgen?=) Date: Fri, 15 Aug 2008 16:31:28 +0200 Subject: [Dspace-general] [Dspace-tech] Location Proposal for DSUG Mtg Fall 2009 In-Reply-To: <48A59242.9050205@ub.uni-dortmund.de> References: <20080609083120.e7x1auvq9wogg8g0@www.dspace.org> <48A59242.9050205@ub.uni-dortmund.de> Message-ID: <48A59340.30107@ub.uni-dortmund.de> Hi All, sorry overlooked the 2009 and thought about 2008. Claudia From mcgeetho at shu.edu Fri Aug 15 12:53:41 2008 From: mcgeetho at shu.edu (Thomas A McGee) Date: Fri, 15 Aug 2008 12:53:41 -0400 Subject: [Dspace-general] Tom McGee is out of the office.
Message-ID: I will be out of the office starting 08/15/2008 and will not return until 08/25/2008. I'm on vacation the week of August 18. I will respond to your message when I return on Monday the 25th. From mdiggory at MIT.EDU Fri Aug 15 13:27:19 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Fri, 15 Aug 2008 10:27:19 -0700 Subject: [Dspace-general] DSpace 1.5.1-beta Release Message-ID: <04657E1C-77E7-4174-9C6B-BD0B23AE4145@mit.edu> Dear DSpace Community, We are pleased to announce the release of DSpace 1.5.1 beta. This beta is primarily a bug fix release incorporating numerous bug fixes and enhancements. Refer to http://wiki.dspace.org/CurrentReleaseToDo and the SVN history for details on these modifications. http://fisheye3.atlassian.com/changelog/dspace/branches/dspace-1_5_x?todate=1218776345558 The final release of 1.5.1 should be out before the end of August. We request that community members interested in testing the beta release please download it and verify that they can complete both an upgrade and a fresh installation. We request that the SVN branch be frozen until we complete the final release; if developers have further fixes, please request their addition through the developers list before moving forward with SVN commits. The documentation for this release is bundled within the package. DSpace 1.5.1 beta can be downloaded from the files area at http://sourceforge.net/project/showfiles.php?group_id=19984&package_id=143548&release_id=619910 or with SVN from http://dspace.svn.sf.net/svnroot/dspace/tags/dspace-1_5_1-beta/ Please use the mailing lists to provide feedback on this release. Those wishing to do development work with DSpace are strongly encouraged to obtain the source code using SVN. This is very straightforward and a guide to doing this is available here: http://wiki.dspace.org/ContributionGuidelines We would also like to take this opportunity to invite you all to take part in the DSpace development process.
Extra developer hands are always welcome, but there are other ways you can help: - Test the system and report bugs - Provide documentation (for end users and institutions, as well as technical) - Provide or update language packs - Share your deployment experiences - Donate content and metadata for testing and research - Share your technical experience and ideas Please visit the DSpace Wiki to see the various resources and collaboration tools available to the DSpace community: http://wiki.dspace.org/DspaceResources Sincerely, Mark Diggory ~~~~~~~~~~~~~ Mark R. Diggory - DSpace Developer and Systems Manager MIT Libraries, Systems and Technology Services Massachusetts Institute of Technology Home Page: http://purl.org/net/mdiggory/homepage From dsalo at library.wisc.edu Mon Aug 18 09:24:41 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Mon, 18 Aug 2008 08:24:41 -0500 Subject: [Dspace-general] Question one: What's working and what isn't? Message-ID: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> Greetings, DSpace community, I've heard enough encouragement to keep on with my plans for an informal, qualitative information-gathering on DSpace development. So let's get started! This week's question (based on Q21 from the 2006 survey) is about DSpace's existing functionality. Please offer one to three existing DSpace features that you believe work well in your situation, then offer one to three existing features that you believe need improvement. Feel free to explain your answers at length! Also, please let us know which version of DSpace you are running. Housekeeping: - Please respond to the dspace-general list, or to me directly. DSpace-tech has a 1.5.1 beta to talk about, and I don't want to derail that very important conversation! - Please respond before reading or answering other responses! - I will summarize off-list responses to dspace-general no later than Friday. I have set up a Meebo Room at for live-chat discussion of the weekly topic.
I am currently planning to run an hourlong chat at 9 am CT Wednesday (10 am ET, 3 pm GMT). You do not need to sign up with Meebo to participate. You do, however, need the room password, which is "dspace" (no quotes) -- this isn't for security, just an anti-random-troll measure. Thanks in advance to all participants. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From mdorn at solinet.net Mon Aug 18 09:57:24 2008 From: mdorn at solinet.net (Givens, Marlee Dorn) Date: Mon, 18 Aug 2008 09:57:24 -0400 Subject: [Dspace-general] SOLINET live online class Introduction to Institutional Repositories Message-ID: <3E4ED5DFB91E1743934243083EF3578B03F7661D@emailman.soli.net> SOLINET is pleased to announce that there are still seats available for the following live online class: Introduction to Institutional Repositories (Live Online) This session will define and describe the characteristics and features of institutional repositories, which can include not only scholarship of faculty and students but also digital assets such as administrative records, course notes, technical reports and learning objects. September 16 10:00am-12:00pm Eastern Time Instructor: David Greenebaum Price: $120 SOLINET members/$170 non-members For more information or to register, please visit: http://www.solinet.net/?sc_itemid={948AD1FB-6E39-45EC-B462-142B59FED689} Visit our Web site at www.solinet.net Thank you! MARLEE DORN GIVENS Manager, Preservation Services mdorn at solinet.net 404.892.0943 x3980 1438 West Peachtree Street NW Suite 200 Atlanta, GA 30309 Toll Free: 1.800.999.8558 Fax: 404.892.7879 www.solinet.net Please consider the environment before printing this e-mail. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080818/73509e07/attachment.htm From dsalo at library.wisc.edu Mon Aug 18 12:56:07 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Mon, 18 Aug 2008 11:56:07 -0500 Subject: [Dspace-general] Question one: What's working and what isn't? In-Reply-To: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> Message-ID: <356cf3980808180956m2b55ff9dkd601ed3c83b7498c@mail.gmail.com> Answering my own question... We're currently running 1.4.1 in production, and are testing and modding 1.5 for a rollout soon. > This week's question (based on Q21 from the 2006 survey) is about > DSpace's existing functionality. Please offer one to three existing > DSpace features that you believe work well in your situation, I'm very happy with Manakin theming (barring a few minor growls). As a de-facto consortial repository, being able to theme communities and have the theme cascade to subcommunities and collections is a major win. In the "little things make a big difference" category, the checksum checker makes me happy. Accidentally losing or mangling data is a nightmare; it would completely demolish the trust my user communities have in the service, and my own trust in the software. That nightly "all is well" email is a relief. The HTML display engine pretty much Just Works. Some of the best and most important work I've captured in both of the repositories I've run have been websites. I'm very grateful for this feature. then > offer one to three existing features that you believe need > improvement. Feel free to explain your answers at length! Also, please > let us know which version of DSpace you are running. Repeating quietly to myself "no new features... no new features..." The whole communities/collections model needs a rethink, I think. 
Faculty I talk to find it confusing and unintuitive; they expect communities to be able to contain items, and collections to be able to contain other collections. (The latter is particularly important for some kinds of scoped searching.) Perhaps following on from this, they expect to be able to make changes to community information that only an administrator can make, because there is no DSpace analogue to "collection administrator" for communities. Finally, for our consortial-repository purposes it's not good that only an administrator can change collection/item access policies. I need to be able to hand that work out to librarians at our member campuses, but DSpace won't let me. I understand that DSpace is meant to be an archival system, but the model of "metadata and bitstreams can change before final deposit, but not afterwards except by administrator fiat" doesn't accord with user expectations where I am. People make metadata mistakes and don't notice them until after approving the submission. People upload bitstreams and want to swap them out for better bitstreams. Stuff comes in through a variety of channels that needs editing after the fact (authority control, anyone?). I spend a *lot* of time -- much too much time! -- dealing with things like this, as well as talking down irritated users who want to be able to fix these things without going through me. I also end up editing metadata directly in the database (yes, I know, bad bad me!) because one SQL query takes so much less time than making the same change to forty-'leven items individually in the UI. The input-forms.xml system of modifying forms needs an overhaul as well. One problem with it is that not all repo managers have server access in order to modify this file, but they're a lot closer to the content/metadata than the IT professionals who *do* have access to the file. 
Another problem is some really bad interactions with the hardcoded "big three" front-page questions -- if you put date.issued in your input-forms.xml, but your depositor doesn't check the "previously published" box, DSpace wags a stern finger and won't let them proceed! (This is a serious problem for theses and dissertations, which do have a date.issued but aren't colloquially considered previously-published in many disciplines.) Finally, this file doesn't have any conditional logic. It can't, for example, say "okay, if dc.type is Working Paper, show these fields; otherwise, show those." This makes it essentially impossible to simplify the forms in a heterogeneous collection, which is an unhappy thing for usability. Right, those are my three. Next? Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From mveve at utk.edu Tue Aug 19 09:21:54 2008 From: mveve at utk.edu (Veve, Marielle) Date: Tue, 19 Aug 2008 09:21:54 -0400 Subject: [Dspace-general] Survey: Catalogers working with Non-MARC Metadata Message-ID: From: Veve, Marielle Sent: Thursday, August 07, 2008 2:10 PM To: Veve, Marielle Subject: Survey: Catalogers working with Non-MARC Metadata To *all catalogers* (with or without MLS) in academic libraries: [Please excuse the cross-posting] SURVEY: Integrating Non-MARC Metadata Production into the Duties of Traditional Catalogers You are invited to participate in a brief national, online survey. The objective of this survey is to research the national trends in the integration of Non-MARC metadata work into the duties of traditional catalogers and the perceptions and attitudes catalogers hold towards non-MARC metadata. For this study we would like to invite all catalogers in academic libraries, with or without MLS, who are involved in any aspect of non-MARC metadata work. I am asking you to please participate by answering this multiple choice survey. 
Your answers will be completely anonymous and confidential and will only be used to summarize information. *No* names or institution affiliation will be asked. Responding to the survey constitutes informed consent to participate in the research. The survey is voluntary, and you may withdraw from it at any time. It should take approximately 10 minutes to answer the 28 multiple choice questions of the survey. To complete the survey, follow this link http://www.surveymonkey.com/s.aspx?sm=b2XVTS5Z_2f5GV_2fXKUWTfyKw_3d_3d. The deadline to complete the survey is Sept.1, 2008. If you have questions at any time about the study or the procedures, you may contact the principal researcher, Marielle Veve; at Hodges Library, 1015 Volunteer Blvd., Knoxville, TN 37996; mveve at utk.edu. If you have questions about your rights as a participant, contact the Compliance Section at (423) 974-3466. Thank you in advance for assisting in this research project by taking the time to respond to the survey. This research project has been approved by the University of Tennessee's Institutional Review Board. -------- Marielle Veve Cataloging & Metadata Librarian Assistant Professor Hodges Library-University of Tennessee Knoxville, TN 37996 Phone: (865) 974-0394 E-mail: mveve at utk.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/559d3a4e/attachment.htm From robin.taylor at ed.ac.uk Tue Aug 19 09:50:11 2008 From: robin.taylor at ed.ac.uk (Robin Taylor) Date: Tue, 19 Aug 2008 14:50:11 +0100 Subject: [Dspace-general] Question one: What's working and what isn't? In-Reply-To: <356cf3980808180956m2b55ff9dkd601ed3c83b7498c@mail.gmail.com> Message-ID: <200808191350.m7JDoBwG010829@lmtp1.ucs.ed.ac.uk> Hi Dorothea, Thinking out loud about input-forms.xml:- In order to provide different metadata screens for theses we include the word theses in the collection names. 
We look for presence of 'theses' in the collection name before deciding which input-form to use. In effect we are using the collection name as a proxy for the type of item. Really it would be better for us to ask the submitter what type of item they are submitting and use an input-form based on item type rather than collection name. I am interested to know how people are currently making use of input-forms.xml. Cheers, Robin. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From sdl at aber.ac.uk Tue Aug 19 11:16:42 2008 From: sdl at aber.ac.uk (Stuart Lewis [sdl]) Date: Tue, 19 Aug 2008 16:16:42 +0100 Subject: [Dspace-general] The new RSP blog directory In-Reply-To: Message-ID: [apologies for cross-posting] Directory of repository related blogs (http://rsp.ac.uk/blogs/) --------------------------------------------------------------- The JISC funded Repositories Support Project has today launched a new service - The RSP Blog Directory (http://rsp.ac.uk/blogs/). It provides a list of recommended and informative blogs regarding the repository scene from around the globe.
Listed blogs include personal creations from those with first hand experience of repository management and/or technical development of repository software; blogs for specific repositories, projects and software developers; as well as blogs for groups and societies with an interest in the open access movement and digital curation. Each entry in the directory has a brief description of what the blog contains, with links to view either the entire blog or just the RSS feed. Blogs have been arranged into categories by type, and you are able to download an OPML file to view the RSS feeds within your blog reader of choice for a selected category, or for all the blogs listed in the directory. We hope the directory is pretty comprehensive but if you think there are any blogs missing from this list, please e-mail your suggestion to the RSP team at support at rsp.ac.uk. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/2a032227/attachment.htm From j.proven at abertay.ac.uk Tue Aug 19 11:37:32 2008 From: j.proven at abertay.ac.uk (Proven, Jackie) Date: Tue, 19 Aug 2008 16:37:32 +0100 Subject: [Dspace-general] Skip file upload option Message-ID: We are newcomers to DSpace and have just installed v1.5. I believe it is now possible by default to disable the hard requirement to include a full-text in the submission (so you can skip the file upload step). Can anyone tell us how to implement this as we have been unable to find details in any documentation. Many thanks Jackie -- Jackie Proven Senior Information Officer Information Services, University of Abertay Dundee Tel: 01382 308867 E-mail: j.proven at abertay.ac.uk The University of Abertay Dundee is a charity registered in Scotland, No: SC016040 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/21a3ad53/attachment.htm From Claudia.Juergen at ub.uni-dortmund.de Tue Aug 19 12:35:49 2008 From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Juergen) Date: Tue, 19 Aug 2008 18:35:49 +0200 (CEST) Subject: [Dspace-general] Skip file upload option In-Reply-To: References: Message-ID: Hi Jackie, you must set webui.submit.upload.required = false in your dspace.cfg. By default it is set to true. Claudia > We are newcomers to DSpace and have just installed v1.5. I believe it is > now possible by default to disable the hard requirement to include a > full-text in the submission (so you can skip the file upload step). Can > anyone tell us how to implement this as we have been unable to find > details in any documentation. > > Many thanks > Jackie > > > -- > Jackie Proven > Senior Information Officer > Information Services, University of Abertay Dundee > Tel: 01382 308867 > E-mail: j.proven at abertay.ac.uk > > The University of Abertay Dundee is a charity registered in Scotland, > No: SC016040 > From sshreeve at illinois.edu Tue Aug 19 12:57:36 2008 From: sshreeve at illinois.edu (Sarah L. Shreeves) Date: Tue, 19 Aug 2008 11:57:36 -0500 Subject: [Dspace-general] Question one: What's working and what isn't? Message-ID: <48AAFB80.6040302@illinois.edu> We're running DSpace 1.4.2 (heavily customized) and are in the process of upgrading to 1.5 - we expect to do that later this fall. Things that work well: - I appreciate the metadata template for collections. We do a fair number of serial type things that have very common metadata, so it's useful to be able to have those fields pre-filled. I'd love to be able to reuse these templates from one collection to another but that would be a bonus.
- I agree with Dorothea that the checksum checker and the html display engine work well and make me and (in the case of the html display engine) my end users happy. - I am pretty happy with how Manakin alongside the customizable submission process will simplify the upload processes for our users and will allow us to tailor some communities for specific user groups. Areas that need further development: - I'd agree with Dorothea that the community / collection structure and administration could be re-thought. We've actually customized our instance and have added community administration functionality which has turned out to be so crucial for us. It's allowed me to get departmental libraries and colleges involved in IDEALS at a level that I don't think would have been possible otherwise. For example, our Agriculture library completely takes care of the Agriculture community - adds sub-communities and collections as needed, adds additional administrators, etc - which has been a great way to distribute the work of running IDEALS. We're in talks with our Grad College now about ETD's which will absolutely require community level administration. This is a very important development area. - Statistics and Reports - I know that there are a couple of stats packages out there (and we're using one currently) but none seem to be very satisfactory - or haven't been upgraded to work with Manakin. This is one of the primary selling points of IDEALS for many - and what we have in place is pretty basic. We really have to get stats and reports integrated into the core DSpace code. - Better ways for both repo managers and collection administrators to edit metadata both individually and in bulk. I don't have direct access to the database - and honestly wouldn't know how to change things there if I did (I'd have to reach back pretty far in my memory) - so I either have to ask Tim D. to do updates or I have to do things item by item. The same is true of my collection administrators. 
The item by item update process for metadata is painful (I never send my collection administrators there if I can avoid it) - this could certainly be improved. I also think that a bulk update would be an extremely useful development - from the user interface! These are brief thoughts but I wanted to get them out there. Sarah ------------------------------------------ Sarah L. Shreeves Coordinator, IDEALS http://www.ideals.uiuc.edu/ University of Illinois at Urbana-Champaign sshreeve at illinois.edu 217-333-4648 or 217-244-3877 From dsalo at library.wisc.edu Wed Aug 20 08:59:04 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Wed, 20 Aug 2008 07:59:04 -0500 Subject: [Dspace-general] Chat in one hour Message-ID: <356cf3980808200559s22ce2880q637e9a7201862cdc@mail.gmail.com> Greetings, DSpace community, Just a quick reminder that repository managers, support staff, and developers are welcome to meet each other and chat informally about the software in one hour (10 Eastern, 9 Central, 3 GMT) in the DSpaceDevelopment Meebo room at . The room password is "dspace" (no quotes). I'm there already if anyone cares to turn up early. If you have trouble getting in, you can contact me by email at this address, or via AIM at "mindsatuw". Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From mwood at IUPUI.Edu Wed Aug 20 09:10:02 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Wed, 20 Aug 2008 09:10:02 -0400 Subject: [Dspace-general] Community admin.s; statistics In-Reply-To: <48AAFB80.6040302@illinois.edu> References: <48AAFB80.6040302@illinois.edu> Message-ID: <20080820131002.GB9118@IUPUI.Edu> On Tue, Aug 19, 2008 at 11:57:36AM -0500, Sarah L. Shreeves wrote: > - I'd agree with Dorothea that the community / collection structure and > administration could be re-thought. 
We've actually customized our > instance and have added community administration functionality which > has turned out to be so crucial for us. It's allowed me to get > departmental libraries and colleges involved in IDEALS at a level that I > don't think would have been possible otherwise. If you haven't yet prepared a patch to share -- I think this would be widely appreciated. > - Statistics and Reports - I know that there are a couple of stats > packages out there (and we're using one currently) but none seem to be > very satisfactory - or haven't been upgraded to work with Manakin. I think that as we move forward on that problem, we need to work out the various meanings of "statistics". Different consumers (organizational admin.s, system admin.s, community/collection admin.s, contributors, users) want to know different classes of things or want them presented in different ways. For example, are you looking for overall reports, or per-object statistics distributed throughout the user interface(s)? There's a lot of thought-work yet to be done, and different sites will want to use different approaches. [shameless plug] That's why, on the edges of this challenge, I've been working to get code like patch 2025998* to a state fit for inclusion, to make it easier for lots of people to plug into common object instrumentation points and try out their ideas concerning the best way to store, aggregate and present statistics. Anyway I think that the failure of "statistics" to gain traction is in part due to collective confusion over what it should mean to "do statistics in DSpace". Once we understand the consumer communities and their separate needs, I think consensus will be more likely. ------------------------ * http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984 -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080820/7c09cc53/attachment.bin From sshreeve at illinois.edu Wed Aug 20 09:33:35 2008 From: sshreeve at illinois.edu (Sarah L. Shreeves) Date: Wed, 20 Aug 2008 08:33:35 -0500 Subject: [Dspace-general] Community admin.s; statistics In-Reply-To: <20080820131002.GB9118@IUPUI.Edu> References: <48AAFB80.6040302@illinois.edu> <20080820131002.GB9118@IUPUI.Edu> Message-ID: <48AC1D2F.70509@illinois.edu> Yes, I definitely agree that we need to define what this means. Sarah Mark H. Wood wrote: > - Statistics and Reports - I know that there are a couple of stats >> packages out there (and we're using one currently) but none seem to be >> very satisfactory - or haven't been upgraded to work with Manakin. >> > > I think that as we move forward on that problem, we need to work out > the various meanings of "statistics". Different consumers > (organizational admin.s, system admin.s, community/collection admin.s, > contributors, users) want to know different classes of things or want > them presented in different ways. For example, are you looking for > overall reports, or per-object statistics distributed throughout the > user interface(s)? There's a lot of thought-work yet to be done, and > different sites will want to use different approaches. > > [shameless plug] That's why, on the edges of this challenge, I've been > working to get code like patch 2025998* to a state fit for inclusion, > to make it easier for lots of people to plug into common object > instrumentation points and try out their ideas concerning the best way > to store, aggregate and present statistics. > > Anyway I think that the failure of "statistics" to gain traction is in > part due to collective confusion over what it should mean to "do > statistics in DSpace". 
Once we understand the consumer communities > and their separate needs, I think consensus will be more likely. > > ------------------------ > * http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984 > > -- ------------------------------------------ Sarah L. Shreeves Coordinator, IDEALS http://www.ideals.uiuc.edu/ University of Illinois at Urbana-Champaign sshreeve at illinois.edu 217-333-4648 or 217-244-3877 From tdonohue at illinois.edu Wed Aug 20 10:12:24 2008 From: tdonohue at illinois.edu (Tim Donohue) Date: Wed, 20 Aug 2008 09:12:24 -0500 Subject: [Dspace-general] Community admin.s; statistics In-Reply-To: <20080820131002.GB9118@IUPUI.Edu> References: <48AAFB80.6040302@illinois.edu> <20080820131002.GB9118@IUPUI.Edu> Message-ID: <48AC2648.8040801@illinois.edu> Mark, Quick response to your comment about a Community Administration patch... Mark H. Wood wrote: > On Tue, Aug 19, 2008 at 11:57:36AM -0500, Sarah L. Shreeves wrote: >> - I'd agree with Dorothea that the community / collection structure and >> administration could be re-thought. We've actually customized our >> instance and have added community administration functionality which >> has turned out to be so crucial for us. It's allowed me to get >> departmental libraries and colleges involved in IDEALS at a level that I >> don't think would have been possible otherwise. > > If you haven't yet prepared a patch to share -- I think this would be > widely appreciated. The patch we are currently using at U of Illinois is one that has been available since just before DSpace 1.4. It was originally created by Andrea Bollini: http://sourceforge.net/tracker/index.php?func=detail&aid=1373613&group_id=19984&atid=319984 However, since that patch is now very *out of date*, I'm currently working on an updated version specifically for the DSpace 1.5 XMLUI. I'll be posting it as soon as it is stable/complete for others to use (hopefully by early-to-mid Sept). 
At this time, I'm not planning on implementing it for the DSpace 1.5 JSPUI, as that would be additional work and U of Illinois isn't planning to use the JSPUI any longer. I don't anticipate my patch making it into a DSpace out-of-the-box release, as I'm hoping that the DSpace 2.0 work will implement this functionality in a much more complete manner. Let me know if you have any more questions on this... - Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 From Christina.Richison at nitle.org Tue Aug 19 11:34:05 2008 From: Christina.Richison at nitle.org (Christina Richison) Date: Tue, 19 Aug 2008 11:34:05 -0400 Subject: [Dspace-general] What's working and what isn't? In-Reply-To: Message-ID: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org> DSpace Community, To address Dorothea's excellent question, I have the following to offer: What is working? 1. I like being able to add thumbnails through the jpeg media filter. A little something that makes life easier. 2. I like the fuzzy search feature. More information can be found here: http://lucene.apache.org/java/docs/queryparsersyntax.html#Fuzzy%20Searches 3. I like the DSpace hierarchy and the option of creating sub-communities within sub-communities. What isn't working? 1. Moving Communities, Collections, and Items: It would be nice to drag and drop these components into new homes instead of going through an export/import process. For example, it makes more sense to my department to move Collection B from Subcommunity A into Subcommunity B. I don't want to go through the export/import process with the appropriate XML file structure to accomplish this task. 2. Default Naming of Groups: It is easy to get "lost" when working with Authorization Groups. For example, I don't remember what collection 327 is.
The following screenshot displays some groups for which I am an authorizing member. Now what exactly are they? The default names should help clarify this, not make me work to clarify it. All the best, Christina Richison NITLE, NIS Technical Services Specialist christina.richison at nitle.org ----------------- Today's Topics: 1. Re: Question one: What's working and what isn't? (Dorothea Salo) 2. Survey: Catalogers working with Non-MARC Metadata (Veve, Marielle) 3. Re: Question one: What's working and what isn't? (Robin Taylor) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/cd39448e/attachment.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 20391 bytes Desc: image002.jpg Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080819/cd39448e/attachment.jpg From dsalo at library.wisc.edu Wed Aug 20 11:49:43 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Wed, 20 Aug 2008 10:49:43 -0500 Subject: [Dspace-general] Chat summary: 20 August 2008 Message-ID: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com> We had about twenty people (and, unfortunately, two or three trolls) in this morning's chat! That's a much larger turnout than I expected, and I find it very encouraging. After a round of introductions, we talked about the following things: BULK METADATA EDITING Use cases included name and subject authority control and adding a new piece of metadata to all the items in a collection at once. One manager wanted to allow student assistants to bulk-edit metadata. Suggestion: export/import of a collection's metadata only, for batch editing. PERMISSIONS One manager said that permissions were opaquely named and difficult to understand, making it hard to determine exactly what permissions a given eperson has.
Desiderata included letting epeople other than administrators create collections, automatically changing edit permissions on existing items in a collection when a new collection administrator is added, and letting collection administrators edit/change bitstreams (use case: ETDs with last-minute corrections). Suggestion: instead of recording permissions on each individual item, check against collection administrator list for edit rights on the item. DOCUMENTATION Several people mentioned using the wiki, especially the how-to pages. It was noted that the how-to pages are becoming disorganized and unwieldy, which will only get worse as more are added. Suggestions: organize the how-to pages by version of DSpace to which they apply; organize the how-to pages by task ("Install" "Customize" "Administer" "Troubleshoot" "Internationalize" etc). The mailing lists are helpful, but good information becomes the "needle in the haystack" -- hard to search for, especially with the unfriendly SourceForge interface. Several managers archive useful messages for later use. Suggestions: Build a way to auto-forward useful messages from dspace-tech to the wiki, for editing by one or more community members. Reuse material from an upcoming course on administering DSpace. Develop a "new user guide." Reorganize the DSpace feature list by common perceived needs rather than by feature. Dealing with problems in the underlying technology stack rather than DSpace itself can be difficult, as can finding live help. Suggestions: advertize the DSpace IRC room, arrange "office hours" there. OTHER DESIDERATA - embargoes (two managers reported using an embargo hack; one is delaying an upgrade to 1.5 because it does not have one) - multilingual issues: community/collection descriptions in more than one language, metadata input in more than one language LOGISTICS The IRC chatroom (irc.freenode.net, #dspace) is an underused resource! 
Developers and admins who are happy to help with DSpace issues watch the room. To broaden awareness of this helpful space, chats will be held there going forward. Next week's agenda should include discussion of DSpace statistics. Thanks to all participants! Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From Claudia.Juergen at ub.uni-dortmund.de Wed Aug 20 13:06:08 2008 From: Claudia.Juergen at ub.uni-dortmund.de (Claudia Juergen) Date: Wed, 20 Aug 2008 19:06:08 +0200 (CEST) Subject: [Dspace-general] What's working and what isn't? In-Reply-To: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org> References: <08119B28F4B3FF46A5747921C9FAA678D26839@AA1EXCH06.office.share.org> Message-ID: <32cc7398780c7664581bf0d0e48fe303.squirrel@mail.ub.uni-dortmund.de> Hi Christina, It is now possible to move items via the DSpace UI. There is a patch for moving collections in the patch queue, but I haven't tried it yet. If you do, it would be great to have some feedback. As for authorizations, in standard use cases you do not need to know the default authorization group names, i.e. while creating or editing a DSpace object. The best practice with regard to authorizations is to use your own groups, even if they just consist of 0-1 members at the beginning, i.e. while setting up new structures using standard groups. That way, staff changes, for example, cause little trouble. Sunny greetings Claudia > DSpace Community, > > > > To address Dorothea's excellent question, I have the following to offer: > > > > What is working? > > 1. I like being able to add thumbnails through the jpeg media > filter. A little something that makes life easier. > > 2. I like the fuzzy search feature. More information can be found > here: > http://lucene.apache.org/java/docs/queryparsersyntax.html#Fuzzy%20Searches > > 3.
I like the DSpace hierarchy and the option of creating > sub-communities within sub-communities. > > > > What isn't working? > > 1. Moving Communities, Collections, and Items: It would be nice to > drag and drop these components into new homes instead of going through > an export/import process. For example, it makes more sense to my > department to move Collection B from Subcommunity A into Subcommunity B. > I don't want to go through the export/import process with the > appropriate XML file structure to accomplish this task. > > 2. Default Naming of Groups: It is easy to get "lost" when working > with Authorization Groups. For example, I don't remember what collection > 327 is. The following screenshot displays some groups for which I am > an authorizing member. Now what exactly are they? The default names should > help clarify this, not make me work to clarify it. > > > > > > All the best, > > > > Christina Richison > > NITLE, NIS Technical Services Specialist > > christina.richison at nitle.org > > > > ----------------- > > > > > > Today's Topics: > > > > 1. Re: Question one: What's working and what isn't? (Dorothea Salo) > > 2. Survey: Catalogers working with Non-MARC Metadata (Veve, Marielle) > > 3. Re: Question one: What's working and what isn't? (Robin Taylor) > From mdiggory at MIT.EDU Wed Aug 20 15:11:12 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Wed, 20 Aug 2008 12:11:12 -0700 Subject: [Dspace-general] [Dspace-tech] Question one: What's working and what isn't?
In-Reply-To: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> Message-ID: <7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu> On Aug 18, 2008, at 6:24 AM, Dorothea Salo wrote: > Housekeeping: > > - Please respond to the dspace-general list, or to me directly. > DSpace-tech has a 1.5.1 beta to talk about, and I don't want to derail > that very important conversation! Release discussions generally occur on dspace-devel and dspace-commit lists (though infrequently). I would recommend not separating the IR Manager user group out of the user community. I've been generally dissatisfied with the breakup of the community over the lists of dspace-general at mit, dspace-tech at sf and dspace-devel at sf. In the past I've recommended consolidating or restructuring this list setup. Breaking off even more avenues for discussion creates an even greater state of chaos and localized discussion that is difficult to keep track of. In the past I recommended moving dspace-general to the SF site to ensure that it is clearly identified with the DSpace foundation and community rather than MIT Libraries. In the past I've also recommended renaming the lists to clarify the standard de facto OS listserv roles that they should be playing in the community: dspace-general at mit --> consolidate into below dspace-tech at sf --> dspace-user at sf (or possibly dspace-community) dspace-devel at sf --> dspace-devel at sf dspace-commit at sf --> dspace-admin at sf I also recommend an additional read-only list, dspace-announce at sf, for folks interested only in official DSpace foundation/community announcements and not the other discussions happening above. I am concerned that the spawning off of new discussion/chat/email lists is undermining the community's ability to maintain a centralized and clearly transparent mechanism for communication. -Mark ~~~~~~~~~~~~~ Mark R.
Diggory - DSpace Developer and Systems Manager MIT Libraries, Systems and Technology Services Massachusetts Institute of Technology Home Page: http://purl.org/net/mdiggory/homepage From mdiggory at MIT.EDU Wed Aug 20 15:12:24 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Wed, 20 Aug 2008 12:12:24 -0700 Subject: [Dspace-general] [Dspace-tech] Chat summary: 20 August 2008 In-Reply-To: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com> References: <356cf3980808200849y5d206b71j3c607e763a489761@mail.gmail.com> Message-ID: <3A5F663C-5468-44BD-8DC3-EBB3C8C3D476@mit.edu> The timing of this meeting was a bit off in my time zone, 7:00 am (which is why I usually prefer to use email for communication within the community, as it is asynchronous). Is there a transparent log of the chat conversation? I've logged into Meebo, but only found a portion of the history there... It's pleasant to see a round table get together and talk about such issues; I hope it will fuel an activity to get a better needs assessment out of the IR Manager user group. I would highly recommend formalizing the history of the event by summarizing this in the WIKI (as you point out a need to do when such events occur in chats and email lists). I also recommend that someone act as a moderator/secretary right now to aggregate your further list discussion into a more formal state in the WIKI in as close to real time as possible. It would be best to put the notes and chat log that you've presented here into a section of the WIKI focused wholly on the interests of the IR Managers group that is forming here. This could even simply be links to the pertinent threads of interest in the S.F. and dspace-general email lists. Cheers, Mark On Aug 20, 2008, at 8:49 AM, Dorothea Salo wrote: > We had about twenty people (and, unfortunately, two or three trolls) > in this morning's chat! That's a much larger turnout than I expected, > and I find it very encouraging.
> > After a round of introductions, we talked about the following things: > > BULK METADATA EDITING > Use cases included name and subject authority control, adding a new > piece of metadata to all the items in a collection at once. > One manager wanted to allow student assistants to bulk-edit metadata. > Suggestion: export/import of a collection's metadata only, for > batch editing > > PERMISSIONS > One manager said that permissions were opaquely-named and difficult to > understand, making it hard to determine exactly what permissions a > given eperson has. > Desiderata included letting epeople other than administrators create > collections, automatically changing edit permissions on existing items > in a collection when a new collection administrator is added, and > letting collection administrators edit/change bitstreams (use case: > ETDs with last-minute corrections). > Suggestion: instead of recording permissions on each individual item, > check against collection administrator list for edit rights on the > item. > > DOCUMENTATION > Several people mentioned using the wiki, especially the how-to pages. > It was noted that the how-to pages are becoming disorganized and > unwieldy, which will only get worse as more are added. > Suggestions: organize the how-to pages by version of DSpace to which > they apply; organize the how-to pages by task ("Install" "Customize" > "Administer" "Troubleshoot" "Internationalize" etc). > The mailing lists are helpful, but good information becomes the > "needle in the haystack" -- hard to search for, especially with the > unfriendly SourceForge interface. Several managers archive useful > messages for later use. > Suggestions: Build a way to auto-forward useful messages from > dspace-tech to the wiki, for editing by one or more community members. > Reuse material from an upcoming course on administering DSpace. > Develop a "new user guide." Reorganize the DSpace feature list by > common perceived needs rather than by feature. 
> Dealing with problems in the underlying technology stack rather than > DSpace itself can be difficult, as can finding live help. > Suggestions: advertize the DSpace IRC room, arrange "office hours" > there. > > OTHER DESIDERATA > - embargoes (two managers reported using an embargo hack; one is > delaying an upgrade to 1.5 because it does not have one) > - multilingual issues: community/collection descriptions in more than > one language, metadata input in more than one language > > LOGISTICS > The IRC chatroom (irc.freenode.net, #dspace) is an underused resource! > Developers and admins watch the room who are happy to help with DSpace > issues. To broaden awareness of this helpful space, chats will be held > there going forward. Next week's agenda should include discussion of > DSpace statistics. > > Thanks to all participants! > > Dorothea > > -- > Dorothea Salo dsalo at library.wisc.edu > Digital Repository Librarian AIM: mindsatuw > University of Wisconsin > Rm 218, Memorial Library > (608) 262-5493 > > ---------------------------------------------------------------------- > --- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > Build the coolest Linux based applications with Moblin SDK & win > great prizes > Grand prize is a trip for two to an Open Source event anywhere in > the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > DSpace-tech mailing list > DSpace-tech at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/dspace-tech From mwood at IUPUI.Edu Wed Aug 20 15:59:12 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Wed, 20 Aug 2008 15:59:12 -0400 Subject: [Dspace-general] [Dspace-tech] Question one: What's working and what isn't? 
In-Reply-To: <7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu> References: <356cf3980808180624mbf0e7cax3b7006532703a4f2@mail.gmail.com> <7DD5CC2A-7223-4715-91C4-0AD129AD9525@mit.edu> Message-ID: <20080820195912.GB24603@IUPUI.Edu> On Wed, Aug 20, 2008 at 12:11:12PM -0700, Mark Diggory wrote: > In the past I recommended moving dspace-general to the SF site to > ensure that it is clearly identified with the DSpace foundation and > community rather than MIT Libraries. A data point: I had forgotten that dspace-general even existed, since it wasn't visible at SF, until it was recently mentioned on dspace-devel. Housing all of the lists together sounds good to me. (OTOH the SF list archive navigation tools are awful!) > dspace-commit at sf --> dspace-admin at sf Um, dspace-admin sounds like "for discussion of administration of DSpace installations". That's certainly not what a commit list is for. ??? > I am concerned that the spawning off of new discussion/chat/email > lists is undermining the community's ability to maintain a > centralized and clearly transparent mechanism for communication. A particular problem with chats is that there is no record of any progress made there unless someone is logging. May I suggest that, in any chat, when consensus or other significant progress is reached, there be a call for a volunteer to write up a summary thereof and post it to a mailing list or the wiki, so that it can be referred back to later or discovered by those not present in the chat. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080820/9fc05be4/attachment.bin

From dsalo at library.wisc.edu Fri Aug 22 10:34:29 2008
From: dsalo at library.wisc.edu (Dorothea Salo)
Date: Fri, 22 Aug 2008 09:34:29 -0500
Subject: [Dspace-general] Summary: Week 1 responses
Message-ID: <356cf3980808220734q63ad1fa6yeab61c1f463ec6f5@mail.gmail.com>

(The chat summary is up on the wiki at ; participants, feel free to edit! This summary will go up on the wiki also, and will likewise be editable.)

I received six off-list responses, alongside three onlist ones (including my own).

LIKES
- Simple to get running, a lot of bang for the buck (3 mentions)
- Checksum checker (3 mentions)
- HTML display engine (3 mentions)
- Search engine: "simple and fast" (2 mentions)
- Manakin theming (2 mentions)
- Legible displays, easy structuring of data and depositors into communities/collections
- Flexible Dublin Core, easy defining and rebuilding of indexes
- Storage in SQL database
- Easy to showcase and share data
- Community spirit and visionary dedication!
- Configurable submission workflow, metadata templating
- Maven dealing with Java dependencies

NEEDS/ISSUES
- Complexity of authorization/permissions system, poor fit with real-world workflows, too much work for DSpace admins that can't be delegated (5 mentions)
- Communities/collections model confusing and unintuitive for end-users (3 mentions)
- Submission process needs streamlining and simplification; input-forms.xml needs to be end-user editable (3 mentions)
- Allow bitstream updating/addition after deposit by users and collection administrators (me, George)
- Doesn't automatically feed people into user groups based on LDAP group membership
- Allow depositors to withdraw their own items
- Difficult to customize; also, moving to Manakin costs functionality
- Documentation scattered and confusing
- Can only use Postgres and Oracle databases
- Too-close integration with handles

DESIRED NEW FUNCTIONALITY
I tried to exclude this from the question, but I got a lot of it anyway!
- better i18n (community/collection descriptions, metadata-input forms) (3 mentions)
- statistics (per item, per author) (3 mentions)
- batch import of citation-only references from a single document
- web UI to prompt a reindexing
- add "also by" (author) or "see also" (subject) links to item pages
- persistent bitstream handles/URLs
- HTML in metadata (e.g. abstracts)
- embargo support
- bulk metadata editing through a web UI

The floor is open for discussion! Devs, please feel free to ask for clarification, which I hope participants will provide. Participants, if I have traduced your input, please do say so.
Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From dsalo at library.wisc.edu Mon Aug 25 09:08:47 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Mon, 25 Aug 2008 08:08:47 -0500 Subject: [Dspace-general] Week 2: Statistics Message-ID: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> Greetings, DSpace community, I want to thank everyone once again for last week's stimulating discussion and impressive chat turnout! I have a new question for everyone this week, pursuant to some discussion on the lists: "Statistics" are one of the commonest requests for a new DSpace feature. Without further specification, however, it's hard to know what data to present, since there are no standards or even clear best practices in this area. What statistics do the following groups of DSpace users need to see, and in what form are the statistics best presented to them? Depositors End-users (defined as "people examining items and downloading bitstreams from a DSpace instance;" we may have to refine this further in discussion) DSpace repository managers (as distinct from systems administrators) What else should developers keep in mind as they implement this feature? Because it would be nice to reach a working consensus on this (unlike last week's question, which was intended to pull out as broad a selection of needs as possible), I think we should start discussing immediately. I encourage all respondents to respond TO THE MAILING LIST instead of to me. I will be holding another chat to discuss the weekly question. It will take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on irc.freenode.net. I apologize to West Coast (USA) community members for last week's unconscionably early hour; we'll try 10 am US Central (11 am Eastern, 4 pm GMT) this week, and we may go even later next week if our European community members can stand it. 
For those who don't normally use IRC, there are two easy web gateways. One is mibbit.com; the other is specific to our channel and can be found at . I encourage all of us to become familiar with the channel; it is a source of real-time technical information from DSpace developers, as well as a community in its own right. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From dsalo at library.wisc.edu Mon Aug 25 10:07:43 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Mon, 25 Aug 2008 09:07:43 -0500 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> Message-ID: <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> My answers: > What statistics do the following groups of > DSpace users need to see, and in what form are the statistics best > presented to them? > > Depositors At a minimum, I would like depositors to see the number of times an item's splash page has been visited, and the number of times each content bitstream (as distinct from e.g. thumbnails) has been downloaded. I would also like aggregate statistics available for each author in the system, though I recognize that this creates authority-control and role-evaluation issues. (For example, if Dr. Helen Troia is the author of articles in the repository, the editor of a journal whose backfiles are in the repository, as well as a thesis advisor for some theses in the thesis collection, the journal and the theses should NOT count toward her downloads.) HTML items (and similar aggregates, once we can work with them; e.g. Flash objects) cause trouble for bitstream analysis. To cut through the jungle, I suggest that only the primary bitstream have its accesses counted. 
If possible, it would be nice to count accesses for all HTML bitstreams, but that can be lived without if need be. I don't believe these statistics need to be real-time; a daily or even weekly cron-job would suffice. I do believe we need to take into account when an item was ingested, recognizing that older items will pile up the downloads over time. In addition to total-aggregates, then, I would recommend "in the last week," "in the last month," and "in the last year/since ingest" information. Per-calendar-year information should be kept and displayed indefinitely, even if the underlying data are eventually purged, because authors will use this in tenure-and-promotion packages. A sense of delta would be nice as well -- depositors would LOVE to know if suddenly an item's downloads spike. Other desiderata, less important: broad-brush geographic information (country of origin? Google Maps mashup?) for accesses, per-collection and per-community access counts (because it NEVER hurts to get a sense of competition going), search terms (in DSpace itself or from search engines) that land people at a particular item. > End-users (defined as "people examining items and downloading > bitstreams from a DSpace instance;" we may have to refine this further > in discussion) I think end-users can usefully be shown the per-item and per-bitstream information discussed above. They don't need to see per-author information -- or at the very least, authors should be able to decide whether to make this information public. (We do NOT want to embarrass anyone; that's a serious turnoff for our potential depositors.) > DSpace repository managers (as distinct from systems administrators) I get survey after survey asking for activity information on the repository. I can't answer them. To do so, I need download information on the whole repository. 
(Current JSPUI statistics offer an approximation to this, but I'm very leery of trusting it; I don't understand how it's calculated, and the numbers seem incredibly off to me.) I am sometimes asked about growth rate in accesses, so it would be useful to break this down by year. Some algorithm for breaking it down by amount of content in the repository ("downloads-per-item," where "item" would have to be some kind of average of items-in-repository over the period examined) would be useful as well. (And yes, I absolutely loathe those surveys too, but when they come from ARL, I don't have the luxury of ignoring them.) Some "wow" numbers would be useful for marketing purposes. A lot of what I've already described would do the trick there. I would also like to be able to track deposits per collection/community over time; this helps me know where to focus marketing and collection-development efforts, as well as helping me report progress to the appropriate administrators. (I run a system-wide repository, so I have to track deposits by campus; each campus has its own community.) > What else should developers keep in mind as they implement this feature? Search-engine crawlers. Excluding them provides a much more realistic sense of interest. We need to make clear this is happening, though, or we will be at a perceived disadvantage relative to repositories that don't strip out these accesses. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From mwood at IUPUI.Edu Mon Aug 25 10:55:20 2008 From: mwood at IUPUI.Edu (Mark H. 
Wood) Date: Mon, 25 Aug 2008 10:55:20 -0400 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> Message-ID: <20080825145520.GF15124@IUPUI.Edu> One thing to keep in mind about whole-site statistical tables is that there are already tools to do this for web sites in general, such as AWStats or Webalizer or whatever your favorite may be. We probably should not spend effort to try to duplicate those. Another consideration is that there are stat.s which would be useful anytime, and stat.s that you dream up once and may never use again, or may only find interesting at irregular intervals. So I think we should be careful not to try to do too much ourselves. We can have some generally-useful stuff built in, but we also need ways to expose the raw cases in a useful form for ad-hoc analysis with general-purpose statistical tools (SPSS/BMD/SAS/Stata/R/whatever). Stuff to be inserted as one component of e.g. an item page probably needs to be built in. Stuff that would be a page on its own should perhaps not be part of DSpace at all, but rather something we make easy to do with other tools. We need to keep clearly in mind the distinction between capturing raw cases (someone fetched a bitstream) and abstracting useful patterns from the collected cases (frequency histogram of this collection's fetches over time, last month's fetches broken down by nation of origin). What might be helpful is to provide some views or stored procedures that stat. tools could use to classify observations. Such tools usually have good facilities for poking around in databases, but could perhaps use help in getting the information they need without having to understand (and track changes to!) the fulness of DSpace's schema. -- Mark H. 
Wood, Lead System Programmer mwood at IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he means the exact opposite.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080825/1477891f/attachment.bin

From l.hayes at auckland.ac.nz Mon Aug 25 18:50:18 2008
From: l.hayes at auckland.ac.nz (Leonie Hayes)
Date: Tue, 26 Aug 2008 10:50:18 +1200
Subject: [Dspace-general] Statistics
In-Reply-To:
References:
Message-ID:

Dear DSpace Community

Statistics

1. From a what-works perspective, there are already beautiful statistics implementations addressing the minimum requirements. I think the IDEALS repository has what I would be very happy with; these guys seem to be one step ahead: http://www.ideals.uiuc.edu I can remember asking Tim Donohue about their implementation a few years ago; he said it was a very customised solution (please correct me if I am wrong). I also find the eprints and Fez Fedora stats are pretty good.

2. Develop a package that delivers via both the JSP and the XML (Manakin) interfaces.

3. Keep it fairly compartmentalised and simple, if possible, and quarantine the requirements into 3 distinct areas:
a) Item Statistics - downloads, with additional extras like authors and collections
b) Site Trends - traffic sources, countries, etc.; piggyback on tools like Google Analytics, or the other web analyser tools that Mark Wood mentions
c) More complex reporting that meets specific requirements.

Many thanks for the opportunity to be part of the discussion. We are very isolated in New Zealand but struggling with all the same problems everyone else is experiencing... it helps to move forward. Time zones don't allow any online interaction; it will be 4 am here.
Leonie Hayes Research Repository Librarian http://www.library.auckland.ac.nz/contacts/?firstname=&lastname=hayes http://researchspace.auckland.ac.nz
From bram at mire.be Mon Aug 25 19:23:39 2008 From: bram at mire.be (Bram Luyten) Date: Tue, 26 Aug 2008 01:23:39 +0200 Subject: [Dspace-general] [Dspace-tech] Week 2: Statistics In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> Message-ID: Dear Dorothea, inspiring question ! There's a huge range of interesting options to explore in the area of statistics, measurement and repository usage tracking. Following ideas could be relevant to end-users: It might be interesting for authors if they are able to see which dspace (or google) search queries, lead to his items.
This could be displayed as the top ten of most popular searches, that lead to a specific item. If download per bitstream, and splash page visits are being tracked, it might be useful if they could be used to display different rankings//lists Rankings by collection or community & the possibility to locate a certain item in those rankings. Use Case: if you can see that your item is one of the best "performing" items in your collection, you might be interested how it performs in the context of the above lying community. Order by hits or downloads, for "items-by-author" Use Case: if you found an interesting author, with a lot of papers relevant in your context, you might want to start off with his most popular items. with kindest regards, Bram Luyten -- @mire NV Romeinse Straat 18 3001 Heverlee Belgium +32 2 888 29 56 http://www.atmire.com - Institutional Repository Solutions http://www.togather.eu - Before getting together, get Tog at ther -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080826/877dc355/attachment.htm From dsalo at library.wisc.edu Tue Aug 26 10:44:45 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Tue, 26 Aug 2008 09:44:45 -0500 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <20080825145520.GF15124@IUPUI.Edu> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> Message-ID: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> 2008/8/25 Mark H. Wood : > One thing to keep in mind about whole-site statistical tables is that > there are already tools to do this for web sites in general, such as > AWStats or Webalizer or whatever your favorite may be. We probably > should not spend effort to try to duplicate those. Perhaps not, but if this is the direction we want people to go in, we probably ought to document how to do it, at least informally on the wiki. Does anybody have such a system in place? > We can have > some generally-useful stuff built in, but we also need ways to expose > the raw cases in a useful form for ad-hoc analysis with > general-purpose statistical tools (SPSS/BMD/SAS/Stata/R/whatever). +1 Data for mashup is always good. I should have mentioned that a desire I have is the ability to export/transclude at LEAST by-author stats data for inclusion other places. > Stuff to be inserted as one component of e.g. an item page probably > needs to be built in. Stuff that would be a page on its own should > perhaps not be part of DSpace at all, but rather something we make > easy to do with other tools. I'm not sure this is the distinction I would make. To me, the question is whether a given set of statistics needs to know anything specific about the way DSpace structures the universe. So I might well have special pages outside DSpace containing DSpace by-author statistics, but it's impossible (isn't it?) 
to tweak a Webalizer install into capturing stats by author. I still need to rely on DSpace to carve up the accesses correctly. > We need to keep clearly in mind the distinction between capturing raw > cases (someone fetched a bitstream) and abstracting useful patterns > from the collected cases (frequency histogram of this collection's > fetches over time, last month's fetches broken down by nation of > origin). Well, developers do. End-users, perhaps not so much. :) > What might be helpful is to provide some views or stored procedures > that stat. tools could use to classify observations. Such tools > usually have good facilities for poking around in databases, but could > perhaps use help in getting the information they need without having to > understand (and track changes to!) the fulness of DSpace's schema. Interesting. Where would this leave the average repository manager who isn't using Stata, but just wants some numbers to show people? Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From tdonohue at illinois.edu Tue Aug 26 11:07:43 2008 From: tdonohue at illinois.edu (Tim Donohue) Date: Tue, 26 Aug 2008 10:07:43 -0500 Subject: [Dspace-general] Statistics In-Reply-To: References: Message-ID: <48B41C3F.3010709@illinois.edu> All, Just a comment on Leonie's praise of the Statistics we are using for IDEALS (www.ideals.uiuc.edu): Leonie Hayes wrote: > Dear DSpace Community > > Statistics > > 1. From a what works perspective there is already beautiful statistics > implementations addressing the minimum requirements, I think the IDEALS > repository has what I would be very happy with, these guys seem to be > one step ahead http://www.ideals.uiuc.edu I can remember asking Tim > Donohue about their implementation a few years ago, he said it was a > very customised solution, please correct me if wrong. 
I also find the > eprints and Fez Fedora stats are pretty good. Thanks for the praise...much appreciated! :) Though, some of the kudos should go to U of Rochester (http://urresearch.rochester.edu/), who initially created the Statistics package we use for DSpace. We've made some local modifications (like the "Top 10 Downloads" list), but much of the original work was done at U of Rochester. However, it's worth mentioning to all that although the statistics we are using for IDEALS look "pretty", there's still quite a bit of "ugliness" underneath. The main problem we have is that our statistics package does *NOT* automatically filter out web-crawlers like Google/Yahoo. Instead, it requires a person to go in and manually filter out downloads (via IP address) which look to be web-crawlers. It's definitely *not* a solution that scales well. So, although I think it was already mentioned, I'd add as a requirement for a good Statistics Package: * Must filter out web-crawlers in a semi-automated fashion! - Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 From tdonohue at illinois.edu Tue Aug 26 12:09:15 2008 From: tdonohue at illinois.edu (Tim Donohue) Date: Tue, 26 Aug 2008 11:09:15 -0500 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> Message-ID: <48B42AAB.6010804@illinois.edu> Dorothea & all, Dorothea Salo wrote: > 2008/8/25 Mark H. 
Wood : >> One thing to keep in mind about whole-site statistical tables is that >> there are already tools to do this for web sites in general, such as >> AWStats or Webalizer or whatever your favorite may be. We probably >> should not spend effort to try to duplicate those. > > Perhaps not, but if this is the direction we want people to go in, we > probably ought to document how to do it, at least informally on the > wiki. Does anybody have such a system in place? For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide traffic information. However, that information is *not* publicly accessible. We only use it for administrative purposes, since most of the information AWStats generates for us is generally *not* useful to our users. So, for example, AWStats can provide us with the following general information: * Which features of DSpace are being used most frequently (e.g. Subject Browse, Community/Collection browse, search, etc.) * Which web browsers our users are using * # of overall hits in a given month,week,day,hour * Approximate amount of time users spend on our site * What external resources people use to get to our site (e.g. Google, Blog posts, Library website, etc.) * The top searches used to get to your site (in Google, Yahoo, MSN, etc) But, AWStats only works at a global level. So, it *cannot* give us any real information at a community, collection or item level, since it doesn't understand DSpace's internal structure and cannot parse DSpace's log files (it parses the *web server* log files, rather than DSpace's internal logs) So, in the end, AWStats is a worthwhile tool to keep in mind. However, without some major customizations specific to DSpace, it's really more of an Administrative tool to help you determine *how* users are using your site. It doesn't give any real worthwhile "statistics" in terms of file downloads or individual community/collection access counts, which are more likely to be useful to your users. 
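As a rough illustration of the gap described above, the sketch below counts successful file downloads per item straight from web server access-log lines. It is only a sketch under stated assumptions: the two `/bitstream/...` URL shapes, the combined log format, and the function name `downloads_per_item` are all hypothetical choices for illustration, not something confirmed in this thread, and would need adjusting to a real site's logs.

```python
import re
from collections import Counter

# ASSUMPTION: bitstream URLs look like one of
#   /bitstream/<prefix>/<suffix>/<seq>/<filename>
#   /bitstream/handle/<prefix>/<suffix>/<filename>
# and the log is in the common/combined format ("GET <path> HTTP/x.y" <status> ...).
BITSTREAM_RE = re.compile(
    r'"GET /bitstream/(?:handle/)?(?P<prefix>[\w.]+)/(?P<suffix>\d+)/\S* '
    r'HTTP/[\d.]+" (?P<status>\d{3}) '
)


def downloads_per_item(log_lines):
    """Count successful (HTTP 200) bitstream requests per item handle."""
    counts = Counter()
    for line in log_lines:
        match = BITSTREAM_RE.search(line)
        if match and match.group("status") == "200":
            handle = f'{match.group("prefix")}/{match.group("suffix")}'
            counts[handle] += 1
    return counts
```

Even with such a script, the result is keyed only by item handle. Rolling the counts up by collection, community, or author still requires asking DSpace which item belongs where, which is the limitation described above.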
- Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 From mwood at IUPUI.Edu Tue Aug 26 15:47:20 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Tue, 26 Aug 2008 15:47:20 -0400 Subject: [Dspace-general] Statistics In-Reply-To: <48B41C3F.3010709@illinois.edu> References: <48B41C3F.3010709@illinois.edu> Message-ID: <20080826194720.GA20164@IUPUI.Edu> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: > So, although I think it was already mentioned, I'd add as a requirement > for a good Statistics Package: > > * Must filter out web-crawlers in a semi-automated fashion! +1! Suggestions as to how? The Rochester mod.s could be augmented to filter out the easiest cases more simply. Some well-behaved crawlers can be spotted automatically. (No, I don't recall how.) The filter rules could be made more flexible than just a single type of fixed-size netblocks (if memory serves). I've been meaning to work on these at some point, but haven't yet reached That Point. Crawler filtering sounds like something that might be abstracted from the various existing stat. patches and provided as a common service. We all should invent this wheel only once. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080826/7dddd0c1/attachment.bin From dsalo at library.wisc.edu Tue Aug 26 16:09:16 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Tue, 26 Aug 2008 15:09:16 -0500 Subject: [Dspace-general] Statistics In-Reply-To: <20080826194720.GA20164@IUPUI.Edu> References: <48B41C3F.3010709@illinois.edu> <20080826194720.GA20164@IUPUI.Edu> Message-ID: <356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com> 2008/8/26 Mark H. Wood : > On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: >> So, although I think it was already mentioned, I'd add as a requirement >> for a good Statistics Package: >> >> * Must filter out web-crawlers in a semi-automated fashion! > > +1! Suggestions as to how? The site maintains a list of user-agents, classified by type. They have an XML-downloadable version at , as well as an RSS-feed updater. Perhaps polling this would be a useful starting point? Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From tdonohue at illinois.edu Tue Aug 26 16:29:23 2008 From: tdonohue at illinois.edu (Tim Donohue) Date: Tue, 26 Aug 2008 15:29:23 -0500 Subject: [Dspace-general] Statistics In-Reply-To: <356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com> References: <48B41C3F.3010709@illinois.edu> <20080826194720.GA20164@IUPUI.Edu> <356cf3980808261309j1a9964adif49b5ecefe5b98fe@mail.gmail.com> Message-ID: <48B467A3.7080100@illinois.edu> Dorothea Salo wrote: > 2008/8/26 Mark H. Wood : >> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: >>> So, although I think it was already mentioned, I'd add as a requirement >>> for a good Statistics Package: >>> >>> * Must filter out web-crawlers in a semi-automated fashion! >> +1! Suggestions as to how? 
> > The site maintains a list of > user-agents, classified by type. They have an XML-downloadable version > at , as well as an RSS-feed > updater. Perhaps polling this would be a useful starting point? > > Dorothea > Universidade of Minho's Statistics Add-On for DSpace can do some basic automated filtering of web crawlers: See its list of main features on the DSpace Wiki: http://wiki.dspace.org/index.php//StatisticsAddOn (It looks like they determine spiders by how spiders tend to identify themselves. Most "nice" spiders, like Google, will identify themselves in a common fashion, e.g. "Googlebot") Frankly, although our statistics for IDEALS are nice looking...Minho's work is much more extensive and offers a greater variety of features (from what I've seen/heard of it). It's just missing our "Top 10 Downloads" list :) - Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 From mwood at IUPUI.Edu Tue Aug 26 16:34:33 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Tue, 26 Aug 2008 16:34:33 -0400 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> Message-ID: <20080826203433.GB20164@IUPUI.Edu> On Tue, Aug 26, 2008 at 09:44:45AM -0500, Dorothea Salo wrote: > 2008/8/25 Mark H. Wood : > > What might be helpful is to provide some views or stored procedures > > that stat. tools could use to classify observations. Such tools > > usually have good facilities for poking around in databases, but could > > perhaps use help in getting the information they need without having to > > understand (and track changes to!) 
the fulness of DSpace's schema. > > Interesting. Where would this leave the average repository manager who > isn't using Stata, but just wants some numbers to show people? Well, it depends on which numbers are wanted. I do think there will be some reports that are popular enough, and easy enough to get right, that they should be built in. The support for external tools would be aimed at people who do want to use them. What sort of data would be useful to the manager who isn't into heavy statistical analysis, which aren't likely to be provided as built-ins? Where I'm going is:
o The realm of reasonable possibilities for statistical analysis and presentation of DSpace activity is rather huge;
o people who understand statistical processing have already figured out the hard parts of analysis and presentation;
o the tail should not be allowed to wag the dog -- we want statistics, but that's subordinate to building excellent document repository software. Part of, important, but in a supporting role.
So I am hoping that we can mostly satisfy most people with relatively modest built-in statistical support, and take care of the other cases with modest support for the development of external reporting mechanisms. This being a community, I imagine that some will develop external solutions that they can share. This is one reason why I think that it should be as easy as possible for multiple stat. projects to tap into built-in streams of observations. Different sites have different needs, and I think we need to be able to easily play with various ways of doing stat.s. I'm not convinced that we are going to understand the need sufficiently without getting into the field a selection of solutions that can be easily snapped in and tried by a sizable number of sites. There are a number of good attempts now, but it's not easy to install them and that limits the amount of experience we can gather. -- Mark H.
Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080826/d1873706/attachment.bin From dsalo at library.wisc.edu Tue Aug 26 19:13:14 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Tue, 26 Aug 2008 18:13:14 -0500 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <20080826203433.GB20164@IUPUI.Edu> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> Message-ID: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> 2008/8/26 Mark H. Wood : > Well, it depends on which numbers are wanted. I do think there will > be some reports that are popular enough, and easy enough to get right, > that they should be built in. The support for external tools would be > aimed at people who do want to use them. What sort of data would be > useful to the manager who isn't into heavy statistical analysis, which > aren't likely to be provided as built-ins? Well, I hope that's where the discussion this week has been pointing. If not, we'll have to find a different way to gather that information. Looking at existing implementations of statistics (e.g. EPrints, SSRN) might be a start. > o the tail should not be allowed to wag the dog -- we want > statistics, but that's subordinate to building excellent document > repository software. Part of, important, but in a supporting role. This is such an interesting statement that I think I will make it next week's topic! What *is* excellent document repository software? 
I have a feeling that the non-developer community may have a rather different take on it from most developers... we'll see if I'm right. > So I am hoping that we can mostly satisfy most people with relatively > modest built-in statistical support, and take care of the other cases > with modest support for the development of external reporting > mechanisms. I'd be interested to know how the proposals that have been put forward this week place on a modesty scale. Developers? > This is one reason why I think that it should be as easy as possible > for multiple stat. projects to tap into built-in streams of > observations. Different sites have different needs, and I think we > need to be able to easily play with various ways of doing stat.s. Agreed, but just to toss this out: I foresee a countervailing pressure in future toward standardized and aggregated statistics across repositories. I have heard a number of statements to the effect that faculty are using download counts from disciplinary repositories in tenure-and-promotion packages. As their work becomes scattered and/or duplicated across various repositories, they're going to want to aggregate that information. > There are a > number of good attempts now, but it's not easy to install them and > that limits the amount of experience we can gather. +1. This is a problem for more than just statistics! Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From christophe.dupriez at destin.be Wed Aug 27 04:37:12 2008 From: christophe.dupriez at destin.be (Christophe Dupriez) Date: Wed, 27 Aug 2008 10:37:12 +0200 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> Message-ID: <48B51238.4010008@destin.be> Hi Dorothea and participants to this discussion! 
I would like to say that statistics are there for different purposes:
1) detect errors (why did nobody look at my site last Sunday?)
2) provide KPIs (Key Performance Indicators), measures that a manager follows over the medium term to take organisational decisions
3) investigate new hypotheses before investing to change the organisation.
For purpose (3), by its nature, you need to "open" to analysis the detailed logs of the events and the data stored in DSpace. Generic programs like SAS or report generators are the best to dig in data and answer new, unforeseen questions. Everybody in the community will be happy to have this "back door" available. For purpose (2), we need to know what KPIs are needed by IR managers. I will go further: new IRs and their managers would be very happy not to reinvent KPIs and to have good ones already proposed to sustain a documented IR development process. A very big part of DSpace's attractiveness is (and this should really be implemented!) that it provides "best practices" for IR management (and not only computing). For purpose (2), use cases, practices and measures must be designed upfront. It will contribute strongly to the overall specifications of DSpace. For purpose (1), a more formal, bottom up, data driven approach may be sufficient to install validation tools (like the checksum checker) to ensure that DSpace operations are "in line". So we have no choice: we have to listen to IR managers (please come by!) to know the good practices DSpace must support... Have a nice day! Christophe (peeking on the list when I should not during my holidays!) Dorothea Salo wrote: > Greetings, DSpace community, > > I want to thank everyone once again for last week's stimulating > discussion and impressive chat turnout! I have a new question for > everyone this week, pursuant to some discussion on the lists: > > "Statistics" are one of the commonest requests for a new DSpace > feature.
Without further specification, however, it's hard to know > what data to present, since there are no standards or even clear best > practices in this area. What statistics do the following groups of > DSpace users need to see, and in what form are the statistics best > presented to them? > > Depositors > End-users (defined as "people examining items and downloading > bitstreams from a DSpace instance;" we may have to refine this further > in discussion) > DSpace repository managers (as distinct from systems administrators) > > What else should developers keep in mind as they implement this feature? > > Because it would be nice to reach a working consensus on this (unlike > last week's question, which was intended to pull out as broad a > selection of needs as possible), I think we should start discussing > immediately. I encourage all respondents to respond TO THE MAILING > LIST instead of to me. > > I will be holding another chat to discuss the weekly question. It will > take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on > irc.freenode.net. I apologize to West Coast (USA) community members > for last week's unconscionably early hour; we'll try 10 am US Central > (11 am Eastern, 4 pm GMT) this week, and we may go even later next > week if our European community members can stand it. > > For those who don't normally use IRC, there are two easy web gateways. > One is mibbit.com; the other is specific to our channel and can be > found at . I encourage > all of us to become familiar with the channel; it is a source of > real-time technical information from DSpace developers, as well as a > community in its own right. > > Dorothea > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: christophe_dupriez.vcf Type: text/x-vcard Size: 454 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080827/3784f643/attachment.vcf From eloy at sdum.uminho.pt Wed Aug 27 05:12:04 2008 From: eloy at sdum.uminho.pt (Eloy Rodrigues) Date: Wed, 27 Aug 2008 10:12:04 +0100 Subject: [Dspace-general] Dspace-general Digest, Vol 61, Issue 22 In-Reply-To: References: Message-ID: <00a701c90824$f8d80000$ea880000$@uminho.pt> Dear All, A detailed description of the functionality and architecture of the statistics Add-on we have developed can be found in the docs folder of the downloadable file - http://wiki.dspace.org/static_files/6/68/Stats-addon-2.0.tar.gz On our production implementation of the Add-on on RepositóriUM, we have developed some more tools/functionality for automated and semi-automated detection and exclusion of crawlers (based not only on "well behaved" robots, but also on the patterns and behavior of IP addresses, etc.), that are not available in version 2.0 of the Add-on. As we are currently upgrading RepositóriUM to DSpace 1.5, hopefully we will release a Stats Add-on 2.1, compatible with DSpace 1.5, and including the new functionality/tools in late September or October.
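The two kinds of detection mentioned in this thread (robots that identify themselves, and IP addresses whose behavior looks automated) can be sketched roughly as follows. This is not the Minho Add-on's implementation, which is not shown here; the marker list, the per-IP threshold, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# ASSUMPTION: illustrative substrings only. A real deployment would load
# a maintained list of crawler user-agents instead of hard-coding one.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")


def is_self_identified_bot(user_agent):
    """True if the User-Agent string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def suspicious_ips(hits, max_hits_per_ip=100):
    """Return IPs with more than max_hits_per_ip hits in one time window.

    hits: iterable of (ip, user_agent) pairs for that window.
    """
    per_ip = Counter(ip for ip, _ in hits)
    return {ip for ip, count in per_ip.items() if count > max_hits_per_ip}


def filter_hits(hits, max_hits_per_ip=100):
    """Drop hits from self-identified bots and from unusually busy IPs."""
    hits = list(hits)
    noisy = suspicious_ips(hits, max_hits_per_ip)
    return [
        (ip, ua)
        for ip, ua in hits
        if not is_self_identified_bot(ua) and ip not in noisy
    ]
```

A fixed hit threshold is a crude stand-in for behavioral detection and will also exclude busy proxies or campus gateways, so anything it flags is probably best reviewed by a person, in keeping with the "semi-automated" requirement raised earlier.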
Best Regards, Eloy Rodrigues Universidade do Minho - Servi?os de Documenta??o Campus de Gualtar - 4710 - 057 Braga Telefone: + 351 253604150; Fax: + 351 253604159 Campus de Azur?m - 4800 - 058 Guimar?es Telefone: + 351 253510168; Fax: + 351 253510117 -----Original Message----- From: dspace-general-bounces at mit.edu [mailto:dspace-general-bounces at mit.edu] On Behalf Of dspace-general-request at mit.edu Sent: quarta-feira, 27 de Agosto de 2008 09:31 To: dspace-general at mit.edu Subject: Dspace-general Digest, Vol 61, Issue 22 Send Dspace-general mailing list submissions to dspace-general at mit.edu To subscribe or unsubscribe via the World Wide Web, visit http://mailman.mit.edu/mailman/listinfo/dspace-general or, via email, send a message with subject or body 'help' to dspace-general-request at mit.edu You can reach the person managing the list at dspace-general-owner at mit.edu When replying, please edit your Subject line so it is more specific than "Re: Contents of Dspace-general digest..." Today's Topics: 1. Re: Week 2: Statistics (Tim Donohue) 2. Re: Statistics (Mark H. Wood) 3. Re: Statistics (Dorothea Salo) 4. Re: Statistics (Tim Donohue) 5. Re: Week 2: Statistics (Mark H. Wood) 6. Re: Week 2: Statistics (Dorothea Salo) 7. Re: Week 2: Statistics (Christophe Dupriez) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Aug 2008 11:09:15 -0500 From: Tim Donohue Subject: Re: [Dspace-general] Week 2: Statistics To: Dorothea Salo Cc: dspace-general at mit.edu Message-ID: <48B42AAB.6010804 at illinois.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dorothea & all, Dorothea Salo wrote: > 2008/8/25 Mark H. Wood : >> One thing to keep in mind about whole-site statistical tables is that >> there are already tools to do this for web sites in general, such as >> AWStats or Webalizer or whatever your favorite may be. We probably >> should not spend effort to try to duplicate those. 
> > Perhaps not, but if this is the direction we want people to go in, we > probably ought to document how to do it, at least informally on the > wiki. Does anybody have such a system in place? For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide traffic information. However, that information is *not* publicly accessible. We only use it for administrative purposes, since most of the information AWStats generates for us is generally *not* useful to our users. So, for example, AWStats can provide us with the following general information: * Which features of DSpace are being used most frequently (e.g. Subject Browse, Community/Collection browse, search, etc.) * Which web browsers our users are using * # of overall hits in a given month,week,day,hour * Approximate amount of time users spend on our site * What external resources people use to get to our site (e.g. Google, Blog posts, Library website, etc.) * The top searches used to get to your site (in Google, Yahoo, MSN, etc) But, AWStats only works at a global level. So, it *cannot* give us any real information at a community, collection or item level, since it doesn't understand DSpace's internal structure and cannot parse DSpace's log files (it parses the *web server* log files, rather than DSpace's internal logs) So, in the end, AWStats is a worthwhile tool to keep in mind. However, without some major customizations specific to DSpace, it's really more of an Administrative tool to help you determine *how* users are using your site. It doesn't give any real worthwhile "statistics" in terms of file downloads or individual community/collection access counts, which are more likely to be useful to your users. - Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 ------------------------------ Message: 2 Date: Tue, 26 Aug 2008 15:47:20 -0400 From: "Mark H. 
Wood" Subject: Re: [Dspace-general] Statistics To: dspace-general at mit.edu Message-ID: <20080826194720.GA20164 at IUPUI.Edu> Content-Type: text/plain; charset="us-ascii" On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: > So, although I think it was already mentioned, I'd add as a requirement > for a good Statistics Package: > > * Must filter out web-crawlers in a semi-automated fashion! +1! Suggestions as to how? The Rochester mod.s could be augmented to filter out the easiest cases more simply. Some well-behaved crawlers can be spotted automatically. (No, I don't recall how.) The filter rules could be made more flexible than just a single type of fixed-size netblocks (if memory serves). I've been meaning to work on these at some point, but haven't yet reached That Point. Crawler filtering sounds like something that might be abstracted from the various existing stat. patches and provided as a common service. We all should invent this wheel only once. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080826/7dddd0c 1/attachment-0001.bin ------------------------------ Message: 3 Date: Tue, 26 Aug 2008 15:09:16 -0500 From: "Dorothea Salo" Subject: Re: [Dspace-general] Statistics To: dspace-general at mit.edu Message-ID: <356cf3980808261309j1a9964adif49b5ecefe5b98fe at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 2008/8/26 Mark H. Wood : > On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: >> So, although I think it was already mentioned, I'd add as a requirement >> for a good Statistics Package: >> >> * Must filter out web-crawlers in a semi-automated fashion! > > +1! Suggestions as to how? 
The site maintains a list of user-agents, classified by type. They have an XML-downloadable version at , as well as an RSS-feed updater. Perhaps polling this would be a useful starting point? Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 ------------------------------ Message: 4 Date: Tue, 26 Aug 2008 15:29:23 -0500 From: Tim Donohue Subject: Re: [Dspace-general] Statistics To: Dorothea Salo Cc: dspace-general at mit.edu Message-ID: <48B467A3.7080100 at illinois.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dorothea Salo wrote: > 2008/8/26 Mark H. Wood : >> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote: >>> So, although I think it was already mentioned, I'd add as a requirement >>> for a good Statistics Package: >>> >>> * Must filter out web-crawlers in a semi-automated fashion! >> +1! Suggestions as to how? > > The site maintains a list of > user-agents, classified by type. They have an XML-downloadable version > at , as well as an RSS-feed > updater. Perhaps polling this would be a useful starting point? > > Dorothea > Universidade of Minho's Statistics Add-On for DSpace can do some basic automated filtering of web crawlers: See its list of main features on the DSpace Wiki: http://wiki.dspace.org/index.php//StatisticsAddOn (It looks like they determine spiders by how spiders tend to identify themselves. Most "nice" spiders, like Google, will identify themselves in a common fashion, e.g. "Googlebot") Frankly, although our statistics for IDEALS are nice looking...Minho's work is much more extensive and offers a greater variety of features (from what I've seen/heard of it). 
It's just missing our "Top 10 Downloads" list :) - Tim -- Tim Donohue Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) University of Illinois at Urbana-Champaign tdonohue at illinois.edu | (217) 333-4648 ------------------------------ Message: 5 Date: Tue, 26 Aug 2008 16:34:33 -0400 From: "Mark H. Wood" Subject: Re: [Dspace-general] Week 2: Statistics To: dspace-general at mit.edu Message-ID: <20080826203433.GB20164 at IUPUI.Edu> Content-Type: text/plain; charset="us-ascii" On Tue, Aug 26, 2008 at 09:44:45AM -0500, Dorothea Salo wrote: > 2008/8/25 Mark H. Wood : > > What might be helpful is to provide some views or stored procedures > > that stat. tools could use to classify observations. Such tools > > usually have good facilities for poking around in databases, but could > > perhaps use help in getting the information they need without having to > > understand (and track changes to!) the fulness of DSpace's schema. > > Interesting. Where would this leave the average repository manager who > isn't using Stata, but just wants some numbers to show people? Well, it depends on which numbers are wanted. I do think there will be some reports that are popular enough, and easy enough to get right, that they should be built in. The support for external tools would be aimed at people who do want to use them. What sort of data would be useful to the manager who isn't into heavy statistical analysis, which aren't likely to be provided as built-ins? Where I'm going is: o The realm of reasonable possibilities for statistical analysis and presentation of DSpace activity is rather huge; o people who understand statistical processing have already figured out the hard parts of analysis and presentation; o the tail should not be allowed to wag the dog -- we want statistics, but that's subordinate to building excellend document repository software. Part of, important, but in a supporting role. 
So I am hoping that we can mostly satisfy most people with relatively modest built-in statistical support, and take care of the other cases with modest support for the development of external reporting mechanisms. This being a community, I imagine that some will develop external solutions that they can share. This is one reason why I think that it should be as easy as possible for multiple stat. projects to tap into built-in streams of observations. Different sites have different needs, and I think we need to be able to easily play with various ways of doing stat.s. I'm not convinced that we are going to understand the need sufficiently without getting into the field a selection of solutions that can be easily snapped in and tried by a sizable number of sites. There are a number of good attempts now, but it's not easy to install them and that limits the amount of experience we can gather. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080826/d187370 6/attachment-0001.bin ------------------------------ Message: 6 Date: Tue, 26 Aug 2008 18:13:14 -0500 From: "Dorothea Salo" Subject: Re: [Dspace-general] Week 2: Statistics To: dspace-general at mit.edu Message-ID: <356cf3980808261613n27ea9a5x917b98b833df37dc at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 2008/8/26 Mark H. Wood : > Well, it depends on which numbers are wanted. I do think there will > be some reports that are popular enough, and easy enough to get right, > that they should be built in. The support for external tools would be > aimed at people who do want to use them. 
What sort of data would be > useful to the manager who isn't into heavy statistical analysis, which > aren't likely to be provided as built-ins? Well, I hope that's where the discussion this week has been pointing. If not, we'll have to find a different way to gather that information. Looking at existing implementations of statistics (e.g. EPrints, SSRN) might be a start. > o the tail should not be allowed to wag the dog -- we want > statistics, but that's subordinate to building excellent document > repository software. Part of, important, but in a supporting role. This is such an interesting statement that I think I will make it next week's topic! What *is* excellent document repository software? I have a feeling that the non-developer community may have a rather different take on it from most developers... we'll see if I'm right. > So I am hoping that we can mostly satisfy most people with relatively > modest built-in statistical support, and take care of the other cases > with modest support for the development of external reporting > mechanisms. I'd be interested to know how the proposals that have been put forward this week place on a modesty scale. Developers? > This is one reason why I think that it should be as easy as possible > for multiple stat. projects to tap into built-in streams of > observations. Different sites have different needs, and I think we > need to be able to easily play with various ways of doing stat.s. Agreed, but just to toss this out: I foresee a countervailing pressure in future toward standardized and aggregated statistics across repositories. I have heard a number of statements to the effect that faculty are using download counts from disciplinary repositories in tenure-and-promotion packages. As their work becomes scattered and/or duplicated across various repositories, they're going to want to aggregate that information. 
> There are a > number of good attempts now, but it's not easy to install them and > that limits the amount of experience we can gather. +1. This is a problem for more than just statistics! Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 ------------------------------ Message: 7 Date: Wed, 27 Aug 2008 10:37:12 +0200 From: Christophe Dupriez Subject: Re: [Dspace-general] Week 2: Statistics To: Dorothea Salo Cc: dspace Message-ID: <48B51238.4010008 at destin.be> Content-Type: text/plain; charset="iso-8859-1" Hi Dorothea and participants to this discussion! I would like to say that statistics are there for different purposes: 1) detect errors (why nobody looked at my site last sunday?) 2) provide KPI (Key Performance Indicators), measures that a manager follows on the medium term to take organisational decisions 3) investigate new hypothesis before investing to change the organisation. For purpose (3), by essence, you need to "open" to analysis the detailed logs of the events and the data stored in DSpace. Generic programs like SAS or reports generators are the best to dig in data and answer to new, unforeseen questions. Everybody in the community will be happy to have this "back door" available. For purpose (2), we need to know what KPIs are needed by IR managers. I will go further, new IRs and their managers would be very happy not to reinvent KPIs and to have good ones already proposed to sustain a documented IR development process. A very big part of DSpace attractiveness is (and should be implemented really!) that it provides "best practices" for IR management (and not only computing). For purpose (2), Use cases, practices, measures must be designed upfront. It will contribute strongly to the overall specifications of DSpace. 
For purpose (1), a more formal, bottom-up, data-driven approach may be sufficient to install validation tools (like the checksum checker) to ensure that DSpace operations are "in line". So we have no choice: we have to listen to IR managers (please come by!) to know the good practices DSpace must support... Have a nice day! Christophe (peeking on the list when I should not during my holidays!) Dorothea Salo wrote: > Greetings, DSpace community, > > I want to thank everyone once again for last week's stimulating > discussion and impressive chat turnout! I have a new question for > everyone this week, pursuant to some discussion on the lists: > > "Statistics" are one of the commonest requests for a new DSpace > feature. Without further specification, however, it's hard to know > what data to present, since there are no standards or even clear best > practices in this area. What statistics do the following groups of > DSpace users need to see, and in what form are the statistics best > presented to them? > > Depositors > End-users (defined as "people examining items and downloading > bitstreams from a DSpace instance;" we may have to refine this further > in discussion) > DSpace repository managers (as distinct from systems administrators) > > What else should developers keep in mind as they implement this feature? > > Because it would be nice to reach a working consensus on this (unlike > last week's question, which was intended to pull out as broad a > selection of needs as possible), I think we should start discussing > immediately. I encourage all respondents to respond TO THE MAILING > LIST instead of to me. > > I will be holding another chat to discuss the weekly question. It will > take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on > irc.freenode.net.
I apologize to West Coast (USA) community members > for last week's unconscionably early hour; we'll try 10 am US Central > (11 am Eastern, 4 pm GMT) this week, and we may go even later next > week if our European community members can stand it. > > For those who don't normally use IRC, there are two easy web gateways. > One is mibbit.com; the other is specific to our channel and can be > found at . I encourage > all of us to become familiar with the channel; it is a source of > real-time technical information from DSpace developers, as well as a > community in its own right. > > Dorothea > > -------------- next part -------------- A non-text attachment was scrubbed... Name: christophe_dupriez.vcf Type: text/x-vcard Size: 454 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080827/3784f643/christophe_dupriez.vcf ------------------------------ _______________________________________________ Dspace-general mailing list Dspace-general at mit.edu http://mailman.mit.edu/mailman/listinfo/dspace-general End of Dspace-general Digest, Vol 61, Issue 22 ********************************************** From paul.needham11 at btinternet.com Wed Aug 27 07:38:04 2008 From: paul.needham11 at btinternet.com (Paul Needham) Date: Wed, 27 Aug 2008 12:38:04 +0100 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> Message-ID: <010501c90839$5e9bd9c0$4101a8c0@EISDESKTOP> Hi Dorothea From my perspective, this week's topic is timely as I've just started work on the JISC-funded PIRUS (Publisher and Institutional Repository Usage Statistics) Project, which runs until the end of this year.
The aim of the project is to develop COUNTER-compliant usage reports at the individual article level that can be implemented by any entity (publisher, aggregator, IR, etc.,) that hosts online journal articles and will enable the usage of research outputs to be recorded, reported and consolidated at a global level in a standard way. We have identified the relevant statistics stakeholders as: * STM publishing community * IR managers * Individual researchers * Research library directors * HE/FE research funding agencies * Board of COUNTER We are only in the early stages of our research at the moment, but, by the end of the year, hope to be in a position to propose a format for COUNTER-compliant usage reports, together with supporting protocols, and submit this to COUNTER for approval as a new standard, to be adopted and maintained by COUNTER. Of course, this represents only one part of the wider IR statistics landscape but may be something useful to throw into the mix! Wearing another hat, as someone helping to run Cranfield University's IR (Cranfield CERES), I would echo other comments that have been made on the need for stats on a per-author and per-school/department basis, as well as various 'Top Ten' lists. Regards Paul ____________________________ Paul A S Needham Research & Innovation Specialist Kings Norton Library Cranfield University Cranfield MK43 0AL -----Original Message----- From: dspace-general-bounces at mit.edu [mailto:dspace-general-bounces at mit.edu] On Behalf Of Dorothea Salo Sent: 25 August 2008 14:09 To: dspace; DSpace Tech-List Subject: [Dspace-general] Week 2: Statistics Greetings, DSpace community, I want to thank everyone once again for last week's stimulating discussion and impressive chat turnout! I have a new question for everyone this week, pursuant to some discussion on the lists: "Statistics" are one of the commonest requests for a new DSpace feature. 
Without further specification, however, it's hard to know what data to present, since there are no standards or even clear best practices in this area. What statistics do the following groups of DSpace users need to see, and in what form are the statistics best presented to them? Depositors End-users (defined as "people examining items and downloading bitstreams from a DSpace instance;" we may have to refine this further in discussion) DSpace repository managers (as distinct from systems administrators) What else should developers keep in mind as they implement this feature? Because it would be nice to reach a working consensus on this (unlike last week's question, which was intended to pull out as broad a selection of needs as possible), I think we should start discussing immediately. I encourage all respondents to respond TO THE MAILING LIST instead of to me. I will be holding another chat to discuss the weekly question. It will take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on irc.freenode.net. I apologize to West Coast (USA) community members for last week's unconscionably early hour; we'll try 10 am US Central (11 am Eastern, 4 pm GMT) this week, and we may go even later next week if our European community members can stand it. For those who don't normally use IRC, there are two easy web gateways. One is mibbit.com; the other is specific to our channel and can be found at . I encourage all of us to become familiar with the channel; it is a source of real-time technical information from DSpace developers, as well as a community in its own right. 
Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 _______________________________________________ Dspace-general mailing list Dspace-general at mit.edu http://mailman.mit.edu/mailman/listinfo/dspace-general From dsalo at library.wisc.edu Wed Aug 27 09:28:47 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Wed, 27 Aug 2008 08:28:47 -0500 Subject: [Dspace-general] Chat in 90 minutes Message-ID: <356cf3980808270628m11c280a6ybe04ad61e1d6915a@mail.gmail.com> Good day everyone, We'll be holding our second DSpace development chat in the #dspace IRC channel on irc.freenode.net approximately 90 minutes from now (10 am Central, 11 Eastern, 4 pm GMT). I will be turning up about half an hour beforehand, after a morning meeting. The topic of the day is statistics! The goal is to reach rough consensus on a baseline set of end-user-facing statistics we believe DSpace should offer. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From mwood at IUPUI.Edu Wed Aug 27 09:46:54 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Wed, 27 Aug 2008 09:46:54 -0400 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> Message-ID: <20080827134654.GA24195@IUPUI.Edu> On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote: > 2008/8/26 Mark H. Wood : [snip] > This is such an interesting statement that I think I will make it next > week's topic! 
What *is* excellent document repository software? I have > a feeling that the non-developer community may have a rather different > take on it from most developers... we'll see if I'm right. I think you are, and I look forward to that discussion! > > This is one reason why I think that it should be as easy as possible > > for multiple stat. projects to tap into built-in streams of > > observations. Different sites have different needs, and I think we > > need to be able to easily play with various ways of doing stat.s. > > Agreed, but just to toss this out: I foresee a countervailing pressure > in future toward standardized and aggregated statistics across > repositories. I have heard a number of statements to the effect that > faculty are using download counts from disciplinary repositories in > tenure-and-promotion packages. As their work becomes scattered and/or > duplicated across various repositories, they're going to want to > aggregate that information. Quite so. I just don't feel that we've yet got to the point at which we understand how to do that well. A lot of good solutions come about in this way: an abstract and somewhat indistinct common need is recognized; a number of people all go off in different directions and try things; solutions are compared, borrow from each other, coalesce; finally a now well-understood need finds itself fulfilled with one or two mature implementations. I feel that we're still deep in the "try things" phase. The degree to which statistics are desired and used suggests that, in addition to traditional reports, we should be thinking in terms of exposing statistical products in machine-readable form. I have been thinking for some time that we might, with reasonable effort, help to work out a lingua franca for exchanging usage statistics among repositories of various "brands" so that the utility of various ideas, and the behavior of repository users, might be studied more effectively. 
But again, what we can all agree on will very likely be a small subset of what we can individually envision. This really ought to be considered early-on, because if we can come up with a common theme in the abstract, then machine- and human-readable reporting become side-by-side layers on top of the pool of statistical data products, and both will be easier to think about if they are merely formatting something already produced. Likewise the production of those stat.s will be easier to think about if presentation issues can be separated from the task. I do *not* mean to say here that the statistics that people want now should have to wait indefinitely on some Grand Scheme to do it all. It would be better to organize the development in successive approximations if it looks like taking too long to do it all in one push. It's probably going to take several years to fully realize satisfactory monitoring and reporting of DSpace usage, but that doesn't mean that we can't provide better and better approximations much sooner. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080827/97f4755c/attachment.bin From sdl at aber.ac.uk Wed Aug 27 10:39:48 2008 From: sdl at aber.ac.uk (Stuart Lewis) Date: Wed, 27 Aug 2008 15:39:48 +0100 Subject: [Dspace-general] The RSP launch 'The DSpace Course' - a suite of DSpace training modules In-Reply-To: Message-ID: [Apologies for cross-posting...] Today the JISC-funded Repositories Support Project (http://rsp.ac.uk/) have formally launched a modular training course for DSpace - "The DSpace Course". The course materials have been published with a Creative Commons licence in order to facilitate their re-use. 
The course is suitable for DSpace administrators and developers, with the choice of modules being dependent on the people taking the course. The course tutor can mix-and-match the modules to create a custom course. Each module comes with a set of PowerPoint slides, and an associated student workbook. The course has been successfully taught in the UK and Italy. There are 20 modules in the course, with more modules due to be added soon. The modules include: - An Introduction to DSpace - How to Get Help - Repository Structure - Identifiers - DSpace Configuration - User management and authentication options - Metadata Input Customisation - Look and Feel Customisation - Language Customisation - Item Submission Workflows - Import and Export - Configuring LDAP - Upgrading from 1.4. to 1.5 In addition to the course materials the RSP has released a DSpace 'Live CD'. The CD allows any PC to be used as a training machine with a copy of DSpace pre-installed, along with all of the files required to perform a new installation. The CD is inserted into a computer upon boot, and will load a live version of the DSpace software without installation to the hard drive. Upon completion of the training course, remove the CD and the normal operating system will be loaded upon restart of the PC. The course materials can be downloaded from: - http://hdl.handle.net/2160/615 The Live CD can be downloaded from: - http://hdl.handle.net/2160/563 The course has been written by Stuart Lewis (DSpace committer, developer and trainer) and Chris Yates (DSpace developer, support provider and trainer), and has benefited from input by Claudia Jürgen (DSpace committer, developer and trainer). For help and support, please direct all enquiries related to the course to support at rsp.ac.uk. In addition, the support team may be able to put you in touch with suitable trainers who could teach the course in your area.
From randy_stern at harvard.edu Wed Aug 27 13:57:58 2008 From: randy_stern at harvard.edu (Randy Stern) Date: Wed, 27 Aug 2008 13:57:58 -0400 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <20080827134654.GA24195@IUPUI.Edu> References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> Message-ID: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> One useful distinction is to separate to some degree the statistics that we may want to calculate from the events/raw data that needs to be recorded by the DSpace system as it operates. As long as the events are recorded in the database (preferably *not* logged in files), various computations, aggregations, reports, and APIs for exposing that data can be generated later. So we may want to focus initially on what data to record and plan for a statistics data model, database tables, and recording to be built into DSpace 2.0. At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote: >On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote: > > 2008/8/26 Mark H. Wood : >[snip] > > This is such an interesting statement that I think I will make it next > > week's topic! What *is* excellent document repository software? I have > > a feeling that the non-developer community may have a rather different > > take on it from most developers... we'll see if I'm right. > >I think you are, and I look forward to that discussion! > > > > This is one reason why I think that it should be as easy as possible > > > for multiple stat. projects to tap into built-in streams of > > > observations. 
Different sites have different needs, and I think we > > > need to be able to easily play with various ways of doing stat.s. > > > > Agreed, but just to toss this out: I foresee a countervailing pressure > > in future toward standardized and aggregated statistics across > > repositories. I have heard a number of statements to the effect that > > faculty are using download counts from disciplinary repositories in > > tenure-and-promotion packages. As their work becomes scattered and/or > > duplicated across various repositories, they're going to want to > > aggregate that information. > >Quite so. I just don't feel that we've yet got to the point at which >we understand how to do that well. A lot of good solutions come about >in this way: an abstract and somewhat indistinct common need is >recognized; a number of people all go off in different directions and >try things; solutions are compared, borrow from each other, coalesce; >finally a now well-understood need finds itself fulfilled with one or >two mature implementations. I feel that we're still deep in the "try >things" phase. > >The degree to which statistics are desired and used suggests that, in >addition to traditional reports, we should be thinking in terms of >exposing statistical products in machine-readable form. I have been >thinking for some time that we might, with reasonable effort, help to >work out a lingua franca for exchanging usage statistics among >repositories of various "brands" so that the utility of various ideas, >and the behavior of repository users, might be studied more >effectively. But again, what we can all agree on will very likely be >a small subset of what we can individually envision. 
> >This really ought to be considered early-on, because if we can come up >with a common theme in the abstract, then machine- and human-readable >reporting become side-by-side layers on top of the pool of statistical >data products, and both will be easier to think about if they are >merely formatting something already produced. Likewise the production >of those stat.s will be easier to think about if presentation issues >can be separated from the task. > >I do *not* mean to say here that the statistics that people want now >should have to wait indefinitely on some Grand Scheme to do it all. >It would be better to organize the development in successive >approximations if it looks like taking too long to do it all in one >push. It's probably going to take several years to fully realize >satisfactory monitoring and reporting of DSpace usage, but that >doesn't mean that we can't provide better and better approximations >much sooner. > >-- >Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu >Typically when a software vendor says that a product is "intuitive" he >means the exact opposite. > > >_______________________________________________ >Dspace-general mailing list >Dspace-general at mit.edu >http://mailman.mit.edu/mailman/listinfo/dspace-general Randy Stern Manager of Systems Development Harvard University Library Office for Information Systems 90 Mount Auburn Street Cambridge, MA 02138 Tel. 
+1 (617) 495-3724 Email From peter.kennedy at canterbury.ac.nz Wed Aug 27 16:55:39 2008 From: peter.kennedy at canterbury.ac.nz (Peter Kennedy) Date: Thu, 28 Aug 2008 08:55:39 +1200 Subject: [Dspace-general] Statistics In-Reply-To: <48B51238.4010008@destin.be> References: <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <48B51238.4010008@destin.be> Message-ID: <297620FB8039FE4B9BD97F690F4F306002CB2868@ucexchange4.canterbury.ac.nz> > I would like to say that statistics are there for different purposes: > 1) detect errors (why nobody looked at my site last sunday?) > 2) provide KPI (Key Performance Indicators), measures that a manager > follows on the medium term to take organisational decisions > 3) investigate new hypothesis before investing to change the > organisation.
And we can add to that the use of statistics as a marketing tool - in particular to show academic staff how much use is being made of their contributions and, perhaps, also to encourage others to contribute. Regards, Peter Kennedy From scott.yeadon at anu.edu.au Wed Aug 27 18:40:11 2008 From: scott.yeadon at anu.edu.au (Scott Yeadon) Date: Thu, 28 Aug 2008 08:40:11 +1000 Subject: [Dspace-general] Statistics In-Reply-To: References: Message-ID: <0K6A00LNO6YZKG90@messaging1.anu.edu.au> Hi, While jumping ahead a bit and not completely relevant to the context of this discussion, it's important in any solution to separate out event capture and statistics. Web server level statistics will only get you so far. Having recently been through an exercise in building a prototype statistics aggregator, we found that the fundamental requirement for producing "good" statistics (i.e. the reported information) is the *targeted capture* of events (i.e. the raw event data), typically by the application (i.e. in the DSpace code). We found the majority of reports which people want (or rather the accuracy and granularity thereof) can only be provided where the application has captured the event information rather than the more general-level web container app. If you couple the DSpace 1.5.x event producer/consumer feature with something like the De Minho front-end or a Manakin stats aspect, that would make a pretty neat default stats package. Scott. dspace-general-request at mit.edu wrote: > Message: 1 > Date: Tue, 26 Aug 2008 11:09:15 -0500 > From: Tim Donohue > Subject: Re: [Dspace-general] Week 2: Statistics > To: Dorothea Salo > Cc: dspace-general at mit.edu > Message-ID: <48B42AAB.6010804 at illinois.edu> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Dorothea & all, > > Dorothea Salo wrote: > >> 2008/8/25 Mark H.
Wood : >> >>> One thing to keep in mind about whole-site statistical tables is that >>> there are already tools to do this for web sites in general, such as >>> AWStats or Webalizer or whatever your favorite may be. We probably >>> should not spend effort to try to duplicate those. >>> >> Perhaps not, but if this is the direction we want people to go in, we >> probably ought to document how to do it, at least informally on the >> wiki. Does anybody have such a system in place? >> > > For IDEALS (www.ideals.uiuc.edu), we use AWStats to get site-wide > traffic information. However, that information is *not* publicly > accessible. We only use it for administrative purposes, since most of > the information AWStats generates for us is generally *not* useful to > our users. > > So, for example, AWStats can provide us with the following general > information: > * Which features of DSpace are being used most frequently (e.g. > Subject Browse, Community/Collection browse, search, etc.) > * Which web browsers our users are using > * # of overall hits in a given month,week,day,hour > * Approximate amount of time users spend on our site > * What external resources people use to get to our site (e.g. Google, > Blog posts, Library website, etc.) > * The top searches used to get to your site (in Google, Yahoo, MSN, etc) > > But, AWStats only works at a global level. So, it *cannot* give us any > real information at a community, collection or item level, since it > doesn't understand DSpace's internal structure and cannot parse DSpace's > log files (it parses the *web server* log files, rather than DSpace's > internal logs) > > So, in the end, AWStats is a worthwhile tool to keep in mind. However, > without some major customizations specific to DSpace, it's really more > of an Administrative tool to help you determine *how* users are using > your site. 
It doesn't give any real worthwhile "statistics" in terms of > file downloads or individual community/collection access counts, which > are more likely to be useful to your users. > > - Tim > > From mdiggory at MIT.EDU Wed Aug 27 20:08:01 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Wed, 27 Aug 2008 17:08:01 -0700 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> Message-ID: There is a degree of cleanup you would want to do to the incoming data prior to storing it. 1.) host name and geoip resolution on the IP 2.) elimination of bots and automated download tools 3.) elimination of duplicate requests (double-clicks) Likewise, it would be very important to obfuscate/clean the IP data that gets stored to eliminate privacy concerns when governments come knocking at your door. I would recommend a different db instance to store such data that can have a connection pool configured to optimize writing over reading. I would recommend evaluating Reporting Engines and Frameworks to arrive at an optimal database configuration independent of DSpace. We've worked hard on an internal Statistics/Reporting solution for DSpace at MIT that uses the DSpace DB, another storage database, and processing across Apache logs. Eventually I'd like to see us move to a usage-event-driven update process rather than our Apache log trolling. It currently doesn't sport a UI and generates spreadsheet reports.
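The cleanup steps listed above (dropping bots, dropping double-clicks, obfuscating the stored IP) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MIT solution and not DSpace code: the `UsageEvent` record, the user-agent markers, the 30-second duplicate window and the salted-hash scheme are all hypothetical, and host name/geoip resolution is left out because it depends on external services. A salted hash is only one possible way to avoid storing raw addresses, and it is weak if the salt leaks.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical user-agent substrings; a real site would use a maintained bot list.
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl")

# Hypothetical window: a repeat request inside it is treated as a double-click.
DUPLICATE_WINDOW = timedelta(seconds=30)


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical raw usage event (not a DSpace class)."""
    ip: str
    user_agent: str
    item_handle: str
    timestamp: datetime


def is_bot(event):
    """Return True if the user agent contains one of the assumed bot markers."""
    agent = event.user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def obfuscate_ip(ip, salt):
    """Return a truncated salted SHA-256 digest so the raw IP is never stored."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:16]


def clean_events(events, salt):
    """Yield (visitor_id, item_handle, timestamp) for events worth storing."""
    last_seen = {}  # (visitor_id, item_handle) -> time of most recent request
    for event in sorted(events, key=lambda e: e.timestamp):
        if is_bot(event):
            continue
        visitor = obfuscate_ip(event.ip, salt)
        key = (visitor, event.item_handle)
        previous = last_seen.get(key)
        # The window slides: every request, kept or not, resets the timer.
        last_seen[key] = event.timestamp
        if previous is not None and event.timestamp - previous <= DUPLICATE_WINDOW:
            continue
        yield visitor, event.item_handle, event.timestamp
```

Because the filter only yields cleaned tuples, the same function could feed a separate statistics database or a spreadsheet export without either one ever seeing a raw IP address.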
I think it's important to separate the DSpace database, the statistics database, the reporting tools and the User interface needs into separate but related projects so that they may evolve and be supported as addons to DSpace. Ultimately this means those of us working on the DSpace Core need to make sure hooks like "Usage Events" get into place and are available for Addons to attach listeners to. What does this mean to the group: 1.) Endorsing and Shepherding those changes to the core code-base into a near future release. 2.) Evaluation of the need for a common notification framework for both Usage and Modification events. 3.) Establishing a Roadmap that is inclusive, allowing new projects and team members to participate within the development/release process. --- As well, I find the following concerns with the Minho statistics addon. A.) Usage of procedural PostgreSQL code excludes Oracle users, restricts portability, and introduces a layer of complexity that requires maintainers to be able to debug within a layer that is not traditionally customized by DSpace. I feel that this needs to be in the Java implementation rather than in a storage-specific language and execution environment. This is a major factor in our not using the Minho solution at DSpace at MIT. B.) Overlays may be used to deploy on top of JSPUI/XMLUI, but we should work for better pluggability of this functionality. Specifically, we've seen that the JSPUI's usage of JSP Tag libraries isn't ideal or well designed in DSpace. The usage of Tag libraries should ideally be replaced with Beans/Collections and JSTL iterator tags. The JSPUI should be looking at templates and portlets for solutions to allow pluggability rather than direct customization of JSPs, Tag libraries and Servlets by the community. Tangent: This is why the XMLUI was created, to get away from this bad design. C.)
I commend the usage of a separate SQL namespace, but suggest further that it might be better to be a completely separate DB allowing optimized write connections independent of the dspace db, whose connections are better optimized for reading and transactional security. D.) The Usage of the JDBC Log4j appender, while creative, introduces another layer of complexity that isn't explicit. A pluggable UsageEvent API may better manage the generation of events in the UI to be directed to the Statistics addon. This may be of lesser concern because it could be adapted to work as a UsageEvent consumer, rather than consuming Logging events destined for dspace.log directly. These comments are meant to be constructive. Speaking for the community, I think we don't want to see this work fall by the wayside, and we want to work to eliminate barriers to its uptake into the community. I strongly encourage those working on projects within the community (such as the Minho statistics addon) to take advantage of the tools and services we are maintaining to enable your work in an open environment where you can seek support and advice directly from the community of DSpace developers. We are working on a new Contribution WIKI page section to outline these Services and the policies and procedures around working with them. http://wiki.dspace.org/index.php/DSpaceResources#DSpace_Community_Sandbox -Mark On Aug 27, 2008, at 10:57 AM, Randy Stern wrote: > One useful distinction is to separate to some degree the statistics > that we > may want to calculate from the events/raw data that needs to be > recorded by > the DSpace system as it operates. As long as the events are > recorded in the > database (preferably *not* logged in files), various computations, > aggregations, reports, and APIs for exposing that data can be > generated > later. So we may want to focus initially on what data to record and > plan > for a statistics data model, database tables, and recording to be > built > into DSpace 2.0.
> > At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote: >> On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote: >>> 2008/8/26 Mark H. Wood : >> [snip] >>> This is such an interesting statement that I think I will make it >>> next >>> week's topic! What *is* excellent document repository software? I >>> have >>> a feeling that the non-developer community may have a rather >>> different >>> take on it from most developers... we'll see if I'm right. >> >> I think you are, and I look forward to that discussion! >> >>>> This is one reason why I think that it should be as easy as >>>> possible >>>> for multiple stat. projects to tap into built-in streams of >>>> observations. Different sites have different needs, and I think we >>>> need to be able to easily play with various ways of doing stat.s. >>> >>> Agreed, but just to toss this out: I foresee a countervailing >>> pressure >>> in future toward standardized and aggregated statistics across >>> repositories. I have heard a number of statements to the effect that >>> faculty are using download counts from disciplinary repositories in >>> tenure-and-promotion packages. As their work becomes scattered >>> and/or >>> duplicated across various repositories, they're going to want to >>> aggregate that information. >> >> Quite so. I just don't feel that we've yet got to the point at which >> we understand how to do that well. A lot of good solutions come >> about >> in this way: an abstract and somewhat indistinct common need is >> recognized; a number of people all go off in different directions and >> try things; solutions are compared, borrow from each other, coalesce; >> finally a now well-understood need finds itself fulfilled with one or >> two mature implementations. I feel that we're still deep in the "try >> things" phase. 
>> >> The degree to which statistics are desired and used suggests that, in >> addition to traditional reports, we should be thinking in terms of >> exposing statistical products in machine-readable form. I have been >> thinking for some time that we might, with reasonable effort, help to >> work out a lingua franca for exchanging usage statistics among >> repositories of various "brands" so that the utility of various >> ideas, >> and the behavior of repository users, might be studied more >> effectively. But again, what we can all agree on will very likely be >> a small subset of what we can individually envision. >> >> This really ought to be considered early-on, because if we can >> come up >> with a common theme in the abstract, then machine- and human-readable >> reporting become side-by-side layers on top of the pool of >> statistical >> data products, and both will be easier to think about if they are >> merely formatting something already produced. Likewise the >> production >> of those stat.s will be easier to think about if presentation issues >> can be separated from the task. >> >> I do *not* mean to say here that the statistics that people want now >> should have to wait indefinitely on some Grand Scheme to do it all. >> It would be better to organize the development in successive >> approximations if it looks like taking too long to do it all in one >> push. It's probably going to take several years to fully realize >> satisfactory monitoring and reporting of DSpace usage, but that >> doesn't mean that we can't provide better and better approximations >> much sooner. >> >> -- >> Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu >> Typically when a software vendor says that a product is >> "intuitive" he >> means the exact opposite. 
>> >> >> _______________________________________________ >> Dspace-general mailing list >> Dspace-general at mit.edu >> http://mailman.mit.edu/mailman/listinfo/dspace-general > > > Randy Stern > Manager of Systems Development > Harvard University Library Office for Information Systems > 90 Mount Auburn Street > Cambridge, MA 02138 > Tel. +1 (617) 495-3724 > Email > > > _______________________________________________ > Dspace-general mailing list > Dspace-general at mit.edu > http://mailman.mit.edu/mailman/listinfo/dspace-general From hussein at cs.uct.ac.za Thu Aug 28 04:39:21 2008 From: hussein at cs.uct.ac.za (Hussein Suleman) Date: Thu, 28 Aug 2008 10:39:21 +0200 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> Message-ID: <48B66439.2010008@cs.uct.ac.za> without getting into whether event streams should be logged to file or database, this is probably in general the way to go. though i would recommend that this is done on a broader scale so analysis tools are interoperable among the major repository software systems. 
(there was some research on an XML log file format a while back but it did not go far) ttfn, ----hussein ===================================================================== hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com ===================================================================== Randy Stern wrote: > One useful distinction is to separate to some degree the statistics that we > may want to calculate from the events/raw data that needs to be recorded by > the DSpace system as it operates. As long as the events are recorded in the > database (preferably *not* logged in files), various computations, > aggregations, reports, and APIs for exposing that data can be generated > later. So we may want to focus initially on what data to record and plan > for a statistics data model, database tables, and recording to be built > into DSpace 2.0. > > At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote: >> On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote: >>> 2008/8/26 Mark H. Wood : >> [snip] >>> This is such an interesting statement that I think I will make it next >>> week's topic! What *is* excellent document repository software? I have >>> a feeling that the non-developer community may have a rather different >>> take on it from most developers... we'll see if I'm right. >> I think you are, and I look forward to that discussion! >> >>>> This is one reason why I think that it should be as easy as possible >>>> for multiple stat. projects to tap into built-in streams of >>>> observations. Different sites have different needs, and I think we >>>> need to be able to easily play with various ways of doing stat.s. >>> Agreed, but just to toss this out: I foresee a countervailing pressure >>> in future toward standardized and aggregated statistics across >>> repositories. I have heard a number of statements to the effect that >>> faculty are using download counts from disciplinary repositories in >>> tenure-and-promotion packages. 
As their work becomes scattered and/or >>> duplicated across various repositories, they're going to want to >>> aggregate that information. >> Quite so. I just don't feel that we've yet got to the point at which >> we understand how to do that well. A lot of good solutions come about >> in this way: an abstract and somewhat indistinct common need is >> recognized; a number of people all go off in different directions and >> try things; solutions are compared, borrow from each other, coalesce; >> finally a now well-understood need finds itself fulfilled with one or >> two mature implementations. I feel that we're still deep in the "try >> things" phase. >> >> The degree to which statistics are desired and used suggests that, in >> addition to traditional reports, we should be thinking in terms of >> exposing statistical products in machine-readable form. I have been >> thinking for some time that we might, with reasonable effort, help to >> work out a lingua franca for exchanging usage statistics among >> repositories of various "brands" so that the utility of various ideas, >> and the behavior of repository users, might be studied more >> effectively. But again, what we can all agree on will very likely be >> a small subset of what we can individually envision. >> >> This really ought to be considered early-on, because if we can come up >> with a common theme in the abstract, then machine- and human-readable >> reporting become side-by-side layers on top of the pool of statistical >> data products, and both will be easier to think about if they are >> merely formatting something already produced. Likewise the production >> of those stat.s will be easier to think about if presentation issues >> can be separated from the task. >> >> I do *not* mean to say here that the statistics that people want now >> should have to wait indefinitely on some Grand Scheme to do it all. 
>> It would be better to organize the development in successive >> approximations if it looks like taking too long to do it all in one >> push. It's probably going to take several years to fully realize >> satisfactory monitoring and reporting of DSpace usage, but that >> doesn't mean that we can't provide better and better approximations >> much sooner. >> >> -- >> Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu >> Typically when a software vendor says that a product is "intuitive" he >> means the exact opposite. >> >> >> _______________________________________________ >> Dspace-general mailing list >> Dspace-general at mit.edu >> http://mailman.mit.edu/mailman/listinfo/dspace-general > > > Randy Stern > Manager of Systems Development > Harvard University Library Office for Information Systems > 90 Mount Auburn Street > Cambridge, MA 02138 > Tel. +1 (617) 495-3724 > Email > > > _______________________________________________ > Dspace-general mailing list > Dspace-general at mit.edu > http://mailman.mit.edu/mailman/listinfo/dspace-general From mwood at IUPUI.Edu Thu Aug 28 09:04:54 2008 From: mwood at IUPUI.Edu (Mark H. Wood) Date: Thu, 28 Aug 2008 09:04:54 -0400 Subject: [Dspace-general] Statistics In-Reply-To: <0K6A00LNO6YZKG90@messaging1.anu.edu.au> References: <0K6A00LNO6YZKG90@messaging1.anu.edu.au> Message-ID: <20080828130454.GA11845@IUPUI.Edu> On Thu, Aug 28, 2008 at 08:40:11AM +1000, Scott Yeadon wrote: > While jumping ahead a bit and not completely relevant to the context of > this discussion, it's important in any solution to separate out event > capture and statistics. Web server level statistics will only get you so > far. Having recently been through an exercise in building a prototype > statistics aggregator, the fundamentals in producing "good" statistics > (i.e. the reported information) is the *targetted capture* of events > (i.e. the raw event data) typically by the application (i.e. in the > DSpace code). 
We found the majority of reports which people want (or > rather the accuracy and granularity thereof) can only be provided where > the application has captured the event information rather than the more > general-level web container app. If you couple the DSpace 1.5.x event > producer/consumer feature with something like the De Minho front-end or > a Manakin stats aspect, that would make a pretty neat default stats package. I agree. :-) A start on that: http://sourceforge.net/tracker/index.php?func=detail&aid=2025998&group_id=19984&atid=319984 The Event System seems focused on changes to the repository, and I recall that there was some resistance to expanding it to cover references that don't change the model. The above is a separate event mechanism focused on reference events. I've made considerable progress on adapting the University of Rochester statistics package to take cases from this UsageEvent stream instead of custom patching, and an XMLUI Aspect to make the resulting per-object stat.s available for theming, but it's not quite ready for daylight yet. It's my understanding that the Minho package is one of those which take cases from periodic digestion of log files. Once an event stream is available, it should be simple to create an adapter which appends event records to a file in a suitable format, without clutter and with the data you need. The above patch demonstrates this with a plugin which appends to a simple XML-like file. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080828/9edd1894/attachment.bin From mwood at IUPUI.Edu Thu Aug 28 09:16:06 2008 From: mwood at IUPUI.Edu (Mark H. 
Wood) Date: Thu, 28 Aug 2008 09:16:06 -0400 Subject: [Dspace-general] Week 2: Statistics In-Reply-To: <48B66439.2010008@cs.uct.ac.za> References: <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <356cf3980808250608t689c84d8uc7d7f69155a76ece@mail.gmail.com> <356cf3980808250707n5d45ec1vbd607ddcac148e27@mail.gmail.com> <20080825145520.GF15124@IUPUI.Edu> <356cf3980808260744n4d0d8899oad008fec4363c5c4@mail.gmail.com> <20080826203433.GB20164@IUPUI.Edu> <356cf3980808261613n27ea9a5x917b98b833df37dc@mail.gmail.com> <5.2.1.1.2.20080827134948.03e3c380@hulmail.harvard.edu> <48B66439.2010008@cs.uct.ac.za> Message-ID: <20080828131606.GB11845@IUPUI.Edu> On Thu, Aug 28, 2008 at 10:39:21AM +0200, Hussein Suleman wrote: > without getting into whether event streams should be logged to file or > database, this is probably in general the way to go. though i would > recommend that this is done on a broader scale so analysis tools are > interoperable among the major repository software systems. > > (there was some research on an XML log file format a while back but it > did not go far) One difficulty with logging to XML is that, strictly speaking, it's not possible. The document element cannot reliably be closed by the logging application. Practically, it should be simple to close the document element after log cutoff by just pasting the closing tag onto the end of the file before ingestion, but it's a minor weakness of the XML approach. I agree that flat-file representations of usage event data should be designed for general usability by a variety of tools. -- Mark H. Wood, Lead System Programmer mwood at IUPUI.Edu Typically when a software vendor says that a product is "intuitive" he means the exact opposite. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080828/46b3dec2/attachment.bin From dsalo at library.wisc.edu Thu Aug 28 13:06:06 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Thu, 28 Aug 2008 12:06:06 -0500 Subject: [Dspace-general] Chat summary: 27 August 2008 Message-ID: <356cf3980808281006m22ebf18ch53ac9ec90a09252a@mail.gmail.com> We had some new voices this time! Good to see. LINKS AND DEMOS * ePrints item report sample: * Minho report sample: * IRStats sample: * AWStats over an entire DSpace repository: (thanks to Ina Smith of the University of Pretoria, who could not be present on the chat, for emailing this link) * "Top downloads" and all-repository statistics on the home page: APPLAUSE The chat took a moment to applaud the Repository Support Project's new DSpace Course (). Contributors Stuart Lewis, Chris Yates, and Claudia Jurgen were all present on the chat. The Course is looking for new contributors, particularly with regard to Manakin/XMLUI; if you can help, please contact Stuart Lewis. AN IMPORTANT DISTINCTION Mark Wood pointed out (as have several emails to the list during this week's discussion) that two sharply differing concepts lurk behind the word "statistics": the capture of repository events as they occur, and the distillation of raw event data into useful reports. "Statistics pull patterns out of collections of individual cases," said Mark. Moreover, not all reports are statistical in nature; some (such as "what's been deposited recently" lists) just regurgitate part of the event stream. Given accessible event-stream data, many statistical analyses can be done wholly outside of DSpace, and it is unrealistic to expect DSpace to create analyses for every imaginable use-case. Some common use-cases, however, may need to become part of DSpace proper; the trouble is defining them. 
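A minimal sketch of the distinction described above, assuming a hypothetical in-memory event stream. The field names (`kind`, `handle`, `when`), the function names, and the handles are illustrative assumptions, not a DSpace schema or API. The point is only that capture stores raw events without analysing them, while reports are computed from the stored events afterwards.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event stream; nothing here is an actual DSpace data structure.
stream = []

def record_event(events, kind, handle, when):
    """Capture step: append one raw event and do no analysis."""
    events.append({"kind": kind, "handle": handle, "when": when})

def monthly_counts(events, kind):
    """Distillation step: count events of one kind per (handle, 'YYYY-MM')."""
    counts = Counter()
    for event in events:
        if event["kind"] == kind:
            counts[(event["handle"], event["when"].strftime("%Y-%m"))] += 1
    return counts

def recent_deposits(events, limit):
    """Non-statistical report: replay the newest deposit events."""
    deposits = [e for e in events if e["kind"] == "deposit"]
    return sorted(deposits, key=lambda e: e["when"], reverse=True)[:limit]

# Made-up sample data (the handles are placeholders).
record_event(stream, "deposit", "123456789/1", datetime(2008, 7, 30))
record_event(stream, "deposit", "123456789/2", datetime(2008, 8, 20))
record_event(stream, "download", "123456789/1", datetime(2008, 8, 1))
record_event(stream, "download", "123456789/1", datetime(2008, 8, 15))
```

Because both reports read the same stored events, either could just as well be computed by a tool running wholly outside the repository.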
COMMON REPORTING NEEDS

All access-related reports (accesses/downloads) should filter out as many crawlers as feasible.

* item accesses, total as well as by month and year
* bitstream downloads, as above
* accesses and downloads by author, as above; authors also want to know what their most popular items are
* incoming links from other websites (via referrers; note that referrer spam may become a problem)

Other possibilities mentioned included:

* alerts for download "spikes" over a short period of time
* on item pages, time of last download
* "popular items in this repository" (recent, total, and monthly, though it was noted that displaying this information to end-users tends to feed unjust power-law distribution of downloads)

Geolocating accesses was not perceived as vital.

PRIVACY ISSUES

Claudia Jurgen noted that the EU has very strict privacy laws that may prevent collecting or retention of information that may identify individual persons. DSpace may therefore not be able to track individuals' site behavior (to put toward "more like this" links or the like).

OTHER DESIDERATA

Technical issues: The widely-praised Minho stats engine does not yet work with XMLUI, and no one on the chat knew of plans to adapt it. Mark Diggory noted that event-capture should be separated from log4j's error capturing.

Shane Beers pointed out that DSpace does not currently offer repository managers much information about the contents of their repositories, which is a significant worry vis-a-vis bitstream preservation. A list of bitstreams by MIME type would be a start. DSpace also does not help managers investigate deposit patterns and growth.
A readily-accessible list of recent deposits as well as a list of deposits per time period (separable by community/collection, so that different communities can be usefully compared) would be useful to repository administrators, and should be relatively easy to build via dc.date.available (or for research-tracking use-cases, dc.date.published) metadata. Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From cwbailey at digital-scholarship.com Fri Aug 29 11:21:33 2008 From: cwbailey at digital-scholarship.com (Charles W. Bailey, Jr.) Date: Fri, 29 Aug 2008 10:21:33 -0500 Subject: [Dspace-general] How Many University/Health Library Institutional Repositories Are There in Texas? Message-ID: <48B813FD.3080402@digital-scholarship.com> Texas is the second largest state in the U.S., both in terms of population (about 23.9 million) and square miles (268,820 square miles). It has 74 universities and 10 health-related academic institutions. How many library institutional repositories serve these universities and health-related academic institutions? Based on major repository directories and key vendor lists, it appears that the answer is eight, with Digital Commons and DSpace being the software of choice. (Institutional repositories being those repositories that serve one or more entire institutions.) Texas also has the Texas Digital Library. Read more about it at "Institutional Repositories at Texas University and Health Science Libraries": http://tinyurl.com/6fanuq -- Best Regards, Charles Charles W. Bailey, Jr. 
Publisher, Digital Scholarship http://www.digital-scholarship.org/ A Look Back at Nineteen Years as an Internet Digital Publisher http://www.digital-scholarship.org/cwb/nineteenyears.htm From mcgeetho at shu.edu Fri Aug 29 13:01:28 2008 From: mcgeetho at shu.edu (Thomas A McGee) Date: Fri, 29 Aug 2008 13:01:28 -0400 Subject: [Dspace-general] Statistics In-Reply-To: Message-ID: I missed the chat the other day, so some of this may have been covered and dismissed already. Tomcat has the capacity to output Apache-style "combined" log files for all requests, including bitstreams. There's a whole host of commercial, shareware and freeware products out there designed to slice-and-dice these Apache log files and pull out all the kinds of reports people seem to be talking about here. The programs range from the very simple, like Analog, to the extremely complex and expensive, like WebTrends Enterprise. They can be configured to download the log files automatically and run reports on a schedule, so that they're there when you come in in the morning. They can incorporate various filters, resolve user IP addresses, analyze request URL paths (which can be translated into collection and community names), referers, logged-in users, user agents, etc. etc. Rather than reinvent the wheel (and this is an extremely complex wheel),I think for most users it would pay to look at this approach unless there is something really esoteric about your traffic that you are trying to get at. _____________________ Tom McGee Seton Hall University TLTC 973 761 9000 x5021 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080829/66b91904/attachment.htm From mdiggory at MIT.EDU Fri Aug 29 13:32:33 2008 From: mdiggory at MIT.EDU (Mark Diggory) Date: Fri, 29 Aug 2008 10:32:33 -0700 Subject: [Dspace-general] Statistics In-Reply-To: References: Message-ID: Thomas, Thanks for what is also a sensible recommendation. 
On Aug 29, 2008, at 10:01 AM, Thomas A McGee wrote:

> I missed the chat the other day, so some of this may have been
> covered and dismissed already.
>
> Tomcat has the capacity to output Apache-style "combined" log files
> for all requests, including bitstreams. There's a whole host of
> commercial, shareware and freeware products out there designed to
> slice-and-dice these Apache log files and pull out all the kinds of
> reports people seem to be talking about here.
>
> The programs range from the very simple, like Analog, to the
> extremely complex and expensive, like WebTrends Enterprise. They
> can be configured to download the log files automatically and run
> reports on a schedule, so that they're there when you come in in
> the morning. They can incorporate various filters, resolve user IP
> addresses, analyze request URL paths (which can be translated into
> collection and community names), referers, logged-in users, user
> agents, etc. etc.
>
> Rather than reinvent the wheel (and this is an extremely complex
> wheel), I think for most users it would pay to look at this approach
> unless there is something really esoteric about your traffic that
> you are trying to get at.

There is an inherent issue in the "address space" of DSpace resources made available in the web application. For instance,
I may have the following Community, Collection and Item:

Computer Science and Artificial Intelligence Lab (CSAIL)
http://dspace.mit.edu/handle/1721.1/5458

CSAIL Technical Reports (July 1, 2003 - present)
http://dspace.mit.edu/handle/1721.1/29807

Adaptive Envelope MDPs for Relational Equivalence-based Planning
http://dspace.mit.edu/handle/1721.1/41920

From the perspective of the Apache/Tomcat logs, requests are made to these resources, and based on those logs it is quite difficult to ascertain that there is a hierarchy here:

/1721.1/5458 <-- Community
/1721.1/29807 <-- Collection
/1721.1/41920 <-- Item

The challenge is that most logging packages, because the above structure is absent from the path of the resource, cannot roll up the statistics to represent the aggregations at the collection and item level that managers want to see for a DSpace Community/Collection.

Likewise, we are trying to maintain the following:

1.) We do not introduce a rigid expectation that the "paths" by which resources are represented cannot change over time as DSpace evolves.
2.) We may have more than one path by which a resource is accessed, and may want to treat those accesses either as "the same" or as "uniquely different" statistically.
3.) We want to allow hooks so that these stats can be collected from the "logical event" in DSpace rather than the "physical event" in the application server.

By configuring a stats solution like analog/awstats/webtrends, we are restricted to gathering statistics only about the physical event of requesting that address in the web service. Likewise, if the address representing that resource changes in the UI (either via development decisions or administrative decisions), then the configuration of that external software will be out of sync and need to be adjusted. By having the application report "logical events" we can step away from this issue.
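A minimal sketch of the roll-up that flat handle paths make difficult for an external log analyzer. It assumes the three handles above really are nested as item, collection, community, and that a logical event identifies the item by handle. The `PARENT` table and the `roll_up` function are hypothetical; a real implementation would ask the repository for the containment relationship rather than hard-code it.

```python
from collections import Counter

# Assumed parent links, based on the example hierarchy in this message.
PARENT = {
    "1721.1/41920": "1721.1/29807",  # item -> collection
    "1721.1/29807": "1721.1/5458",   # collection -> community
}

def roll_up(item_views):
    """Credit each item view to the item and to every container above it."""
    totals = Counter()
    for handle in item_views:
        node = handle
        while node is not None:
            totals[node] += 1
            node = PARENT.get(node)  # None once the top of the tree is reached
    return totals

# Two logical "view" events for the same item, however the URL was spelled.
totals = roll_up(["1721.1/41920", "1721.1/41920"])
```

Since the event carries the handle and not the request path, the count stays correct if the UI later changes the address under which the item is served.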
By internalizing the statistics gathering and generation, we have an opportunity to create a solution that allows DSpace to evolve freely and that meets the requirements requested by the community (or, more explicitly, exhibited by the Minho addon).

Cheers,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/dspace-general/attachments/20080829/7cf2a0ea/attachment.htm

From gsk5 at cornell.edu Fri Aug 29 15:31:20 2008
From: gsk5 at cornell.edu (George Stanley Kozak)
Date: Fri, 29 Aug 2008 15:31:20 -0400 (EDT)
Subject: [Dspace-general] Statistics
In-Reply-To:
References:
Message-ID: <3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu>

Tom:

I missed the chat, too, but I have been one of the ones asking for a more integrated statistics package. I used to use NetTracker for my DSpace instance and am now using a combination of the Edinburgh software and some locally grown software. I then have a page that lists the top 10 hits and other stats. At one time we had a counter on the item page that kept track of views, but this local code of ours broke when we went to DSpace 1.4.2. My users are asking for specific information concerning their particular item(s) or collection(s), and they'd like to see it on the item or collection page. I tried to use the Minho package, but have had problems getting it to work in my instance. So, my thinking has been that if DSpace had an integrated package (maybe something that acts like the Minho software), then I would be able to give the users what they want. So, in my case, the free and commercial packages, while giving me useful information, don't give my users what they want to see.
To fix that I would have to do some programming and my experience in changing DSpace software these past several years is that "I really don't want to do that!" ;-) So, that's my logic behind asking for an integrated stats package for DSpace (Yes, I know it's selfish!). > I missed the chat the other day, so some of this may have been covered and > dismissed already. > > Tomcat has the capacity to output Apache-style "combined" log files for > all requests, including bitstreams. There's a whole host of commercial, > shareware and freeware products out there designed to slice-and-dice these > Apache log files and pull out all the kinds of reports people seem to be > talking about here. > > The programs range from the very simple, like Analog, to the extremely > complex and expensive, like WebTrends Enterprise. They can be configured > to download the log files automatically and run reports on a schedule, so > that they're there when you come in in the morning. They can incorporate > various filters, resolve user IP addresses, analyze request URL paths > (which can be translated into collection and community names), referers, > logged-in users, user agents, etc. etc. > > Rather than reinvent the wheel (and this is an extremely complex wheel),I > think for most users it would pay to look at this approach unless there is > something really esoteric about your traffic that you are trying to get > at. 
> > _____________________ > Tom McGee > Seton Hall University TLTC > 973 761 9000 x5021_______________________________________________ > Dspace-general mailing list > Dspace-general at mit.edu > http://mailman.mit.edu/mailman/listinfo/dspace-general > **************************************** George Kozak Coordinator Web Development and Management Digital Media Group 501 Olin Library Cornell University 14853 gsk5 at cornell.edu 607-255-8924 From dsalo at library.wisc.edu Fri Aug 29 15:35:37 2008 From: dsalo at library.wisc.edu (Dorothea Salo) Date: Fri, 29 Aug 2008 14:35:37 -0500 Subject: [Dspace-general] Statistics In-Reply-To: <3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu> References: <3143.67.241.39.12.1220038280.squirrel@webmail.cornell.edu> Message-ID: <356cf3980808291235k2d4e595fv1d91ffb296227cfb@mail.gmail.com> On Fri, Aug 29, 2008 at 2:31 PM, George Stanley Kozak wrote: > Tom: > So, that's my logic behind asking for an integrated stats package for > DSpace (Yes, I know it's selfish!). What is selfish about doing your best to give the people you are serving what they are asking your service to provide? All right, aside from actually being able to keep your job, and all... ;) Dorothea -- Dorothea Salo dsalo at library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493 From sil.linguist at gmail.com Fri Aug 29 17:25:59 2008 From: sil.linguist at gmail.com (Hugh Paterson III) Date: Fri, 29 Aug 2008 16:25:59 -0500 Subject: [Dspace-general] Dspace Mysql Message-ID: <967E9449-58DB-446C-8E82-60B329DB93D8@gmail.com> I am new to d-space and was wondering if anyone has implemented dspace with a MySQL back end? I was looking over the documentation to see if there were any suggestions but there appears to be no official recommendation for MySQL. any help out there?