[OWW-Discuss] Proposal to 'Wikify' GenBank Meets Stiff Resistance

Dan Bolser dan.bolser at gmail.com
Tue Mar 25 09:53:36 EDT 2008


On 25/03/2008, Alexander Wait Zaranek <await at genetics.med.harvard.edu>
wrote:
>
> On Tue, Mar 25, 2008 at 6:54 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> > In summary, I think it is 'doable', but it isn't a case of simply mirroring
> > into a revisioning system. We need to work on some protocols for
> > synchronization.
> >
>
> Efficient merging (not just forking) is an active area of version
> control systems research.  E.g.:
>
> * http://git.or.cz/
> * http://bazaar-vcs.org/
> * http://darcs.net/
>
> If you'd enjoy seeing Linus Torvalds talk to Google about why he hates
> subversion--Google employs a few of the key subversion authors--I
> found his talk amusing, if not entirely illuminating:
> http://www.youtube.com/watch?v=4XpnKHJAok8
>
> Just as a test, I'm making a mirror of ftp://ftp.wwpdb.org/pub/pdb/
> and I'll put the results in our content addressable storage system
> where the data is striped across our cluster and available for batch
> processing.
>
>
> "c) allow users to freely edit the data, including automatic clean up
> 'bots', algorithms, etc., etc.
> d) have all changes automatically emailed to a mailing list for
> community review, approval etc."
>
>
> Can you elaborate on this?


d) I have seen software projects with 'svn-mailing-lists', which email
everyone on the list about every change committed to the SVN repository.
That way the developers see all relevant changes to the software as they
happen, and can then remove or improve (or perhaps comment on) those
changes. The automatic email prompts discussion of the given 'commit'. For a
project the size of GenBank, I think we would need to work out how to define
sub-communities, such as fungal vs. primate people.
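
To make the sub-community idea a bit more concrete, here is a minimal
Python sketch of routing change notifications to per-community lists. It is
only an illustration: the `Change` record, the taxon labels and the
example.org list addresses are all hypothetical, and it builds the digest
text without actually sending any mail.

```python
from dataclasses import dataclass

# Hypothetical mapping from a taxonomic group to its review list.
SUBCOMMUNITY_LISTS = {
    "fungi": "fungi-changes@example.org",
    "primates": "primate-changes@example.org",
}
# Hypothetical catch-all list for changes no sub-community claims.
DEFAULT_LIST = "all-changes@example.org"


@dataclass
class Change:
    entry_id: str   # identifier of the edited entry
    taxon: str      # sub-community label attached to the entry
    author: str     # person or bot that made the edit
    summary: str    # one-line description of the edit


def list_for(change):
    """Pick the review list for a change, falling back to the default."""
    return SUBCOMMUNITY_LISTS.get(change.taxon, DEFAULT_LIST)


def group_by_list(changes):
    """Group changes by destination list, preserving their order."""
    digests = {}
    for change in changes:
        digests.setdefault(list_for(change), []).append(change)
    return digests


def format_digest(changes):
    """Render one line per change for the body of a notification email."""
    return "\n".join(
        f"{c.entry_id} ({c.author}): {c.summary}" for c in changes
    )
```

Each value returned by `group_by_list` could then be passed through
`format_digest` and handed to whatever mailer the project uses.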

c) Sometimes people don't want to fix just one little problem with one
specific entry - they want to change the data on all entries. For example, I
may want to update all the UniProt codes for all the PDB entries, so I would
write a script that applies my newly collected data to the whole archive,
and run it every week. People would then need to be given the opportunity
to accept / reject my algorithm's changes 'en masse'. Here I start to get
bogged down in general 'heterogeneous data integration' issues... Perhaps
sites like freebase.com can help us here.
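
One way to picture the 'en masse' accept / reject step is to record each
scripted run as a single changeset that can be applied or reverted as a
unit. The sketch below assumes a toy in-memory archive (a dict of entry id
to field dict); `ChangeSet`, `build_changeset`, `apply_changeset` and
`revert_changeset` are made-up names for illustration, not part of any
existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    author: str
    description: str
    # entry id -> (field name, old value, new value)
    edits: dict = field(default_factory=dict)


def build_changeset(archive, author, description, field_name, new_values):
    """Collect the edits a bulk update would make, without applying them."""
    cs = ChangeSet(author, description)
    for entry_id, new in new_values.items():
        if entry_id not in archive:
            continue  # skip ids the archive does not know about
        old = archive[entry_id].get(field_name)
        if old != new:
            cs.edits[entry_id] = (field_name, old, new)
    return cs


def apply_changeset(archive, cs):
    """Apply every edit in the changeset to the archive."""
    for entry_id, (name, _old, new) in cs.edits.items():
        archive[entry_id][name] = new


def revert_changeset(archive, cs):
    """Undo the changeset; entries edited again since then are left alone."""
    for entry_id, (name, old, new) in cs.edits.items():
        if archive[entry_id].get(name) != new:
            continue  # someone changed it afterwards, do not clobber
        if old is None:
            archive[entry_id].pop(name, None)  # field did not exist before
        else:
            archive[entry_id][name] = old
```

Because the old values travel with the changeset, rejecting a whole run
does not require a snapshot of the archive, only the changeset itself.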




> Sasha
>
> PS. Speaking of the subversion authors, I also really enjoyed the
> subversion authors talk on making a "successful free software
> project":  http://www.youtube.com/watch?v=ZSFDm3UYkeE   They give a
> pretty firm warning that a successful project should have a very
> narrow focus or risk being unsuccessful in everything.
>



-- 
hello