<br><br><div><span class="gmail_quote">On 25/03/2008, <b class="gmail_sendername">Alexander Wait Zaranek</b> <<a href="mailto:await@genetics.med.harvard.edu">await@genetics.med.harvard.edu</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Tue, Mar 25, 2008 at 6:54 AM, Dan Bolser <<a href="mailto:dan.bolser@gmail.com">dan.bolser@gmail.com</a>> wrote:<br> > In summary, I think it is 'doable', but it isn't a case of simply mirroring<br>
> into a revisioning system. We need to work on some protocols for<br> > synchronization.<br> ><br> <br>Efficient merging (not just forking) is an active area of version<br> control systems research. E.g.:<br> <br>
* <a href="http://git.or.cz/">http://git.or.cz/</a><br> * <a href="http://bazaar-vcs.org/">http://bazaar-vcs.org/</a><br> * <a href="http://darcs.net/">http://darcs.net/</a><br> <br> If you'd enjoy seeing Linus Torvalds tell Google why he hates<br>
Subversion (Google employs a few of the key Subversion authors), his<br> talk is amusing, if not entirely illuminating:<br> <a href="http://www.youtube.com/watch?v=4XpnKHJAok8">http://www.youtube.com/watch?v=4XpnKHJAok8</a><br>
<br> Just as a test, I'm making a mirror of <a href="ftp://ftp.wwpdb.org/pub/pdb/">ftp://ftp.wwpdb.org/pub/pdb/</a><br> and I'll put the results in our content-addressable storage system,<br> where the data is striped across our cluster and available for batch<br>
processing.<br> <br><br> "c) allow users to freely edit the data, including automatic clean-up<br> 'bots', algorithms, etc.<br> d) have all changes automatically emailed to a mailing list for<br> community review, approval, etc."<br>
<br> <br>Can you elaborate on this?</blockquote><div><br>d) I have seen software projects with SVN commit mailing lists, which email everyone on the list every change that is committed to the repository. This way the developers see all relevant changes to the software as they happen, and can then revert, improve, or comment on them; each automatic email prompts discussion of that particular commit. For a project the size of GenBank, we would need to think about how to define sub-communities, such as fungal vs. primate researchers, so that each group only receives the changes it cares about.<br>
<br>c) Sometimes people don't want to fix just one small problem in one specific entry; they want to change the data in all entries. For example, I may want to update the UniProt codes for every PDB entry. To do that, I would write a script that applies my newly collected data to the whole archive, and I might re-run it every week. Other people would then need the opportunity to accept or reject my algorithm's changes 'en masse'. Here I start to get bogged down in general heterogeneous data integration issues... Perhaps sites like <span class="q"> <a href="http://freebase.com/" target="_blank">freebase.com</a> can help us here.</span><br>
<br><br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Sasha<br> <br> PS. Speaking of the Subversion authors, I also really enjoyed their<br>
talk on making a "successful free software<br> project": <a href="http://www.youtube.com/watch?v=ZSFDm3UYkeE">http://www.youtube.com/watch?v=ZSFDm3UYkeE</a>. They give a<br> pretty firm warning that a successful project should have a very<br>
narrow focus or risk being unsuccessful in everything.<br> </blockquote></div><br><br clear="all"><br>-- <br>hello