<br><br><div><span class="gmail_quote">On 25/03/2008, <b class="gmail_sendername">Alexander Wait Zaranek</b> &lt;<a href="mailto:await@genetics.med.harvard.edu">await@genetics.med.harvard.edu</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Tue, Mar 25, 2008 at 6:54 AM, Dan Bolser &lt;<a href="mailto:dan.bolser@gmail.com">dan.bolser@gmail.com</a>&gt; wrote:<br> &gt; In summary, I think it is &#39;doable&#39;, but it isn&#39;t a case of simply mirroring<br>

 &gt; into a revisioning system. We need to work on some protocols for<br> &gt; synchronization.<br> &gt;<br> <br>Efficient merging (not just forking) is an active area of version<br> control systems research.&nbsp;&nbsp;Eg.:<br> <br>

 * <a href="http://git.or.cz/">http://git.or.cz/</a><br> * <a href="http://bazaar-vcs.org/">http://bazaar-vcs.org/</a><br> * <a href="http://darcs.net/">http://darcs.net/</a><br> <br> If you&#39;d enjoy seeing Linus Torvalds talk to Google about why he hates<br>

 subversion--Google employs a few of the key subversion authors--I<br> found his talk amusing, if not entirely illuminating:<br> <a href="http://www.youtube.com/watch?v=4XpnKHJAok8">http://www.youtube.com/watch?v=4XpnKHJAok8</a><br>

 <br> Just as a test, I&#39;m making a mirror of <a href="ftp://ftp.wwpdb.org/pub/pdb/">ftp://ftp.wwpdb.org/pub/pdb/</a><br> and I&#39;ll put the results in our content addressable storage system<br> where the data is striped across our cluster and available for batch<br>

 processing.<br> <br><br> &quot;c) allow users to freely edit the data, including automatic clean up<br> &#39;bots&#39;, algorithms, etc., etc.<br> d) have all changes automatically emailed to a mailing list for<br> community review, approval etc.&quot;<br>

 <br> <br>Can you elaborate on this?</blockquote><div><br>d) I have seen software projects with &#39;svn-mailing-lists&#39;, which email everyone on the list whenever a change is committed to the SVN repository. That way the developers see every relevant change to the software as it happens, and can then remove, improve, or perhaps comment on it. The automatic email prompts discussion of the given &#39;commit&#39;. For a project the size of GenBank, I think we would need to work out how to define sub-communities, such as fungal vs. primate people. <br>

<br>c) Sometimes people don&#39;t want to fix just one small problem in one specific entry; they want to change the data in all entries. For example, I may want to update the UniProt codes for all the PDB entries every week, so I would write a script that applies my newly collected data to the whole archive. People would then need the opportunity to accept or reject my algorithm&#39;s changes &#39;en masse&#39;. Here I start to get bogged down in general &#39;heterogeneous data integration&#39; issues... Perhaps sites like <span class="q"> <a href="http://freebase.com/" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">freebase.com</a> can help us here.</span><br>

<br><br>&nbsp;</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Sasha<br> <br> PS. Speaking of the subversion authors, I also really enjoyed the<br>

 subversion authors talk on making a &quot;successful free software<br> project&quot;:&nbsp;&nbsp;<a href="http://www.youtube.com/watch?v=ZSFDm3UYkeE">http://www.youtube.com/watch?v=ZSFDm3UYkeE</a>&nbsp;&nbsp; They give a<br> pretty firm warning that a successful project should have a very<br>

 narrow focus or risk being unsuccessful in everything.<br> </blockquote></div><br><br clear="all"><br>-- <br>hello