<br><br><div><span class="gmail_quote">On 24/03/2008, <b class="gmail_sendername">Mackenzie Cowell</b> &lt;<a href="mailto:macowell@gmail.com">macowell@gmail.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Why doesn&#39;t someone start mirroring ncbi and layering a lightweight revisioning layer on top of the content?&nbsp; It could live at <a href="http://ncbi2.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">ncbi2.org</a>.&nbsp; Or it could be mirrored directly into <a href="http://freebase.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">freebase.com</a>, which already provides those tools.</blockquote>

<div><br>The difficulty comes in &#39;merging&#39; back with the original data. This requires some thought, but I believe it is generally doable.<br><br>The conceptually cleanest way to think about the problem is in terms of a versioning (revisioning) system such as SVN. (Sorry, that is what you just said - I should read more carefully!)<br>

<br>Let&#39;s say we do the following:<br><br>a) grab all the data from the PDB (for example)<br>b) stick all the data into a revisioning system<br>c) allow users to freely edit the data, including automatic clean-up &#39;bots&#39;, algorithms, etc.<br>

d) have all changes automatically emailed to a mailing list for community review, approval, etc.<br><br>Now, by the time we get to step d, the PDB will have updated its own data since step a. We then need to merge the updated PDB data with our independently modified data. (This is where we need to go beyond a simple revisioning system.)<br>

<br>Merges should be automatic where possible: new entries give us no problem, and modified entries that do not conflict can be merged without intervention. Conflicts, however, need to be flagged and resolved - the community needs to manually merge alternative &#39;fixes&#39; or updates. We need flexible rules for how to resolve conflicts.<br>
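<br>As a rough sketch of these merge rules (not an existing tool: the function name, the three dictionaries keyed by entry ID, and the choice to keep the community text while a conflict awaits review are all assumptions for illustration), a per-entry three-way merge in Python might look like this:<br>

```python
def three_way_merge(base, upstream, community):
    """Merge entries one by one; return (merged, conflicts).

    base:      snapshot taken when the data was first grabbed (step a)
    upstream:  the current upstream release
    community: our independently edited copy
    Each argument is a dict mapping an entry ID to the entry text.
    """
    merged, conflicts = {}, []
    for entry_id in sorted(set(base) | set(upstream) | set(community)):
        b = base.get(entry_id)
        u = upstream.get(entry_id)
        c = community.get(entry_id)
        if u == c:
            result = u  # both sides agree (including identical fixes)
        elif c == b:
            result = u  # only upstream changed, added, or removed the entry
        elif u == b:
            result = c  # only the community changed the entry
        else:
            conflicts.append(entry_id)  # both sides changed it differently
            result = c  # assumed policy: keep our text until someone reviews it
        if result is not None:
            merged[entry_id] = result
    return merged, conflicts
```

<br>Only the entry IDs returned in conflicts would need a human; everything else merges automatically.<br>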

<br>In this way each data entry will have a certain status (as well as a certain version): original, community updated, conflicting, ... It&#39;s actually not that complex when it comes down to it: the PDB record has a release version, we have our &#39;community version&#39;, and the text of the two may or may not differ. The differences may come from several sources, and where possible they should be resolved.<br>
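<br>One hypothetical way to derive that status from the texts involved is sketched below. The enum and function names are made up for this sketch, and treating an entry the community has not touched as &#39;original&#39; is an assumption, not an agreed rule:<br>

```python
from enum import Enum


class Status(Enum):
    ORIGINAL = "original"
    COMMUNITY_UPDATED = "community updated"
    CONFLICTING = "conflicting"


def entry_status(base_text, release_text, community_text):
    """Classify one entry.

    base_text:      upstream text when we last synchronized (hypothetical field)
    release_text:   upstream text in the current release
    community_text: our edited text
    """
    if community_text in (base_text, release_text):
        # No community edit, or our edit now matches the release.
        return Status.ORIGINAL
    if release_text == base_text:
        # Only the community changed the entry.
        return Status.COMMUNITY_UPDATED
    # Both sides changed the entry, in different ways.
    return Status.CONFLICTING
```

<br>An entry such as entry_status("A", "A2", "A-fix") comes out as conflicting and would go to the community for a manual merge.<br>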

<br>

Here things start to resemble a bug-tracker more than a revisioning system...<br>

<br>Finally, all outstanding community &#39;fixes&#39; need to be regularly emailed to the PDB for review. <br><br>In summary, I think it is &#39;doable&#39;, but it isn&#39;t a case of simply mirroring into a revisioning system. We need to work on some protocols for synchronization.<br>

<br><br>&nbsp;</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Mac<div><span class="e" id="q_118e1afaa4fb5f74_1"><br><br><div class="gmail_quote">

On Mon, Mar 24, 2008 at 11:07 AM, Tom Knight &lt;<a href="mailto:tk@csail.mit.edu" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">tk@csail.mit.edu</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


I think we all share this frustration. &nbsp;An example of just how bad<br>

things are might be useful.<br>

<br>

About a year ago, I started working with yeast recombination, and<br>

wanted to use a pair of yeast artificial chromosome vectors that had<br>

been developed in the early &#39;90s. &nbsp;The developer was kind and helpful,<br>

and provided the vectors, but there was no sequence information<br>

available (not unusual for early vectors, where sequencing was<br>

difficult and expensive). &nbsp;One of the things I did was to fully<br>

sequence the vectors and to deposit the vector sequences into Genbank.<br>

<br>

I listed the original reference to the vectors as publication<br>

information. &nbsp;Apparently, this is not allowed. &nbsp;There was no way in<br>

which I could link the sequence I had just deposited with the source of<br>

the vector (or even to give credit for where it came from). &nbsp;The only<br>

link is the name of the plasmid, which I suppose is unique enough.<br>

<br>

But this madness has to stop.<br>

<div><div></div><div><br>

<br>

On Mar 24, 2008, at 10:26 AM, Dan Bolser wrote:<br>

<br>

&gt; &quot;That we would wholesale start changing people&#39;s records goes against<br>

&gt; our idea of an archive,&quot; says David Lipman, director of the National<br>

&gt; Center for Biotechnology Information (NCBI), GenBank&#39;s home in<br>

&gt; Bethesda, Maryland. &quot;It would be chaos.&quot;<br>

&gt;<br>

&gt; I think that quote highlights the problem (ignorance) that we have to<br>

&gt; overcome. People simply don&#39;t understand the nature of community<br>

&gt; projects. Just take the open source software movement for example;<br>

&gt; community + tools for basic collaboration = massively successful<br>

&gt; projects.<br>

&gt;<br>

&gt; How many databases in the molecular biology community include even the<br>

&gt; most basic of tools - a public bug tracker? If there is one out there,<br>

&gt; I don&#39;t know it. I find this fact simultaneously infuriating and<br>

&gt; dumbfounding, because it is simply unjustifiable. How about a public<br>

&gt; database project with a publicly archived mailing list? I had to<br>

&gt; start my own because the NCBI refused to do so;<br>

&gt;<br>

&gt; <a href="http://www.bioinformatics.org/mailman/listinfo/ssml-general" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://www.bioinformatics.org/mailman/listinfo/ssml-general</a><br>

&gt;<br>

&gt;<br>

&gt; I am having a similar running battle with the PDB, who staunchly<br>

&gt; refuse to alter (some of) their data, even though it contains clear<br>

&gt; errors. The recent remediation has been a huge improvement, but it<br>

&gt; doesn&#39;t go far enough.<br>

&gt;<br>

&gt; We simply need to build the kind of community annotation projects that<br>

&gt; will show the way for others. I have given up on the above kind of<br>

&gt; stupidity. There are only so many times that you can tell someone they<br>

&gt; need to install a public bug tracker before you get too tired to care<br>

&gt; that they won&#39;t install one any more.<br>

&gt;<br>

&gt;<br>

&gt; With hope for the future,<br>

&gt;<br>

&gt; Dan.<br>

&gt;<br>

&gt;<br>

&gt; ----<br>

&gt;<br>

&gt; <a href="http://BioDatabase.Org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://BioDatabase.Org</a><br>

&gt; <a href="http://PDBWiki.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://PDBWiki.org</a><br>

&gt; _______________________________________________<br>

&gt; OpenWetWare Discussion Mailing List<br>

&gt; <a href="mailto:discuss@openwetware.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">discuss@openwetware.org</a><br>

&gt; <a href="http://mailman.mit.edu/mailman/listinfo/oww-discuss" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://mailman.mit.edu/mailman/listinfo/oww-discuss</a><br>

<br>


</div></div></blockquote></div><br><br clear="all"><br></span></div><span class="sg">-- <br>Mac Cowell<br>iGEM Coordinator<br><a href="http://igem.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">igem.org</a><br>

231.313.9062

</span></blockquote></div><br><br clear="all"><br>-- <br>hello