[OWW-Discuss] Proposal to 'Wikify' GenBank Meets Stiff Resistance

Tue Mar 25 06:54:29 EDT 2008

On 24/03/2008, Mackenzie Cowell <macowell at gmail.com> wrote:
>
> Why doesn't someone start mirroring NCBI and layering a lightweight
> revisioning layer on top of the content?  It could live at ncbi2.org.  Or
> it could be mirrored directly into freebase.com, which already provides
> those tools.

The difficulty lies in 'merging' our changes back with the original data.
This requires some thought, but I believe it is generally doable.

The conceptually cleanest way to think about the problem is in terms of a
versioning (revisioning) system such as SVN. (Sorry, that is what you just
said - I should read more carefully!)

Let's say we do the following:

a) grab all the data from the PDB (for example)
b) stick all the data into a revisioning system
c) allow users to freely edit the data, including automatic clean-up 'bots',
algorithms, etc., etc.
d) have all changes automatically emailed to a mailing list for community
review, approval, etc.

Now, by the time we get to step d, the PDB will have updated its own data
since step a. We now need to merge the updated PDB data with our
independently modified data. (This is where we need to go beyond a simple
revisioning system.)

Merges should be automatic where possible: new entries pose no problem,
and modified entries that do not conflict can simply be merged. Conflicts,
however, need to be flagged and resolved; the community needs to manually
merge alternative 'fixes' or updates. We need flexible rules for how to
resolve conflicts.
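To make those merge rules concrete, here is a minimal sketch in Python of a
per-entry three-way merge. It assumes (my assumption, nothing the PDB
provides) that we keep the snapshot taken at step a as a common base, and
that each release is simply a dict mapping an entry ID to its record text.
The name three_way_merge and the policy of keeping the community text while a
conflict is open are hypothetical stand-ins for the 'flexible rules' above.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model: entry ID -> full record text. Not the real PDB file format.
Records = Dict[str, str]


def three_way_merge(base: Records, upstream: Records,
                    community: Records) -> Tuple[Records, List[str]]:
    """Merge a new upstream release into the community copy, entry by entry.

    base      -- snapshot taken when the data were first mirrored (step a)
    upstream  -- the newer upstream release
    community -- our independently edited copy
    Returns the merged records and a sorted list of conflicting entry IDs.
    """
    merged: Records = {}
    conflicts: List[str] = []
    for entry_id in sorted(set(base) | set(upstream) | set(community)):
        b: Optional[str] = base.get(entry_id)
        u: Optional[str] = upstream.get(entry_id)
        c: Optional[str] = community.get(entry_id)
        if u == c:
            # Both sides agree (unchanged, changed identically, or both deleted).
            result = u
        elif u == b:
            # Only the community changed this entry: keep the community edit.
            result = c
        elif c == b:
            # Only upstream changed it; this also covers brand-new entries.
            result = u
        else:
            # Both sides changed it differently: flag it for manual review and
            # (hypothetical policy) keep the community text until resolved.
            conflicts.append(entry_id)
            result = c
        if result is not None:  # None means the entry was deleted
            merged[entry_id] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"1ABC": "v1", "2DEF": "v1"}
    upstream = {"1ABC": "v2", "2DEF": "v2", "3GHI": "new"}
    community = {"1ABC": "v1", "2DEF": "fix"}
    print(three_way_merge(base, upstream, community))
```

Comparing against the base snapshot is what lets us tell 'only the PDB
changed this' apart from 'only we changed this'; a plain mirror without the
base cannot make that distinction, so every difference would look like a
conflict.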

In this way each data entry will have a certain status (as well as a certain
version): original, community updated, conflicting, ... It's actually not
that complex when it comes down to it. The PDB record has a release
version and we have our 'community version'; the text of the two may or may
not differ. The differences may come from several sources, and where possible
they should be resolved.
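As a rough illustration of how an entry's status could be derived, the sketch
below classifies one entry from three texts. Telling apart the 'several
sources' of a difference needs one extra piece of state that I am assuming
here: the release text the community copy was last synchronised with. The
names EntryStatus and entry_status are hypothetical, and the UPSTREAM_UPDATED
status is my own addition to the list above.

```python
from enum import Enum
from typing import Optional


class EntryStatus(Enum):
    """Hypothetical status labels; UPSTREAM_UPDATED is an assumed extra."""
    ORIGINAL = "original"                    # community text matches the release
    COMMUNITY_UPDATED = "community updated"  # only the community edited it
    UPSTREAM_UPDATED = "upstream updated"    # only the PDB changed it (assumed)
    CONFLICTING = "conflicting"              # both changed it, differently


def entry_status(base: Optional[str], release: Optional[str],
                 community: Optional[str]) -> EntryStatus:
    """Classify one entry; None means the entry is absent on that side.

    base      -- release text the community copy was last synchronised with
    release   -- current release text from the PDB
    community -- current community text
    """
    if community == release:
        return EntryStatus.ORIGINAL
    if release == base:
        # The release has not moved, so the difference is a community edit.
        return EntryStatus.COMMUNITY_UPDATED
    if community == base:
        # We have not touched it, so the difference comes from a new release.
        return EntryStatus.UPSTREAM_UPDATED
    return EntryStatus.CONFLICTING
```

Only the CONFLICTING entries would need to go into the 'bug tracker' style
queue for manual resolution; the other statuses can be handled automatically.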

Here things start to resemble a bug-tracker more than a revisioning
system...

Finally, all outstanding community 'fixes' need to be regularly emailed to
the PDB for review.
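For the regular report to the PDB, something as simple as a unified diff per
outstanding fix might do. The sketch below uses Python's standard difflib;
the function name fixes_report and the dict-of-texts data model are
assumptions for illustration only, and entries the community has deleted are
not reported.

```python
import difflib
from typing import Dict


def fixes_report(release: Dict[str, str], community: Dict[str, str]) -> str:
    """Return unified diffs for every community entry that differs from the
    current release, as plain text suitable for an email body.

    Entries are assumed to be dicts of entry ID -> record text (hypothetical).
    """
    chunks = []
    for entry_id in sorted(community):
        old = release.get(entry_id, "")
        new = community[entry_id]
        if old == new:
            continue  # nothing outstanding for this entry
        # splitlines() drops line endings, so use lineterm="" and join with
        # "\n" below; this avoids glued lines when a record lacks a final newline.
        diff = difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"{entry_id} (release)",
            tofile=f"{entry_id} (community)",
            lineterm="",
        )
        chunks.append("\n".join(diff))
    return "\n\n".join(chunks)
```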

In summary, I think it is 'doable', but it isn't a case of simply mirroring
into a revisioning system. We need to work on some protocols for
synchronization.

Mac
>
> On Mon, Mar 24, 2008 at 11:07 AM, Tom Knight <tk at csail.mit.edu> wrote:
>
> > I think we all share this frustration.  An example of just how bad
> > things are might be useful.
> >
> > About a year ago, I started working with yeast recombination, and
> > wanted to use a pair of yeast artificial chromosome vectors that had
> > been developed in the early '90s.  The developer was kind and helpful,
> > and provided the vectors, but there was no sequence information
> > available (not unusual for early vectors, where sequencing was
> > difficult and expensive).  One of the things I did was to fully
> > sequence the vectors and to deposit the vector sequences into GenBank.
> >
> > I listed the original reference to the vectors as publication
> > information.  Apparently, this is not allowed.  There was no way in
> > which I could link the sequence I had just deposited with the source of
> > the vector (or even to give credit for where it came from).  The only
> > link is the name of the plasmid, which I suppose is unique enough.
> >
> > But this madness has to stop.
> >
> >
> > On Mar 24, 2008, at 10:26 AM, Dan Bolser wrote:
> >
> > > "That we would wholesale start changing people's records goes against
> > > our idea of an archive," says David Lipman, director of the National
> > > Center for Biotechnology Information (NCBI), GenBank's home in
> > > Bethesda, Maryland. "It would be chaos."
> > >
> > > I think that quote highlights the problem (ignorance) that we have to
> > > overcome. People simply don't understand the nature of community
> > > projects. Just take the open source software movement, for example:
> > > community + tools for basic collaboration = massively successful
> > > projects.
> > >
> > > How many databases in the molecular biology community include even the
> > > most basic of tools - a public bug tracker? If there is one out there,
> > > I don't know it. I find this fact simultaneously infuriating and
> > > dumbfounding, because it is simply unjustifiable. How about a public
> > > database project with a publicly archived mailing list? I had to
> > > start my own because the NCBI refused to do so:
> > >
> > > http://www.bioinformatics.org/mailman/listinfo/ssml-general
> > >
> > >
> > > I am having a similar running battle with the PDB, who staunchly
> > > refuse to alter (some of) their data, even though it contains clear
> > > errors. The recent remediation has been a huge improvement, but it
> > > doesn't go far enough.
> > >
> > > We simply need to build the kind of community annotation projects that
> > > will show the way for others. I have given up on the above kind of
> > > stupidity. There are only so many times that you can tell someone they
> > > need to install a public bug tracker before you get too tired to care
> > > that they won't install one any more.
> > >
> > >
> > > With hope for the future,
> > >
> > > Dan.
> > >
> > >
> > > ----
> > >
> > > http://BioDatabase.Org
> > > http://PDBWiki.org
> > > _______________________________________________
> > > OpenWetWare Discussion Mailing List
> > > discuss at openwetware.org
> > > http://mailman.mit.edu/mailman/listinfo/oww-discuss
> >
>
>
>
> --
> Mac Cowell
> iGEM Coordinator
> igem.org
> 231.313.9062

-- 
hello