[OWW-Discuss] Tapping into open source / open access and doing slightly more
Bryan Bishop
kanzure at gmail.com
Wed May 14 19:07:44 EDT 2008
Hey all,
I am not able to attend the week-day conference calls because of high
school scheduling issues, but I've been meaning to suggest something to
the group, so I'll just send it here instead. :-)
I wrote an email about this in the context of space-based manufacturing:
http://heybryan.org/2008-05-09.html
But let me try to put things into context. I know that OWW has a good
representation of programmers around these parts, so when I reference
Debian, I'm hoping it's not entirely lost. Take a look at
http://debian.org/ and its Wikipedia article, and at Ubuntu
(http://ubuntu.com/). More concisely, from Wikipedia:
"Debian is known for strict adherence to the Unix and free software
philosophies. Debian is also known for its abundance of options — the
current release includes over twenty-six thousand software packages for
eleven computer architectures. These architectures range from the
Intel/AMD 32-bit/64-bit architectures commonly found in personal
computers to the ARM architecture commonly found in embedded systems
and the IBM eServer zSeries mainframes. Throughout Debian's lifetime,
other distributions have taken it as a basis to develop their own,
including: Ubuntu, MEPIS, Dreamlinux, Damn Small Linux, Xandros,
Knoppix, Linspire, sidux, Kanotix, and LinEx among others. A
university's study concluded that Debian's 283 million source code
lines would cost 10 billion USA Dollars to develop by proprietary
means."
"Ubuntu's popularity has climbed steadily since its 2004 release. It has
been the most viewed Linux distribution on Distrowatch.com in 2005,[4]
2006,[5] In an August 2007 survey of 38,500 visitors on
DesktopLinux.com, Ubuntu was the most popular distribution with 30.3
percent of respondents using it.[7] Third party sites have arisen to
provide Ubuntu packages outside of the Ubuntu organization. Ubuntu was
awarded the Reader Award for best Linux distribution at the 2005
LinuxWorld Conference and Expo in London.[107] It has been favorably
reviewed in online and print publications.[108][109][110] Ubuntu won
InfoWorld's 2007 Bossie Award for Best Open Source Client OS.[111] Mark
Shuttleworth indicates that there were at least 8 million Ubuntu users
at the end of 2006.[112] The large user-base has resulted in a large
stable of non-Canonical websites. These include general help sites like
Easy Ubuntu Linux,[113] dedicated weblogs (Ubuntu Gazette),[114] and
niche sites within the Ubuntu Linux niche itself (Ubuntu Women).[115]
The year 2007 saw the online publication of the first magazine
dedicated to Ubuntu, Full Circle.[116]"
So, just what made these so successful, to the point where Debian
represents US$10 billion of effort, nearly all of it volunteer work?
There's a bit more to mention:
http://advogato.org/article/972.html
"What are the issues? Why is it so important to go "distributed"?
Debian is the largest independent, and one of the longest-running, of
the Free Software Distributions in existence. There are over 1000 maintainers;
nearly 20,000 packages. There are over 40 "Primary" Mirrors, and
something like one hundred secondary mirrors (listed here - I'm stunned
and shocked at the numbers!). 14 architectures are supported - 13 Linux
ports and one GNU/Hurd port but only for i386 (aww bless iiit). A
complete copy of the mirrors and their architectures, including source
code, is over 160 gigabytes.
At the last major upgrade of Debian/Stable, all the routers at the major
International fibreoptic backbone sites across the world redlined for a
week.
To say that Debian is "big" is an understatement of the first order.
Many mirror sites simply cannot cope with the requirements. Statistics
on the Debian UK Mirror for July 2004 to June 2005 show 1.4 Terabytes
of data served. As you can see from the list of mirror sites, many of
the Secondary Mirrors and even a couple of the Primary ones have
dropped certain architectures.
security.debian.org - perhaps the most important of all the Debian
sites - is definitely overloaded and undermirrored.
This isn't all: there are mailing lists (the statistics show almost
30,000 people on each of the announce and security lists, alone), and
IRC channels - and both of those are over-spammed. The load on the
mailing list server is so high that an idea (discussed informally at
Debconf7 and outlined here later in this article, for completeness) to
create an opt-in spam/voting system for people to "vet" postings and
comments, was met with genuine concern and trepidation by the mailing
list's maintainers.
It's incredible that Debian Distribution and Development hasn't fallen
into a big steaming heap of broken pieces, with administrators, users
and ISPs all screaming at each other and wanting to scratch each
others' eyes out on the mailing lists and IRC channels, only to find
that those aren't there either.
So it's basically coming through loud and clear: "server-based"
infrastructure is simply not scalable, and the situation is only going
to get worse as time progresses. That leaves "distributed
architecture" - aka peer-to-peer architecture - as the viable
alternative."
In other words, what made them successful is the social structure and
community around Debian, the 26,000 software packages, and that
incredibly easy command by which you can grab *any* software package
from the repository and have it installed immediately. Kind of like
BioBricks, except functional. By that I don't mean BioBricks is
dysfunctional, but that BioBricks is about data, while Debian's apt is
about software and functionality.
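To make that concrete: on any Debian or Ubuntu machine, one command
(run as root) fetches a package and everything it depends on straight
from the repository; graphviz here is just an arbitrary example
package:

    apt-get install graphviz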
This is what one of my projects focuses on: that same easy gradient,
by which not only programs and software can be downloaded, but also
open access information and open source projects of any sort, whether
from the Maker communities, the diybio groups, Debian, Gentoo, etc.
For a dense explanation:
http://heybryan.org/exp.html
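By analogy with apt, the end goal is something like the following - to
be clear, a purely hypothetical command and project name, since no such
tool exists yet:

    get-project diybio/gel-electrophoresis-box

where that one command would leave you with the complete project -
source files, documentation, dependencies - on your machine.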
The 'architecture' is really ridiculously simple; it's just putting
together some components that have been out on the web for a while. For
example, every wiki has a revision control system of some sort, even
the MediaWiki installation behind OWW. Revision control systems,
though, existed long before wikis popped up, and I am particularly
interested in 'git'. For this reason I am also interested in ikiwiki,
which can be made to look much like MediaWiki, except with the
important difference that it uses 'git' for revision control and page
history. This means that pages can be branched, merged, and so on, by
anybody interested.
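For instance, since an ikiwiki site is just a git repository of page
files, anybody can take a copy and work on their own branch; the
repository URL below is hypothetical:

    git clone git://example.org/labwiki.git
    cd labwiki
    git checkout -b my-experiment
    # edit some pages, then:
    git commit -a -m "tweak the protocol"

The branch can later be merged back in, or published somewhere else
entirely.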
It also means that you're not just providing open access data, but
also the entire project [if the researcher is interested in going that
far, of course]: all of the files - source code, CAD models, diagrams
via dia or graphviz, SVG figures, documentation, the LaTeX source of
the papers, notes, etc. It's really easy to implement.
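As a sketch, a project repository might be laid out something like
this (a made-up layout, not a proposed standard):

    gel-box/
        metadata.yaml   # title, license, authors, dependencies
        src/            # source code
        cad/            # CAD files and dia/graphviz diagrams
        doc/            # documentation, LaTeX source of the papers
        notes/          # lab notebook entries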
It's an extension of "open access" and "open source" in that it makes
the whole "semantic web" idea truly functional, making it actually *do*
something.
And it's a useful way of doing research. What's the quote? The one from
Gregory Wilson on bottlenecks in scientific computing?
http://www.americanscientist.org/template/AssetDetail/assetid/48548
http://www.cs.toronto.edu/~gvwilson/
His page describes his focus as 'figuring out how to make scientific
programmers more productive', and lists writings such as:
"Those Who Will Not Learn From History..."
Beautiful Code
"Requirements in the Wild"
"DrProject: A Software Project Management Portal to Meet Educational
Needs"
"Software Carpentry"
Data Crunching
"Learning By Doing: Introducing Version Control as a Way to Manage
Student Assignments"
"Where's the Real Bottleneck in Scientific Computing?"
"Extensible Programming for the 21st Century"
"Open Source, Cold Shoulder"
Anyway, the only things left for implementation are changing up
MediaWiki a bit, writing some introductory tutorials [which I am doing
anyway on another front], and then figuring out the file structure
format (using YAML, so it's just a matter of writing classes in
Python), which frankly I think is something that individual researchers
would be better suited to doing. For example, that's why we have the
excellent Systems Biology Markup Language (sbml.org): the domain
experts defined it themselves, and I don't have a broad enough overview
of the field to make something like that happen.
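Still, just to sketch what I mean on the YAML side, here's a minimal
example using PyYAML; the field names and the Project class are made
up for illustration, not a proposed format:

    import yaml  # PyYAML

    class Project:
        """Wraps a project's YAML metadata file."""
        def __init__(self, metadata):
            self.title = metadata.get("title")
            self.license = metadata.get("license")
            self.files = metadata.get("files", [])

        @classmethod
        def load(cls, path):
            # Parse metadata.yaml from a project repository.
            with open(path) as f:
                return cls(yaml.safe_load(f))

    sample = """
    title: gel electrophoresis box
    license: CC-BY
    files:
      - src/controller.py
      - cad/frame.svg
    """
    project = Project(yaml.safe_load(sample))
    print(project.title, project.files)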
You get all of the benefits of software reuse, but extended to whole
projects, with all of the sharing and acceleration of progress that the
internet allows for. So, what are the general thoughts on this?
- Bryan
________________________________________
http://heybryan.org/