[OWW-Discuss] Tapping into open source / open access and doing slightly more

Bryan Bishop kanzure at gmail.com
Wed May 14 19:07:44 EDT 2008


Hey all,

I am not able to attend the weekday conference calls because of high 
school scheduling, but I've been meaning to suggest something to the 
group, so I'll just send it here instead. :-)

I wrote an email about this in the context of space-based manufacturing:
http://heybryan.org/2008-05-09.html

But let me try to put things into context. I know that OWW has a good 
representation of programmers, so I'm hoping a reference to Debian 
isn't entirely lost. Take a look here:

http://debian.org/ and its Wikipedia article, and
http://ubuntu.com/ and its article. More concisely:
 "Debian is known for strict adherence to the Unix and free software 
philosophies. Debian is also known for its abundance of options — the 
current release includes over twenty-six thousand software packages for 
eleven computer architectures. These architectures range from the 
Intel/AMD 32-bit/64-bit architectures commonly found in personal 
computers to the ARM architecture commonly found in embedded systems 
and the IBM eServer zSeries mainframes. Throughout Debian's lifetime, 
other distributions have taken it as a basis to develop their own, 
including: Ubuntu, MEPIS, Dreamlinux, Damn Small Linux, Xandros, 
Knoppix, Linspire, sidux, Kanotix, and LinEx among others. A 
university's study concluded that Debian's 283 million source code 
lines would cost 10 billion USA Dollars to develop by proprietary 
means."

"Ubuntu's popularity has climbed steadily since its 2004 release. It has 
been the most viewed Linux distribution on Distrowatch.com in 2005,[4] 
2006,[5] In an August 2007 survey of 38,500 visitors on 
DesktopLinux.com, Ubuntu was the most popular distribution with 30.3 
percent of respondents using it.[7] Third party sites have arisen to 
provide Ubuntu packages outside of the Ubuntu organization. Ubuntu was 
awarded the Reader Award for best Linux distribution at the 2005 
LinuxWorld Conference and Expo in London.[107] It has been favorably 
reviewed in online and print publications.[108][109][110] Ubuntu won 
InfoWorld's 2007 Bossie Award for Best Open Source Client OS.[111] Mark 
Shuttleworth indicates that there were at least 8 million Ubuntu users 
at the end of 2006.[112] The large user-base has resulted in a large 
stable of non-Canonical websites. These include general help sites like 
Easy Ubuntu Linux,[113] dedicated weblogs (Ubuntu Gazette),[114] and 
niche sites within the Ubuntu Linux niche itself (Ubuntu Women).[115] 
The year 2007 saw the online publication of the first magazine 
dedicated to Ubuntu, Full Circle.[116]"

So just what made these so successful, to the point where Debian 
represents $10 billion of effort, nearly all of it done by volunteers? 
There's a bit more to mention:

http://advogato.org/article/972.html

"What are the issues? Why is it so important to go "distributed"? 

Debian is the largest independent and one of the longest-running of the 
Free Software distributions in existence. There are over 1000 maintainers; 
nearly 20,000 packages. There are over 40 "Primary" Mirrors, and 
something like one hundred secondary mirrors (listed here - I'm stunned 
and shocked at the numbers!). 14 architectures are supported - 13 Linux 
ports and one GNU/Hurd port but only for i386 (aww bless iiit). A 
complete copy of the mirrors and their architectures, including source 
code, is over 160 gigabytes. 

At the last major upgrade of Debian/Stable, all the routers at the major 
International fibreoptic backbone sites across the world redlined for a 
week. 

To say that Debian is "big" is an understatement of the first order. 

Many mirror sites simply cannot cope with the requirements. Statistics 
on the Debian UK Mirror for July 2004 to June 2005 show 1.4 Terabytes 
of data served. As you can see from the list of mirror sites, many of 
the Secondary Mirrors and even a couple of the Primary ones have 
dropped certain architectures. 

security.debian.org - perhaps the most important of all the Debian 
sites - is definitely overloaded and undermirrored. 

This isn't all: there are mailing lists (the statistics show almost 
30,000 people on each of the announce and security lists, alone), and 
IRC channels - and both of those are over-spammed. The load on the 
mailing list server is so high that an idea (discussed informally at 
Debconf7 and outlined here later in this article, for completeness) to 
create an opt-in spam/voting system for people to "vet" postings and 
comments, was met with genuine concern and trepidation by the mailing 
list's maintainers. 

It's incredible that Debian Distribution and Development hasn't fallen 
into a big steaming heap of broken pieces, with administrators, users 
and ISPs all screaming at each other and wanting to scratch each 
others' eyes out on the mailing lists and IRC channels, only to find 
that those aren't there either. 

So it's basically coming through loud and clear: "server-based" 
infrastructure is simply not scalable, and the situation is only going 
to get worse as time progresses. That leaves "distributed 
architecture" - aka peer-to-peer architecture - as the viable 
alternative."

In other words, it's the social structure and community around Debian, 
the 26,000 software packages, and that incredibly easy command - 
apt-get install <package> - by which you can grab *any* software 
package from a repository and have it immediately installed. Kind of 
like BioBricks, except functional. By that I don't mean BioBricks is 
dysfunctional, but that BioBricks is about data, while Debian's apt is 
about software and functionality. 
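
To make that concrete, here is a minimal sketch in Python of the 
repository idea at apt's core: resolve a name against an index, fetch 
the artifact, hand it off to an installer. The index URL and entry 
format here are made up for illustration; real apt also does dependency 
resolution, signature checking, and mirror selection.

    # Minimal sketch of the "repository + one command" idea behind apt.
    # The index URL and entry format are hypothetical.
    import json
    import urllib.request

    INDEX_URL = "http://repo.example.org/index.json"  # hypothetical

    def install(name):
        # 1. Fetch the repository index and look the package up by name.
        with urllib.request.urlopen(INDEX_URL) as f:
            index = json.load(f)
        entry = index[name]  # KeyError if the repository doesn't carry it
        # 2. Download the artifact the index points at.
        with urllib.request.urlopen(entry["url"]) as f:
            data = f.read()
        path = "/tmp/%s-%s.tar.gz" % (name, entry["version"])
        with open(path, "wb") as out:
            out.write(data)
        # 3. A real tool would now unpack and register the package.
        print("fetched %s %s -> %s" % (name, entry["version"], path))

    install("example-package")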

This is what one of my projects focuses on: that same easy gradient by 
which not only programs and software can be downloaded, but also open 
access information and open source projects of any sort, whether from 
the maker communities, the DIYbio groups, Debian, Gentoo, etc. 

For a dense explanation:
http://heybryan.org/exp.html

The 'architecture' is really ridiculously simple; it's just putting 
together some components that have been out on the web for a while. For 
example, all wikis have a revision control system, even the MediaWiki 
installation that OWW runs on. Revision control systems, though, 
existed long before wikis popped up, and I am particularly interested 
in 'git'. For this reason I am also interested in ikiwiki, which can be 
made to look exactly like MediaWiki, except with the important 
difference that it uses 'git' for the revision control and history. 
This means that pages can be branched and merged, by anybody interested. 
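
As a sketch of what that buys you - the repository URL and page path 
here are hypothetical - anyone could clone a git-backed wiki, branch 
it, and record a variation on a protocol, all with plain git:

    # Sketch: branch an ikiwiki-style, git-backed wiki with plain git,
    # driven from Python. Repository URL and page path are hypothetical.
    import subprocess

    REPO = "https://wiki.example.org/labwiki.git"  # hypothetical

    subprocess.check_call(["git", "clone", REPO, "labwiki"])
    # Make a personal branch of the whole wiki.
    subprocess.check_call(["git", "checkout", "-b", "my-protocol-variant"],
                          cwd="labwiki")
    # Edit a page like any other file; ikiwiki pages are plain text.
    with open("labwiki/protocols/pcr.mdwn", "a") as page:
        page.write("\n* tried 62 C annealing instead of 58 C\n")
    subprocess.check_call(["git", "commit", "-am", "try hotter annealing"],
                          cwd="labwiki")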

It also means that you're not just providing open access data, but the 
entire project [if the researcher is interested in going that far, of 
course]: all of the files - source code, CAD models, diagrams via dia 
or graphviz, SVG figures, the LaTeX source of the papers, notes, etc. 
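
A shared project's layout might look something like this (an entirely 
hypothetical example; the manifest file is picked up again below):

    project/
      src/            source code
      cad/            CAD models
      figures/        dia / graphviz / SVG diagrams
      paper/          LaTeX source of the paper
      notes/          lab notebook entries
      project.yaml    machine-readable description of the above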

It's really easy to implement. 

It's an extension of "open access" and "open source" in that it makes 
the whole "semantic web" idea truly functional - it makes the metadata 
actually *do* something.

And it's a useful way of doing research. What's the quote? The one from 
Gregory Wilson on bottlenecks in scientific computing? 
http://www.americanscientist.org/template/AssetDetail/assetid/48548
http://www.cs.toronto.edu/~gvwilson/
'figuring out how to make scientific programmers more productive'

"Those Who Will Not Learn From History..." 
Beautiful Code 
"Requirements in the Wild" 
"DrProject: A Software Project Management Portal to Meet Educational 
Needs" 
"Software Carpentry" 
Data Crunching 
"Learning By Doing: Introducing Version Control as a Way to Manage 
Student Assignments" 
"Where's the Real Bottleneck in Scientific Computing?" 
"Extensible Programming for the 21st Century" 
"Open Source, Cold Shoulder"

Anyway, the only things left for implementation are changing up 
MediaWiki a bit, writing some introductory tutorials [which I am doing 
anyway on another front], and then figuring out the file structure 
format (using YAML, so it's mostly just writing classes in Python) - 
which, frankly, I think individual researchers would be better suited 
to doing. That's why we have the excellent Systems Biology Markup 
Language (sbml.org), for example; I don't have a broad enough overview 
of the field to make that part happen myself.
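
To show what I mean by "writing classes in Python" - the field names 
below are placeholders, and defining the real schema is exactly the 
part researchers would need to own - a manifest could be loaded like 
this:

    # Sketch: load a YAML project manifest into Python classes.
    # Field names are placeholders, not a proposed standard.
    import yaml  # PyYAML

    MANIFEST = """
    name: example-project
    license: CC-BY
    files:
      - {path: src/analysis.py, kind: source}
      - {path: paper/paper.tex, kind: latex}
      - {path: cad/chassis.svg, kind: diagram}
    """

    class ProjectFile:
        def __init__(self, path, kind):
            self.path, self.kind = path, kind

    class Project:
        def __init__(self, spec):
            self.name = spec["name"]
            self.license = spec["license"]
            self.files = [ProjectFile(**f) for f in spec["files"]]

    project = Project(yaml.safe_load(MANIFEST))
    print(project.name, [f.path for f in project.files])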

You get all of the benefits of software reuse, extended to project 
reuse, with all of the sharing and acceleration of progress that the 
internet allows for. So, what are the general thoughts on this?

- Bryan
________________________________________
http://heybryan.org/



