I&#39;d be glad to discuss options tomorrow (Monday). <br><br>There&#39;s a split between the way a wiki can be exported and the way the database is managed. Since we currently run MySQL 4 on a single server, there are some limits on what we can do, but there are replication options we have not yet explored. The MediaWiki server also gives us more options to look at. <br>

<br>It turned out that we had a little bit of luck. The backup provided by our hosting company, Rackspace, was only a binary file backup and not a mysqldump, but the backup file images for the database were in good shape. That is not guaranteed: a binary backup can result in database corruption, because some of the files are so large that seconds or more pass between the start and end of the file copy, and transactions written to the file during that window can leave the backup image inconsistent. The SQL dump is an optional service that, unknown to us, Rackspace had not enabled when we moved to our current server. <br>

<br>As soon as the database was stable, I deployed a script I&#39;ve used before, automysqlbackup.sh. It is now reliably creating static SQL backups on a nightly basis. We&#39;re also enabling the binary log option to allow intra-day incremental backups in addition to the nightly full database backup. This means we&#39;ll be able to reload and be up and running in minutes the next time. I know this is little solace to people who lost information; we still want to maximize the amount of data that can be recovered, ideally to the point of losing no data.<br>
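<br>For anyone who wants to see roughly what a nightly logical dump involves, here is a minimal Python sketch. It is not automysqlbackup.sh; it only illustrates the idea. The function names, the backup user, and the output file naming are assumptions made for illustration, the mysqldump flags are the ones documented for current versions (worth checking against MySQL 4), and credentials are assumed to come from an option file such as ~/.my.cnf.<br>

```python
import datetime
import os
import subprocess


def build_dump_command(user="backup"):
    """Return the mysqldump argument list for a full logical backup.

    --single-transaction asks for a consistent snapshot of InnoDB tables,
    and --flush-logs starts a new binary log so that later binary logs
    hold only the changes made after this dump.
    """
    return [
        "mysqldump",
        "--user=" + user,          # hypothetical backup account
        "--all-databases",
        "--single-transaction",
        "--flush-logs",
    ]


def dump_path(backup_dir, day):
    """Build a dated file name such as all-databases-2008-03-02.sql."""
    return os.path.join(backup_dir, "all-databases-%s.sql" % day.isoformat())


def run_nightly_dump(backup_dir, user="backup"):
    """Run mysqldump and write its output to today's dump file."""
    path = dump_path(backup_dir, datetime.date.today())
    with open(path, "w") as out:
        subprocess.run(build_dump_command(user), stdout=out, check=True)
    return path
```

<br>A restore would then load the most recent dump and replay the binary logs written after it, for example by piping the output of mysqlbinlog into the mysql client.<br>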

<br>At the MediaWiki level, we externally back up all of the wiki pages in the database on a nightly basis. This backup is separate from the SQL dump. It didn&#39;t help us this time, since it runs around midnight, before the Rackspace backup we reloaded from.<br>

<br>Using the MediaWiki API, we can do incremental exports of all new pages on a regular basis. This would let us recover the pages at the MediaWiki application level with a simple import through a special page. The nice thing about the API is that it can be called from a remote system, which would let us continuously auto-import the pages into another server. It&#39;s not clear how we would manage the process of rolling the wiki onto a remote system for operational purposes, but we would never lose the contents of the database up until the time it went down. To implement this, we need to write a script that uses the MediaWiki API. I&#39;ll be glad to provide details to anyone interested in working with me on this. I can provide bindings in Python, Perl, C#, or PHP. <br>
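<br>As a starting point for that script, here is a rough Python sketch of an incremental export. It assumes the wiki exposes api.php with the recentchanges list and the export option of action=query, which may depend on the MediaWiki version installed. The function names and the output path are made up for illustration. It ignores result continuation and the per-request limit on titles, so treat it as an outline rather than a finished tool.<br>

```python
import json
import urllib.parse
import urllib.request


def recent_changes_url(api_url, since, limit=500):
    """URL listing pages changed at or after `since` (ISO 8601 timestamp)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcdir": "newer",          # enumerate from `since` forward in time
        "rcstart": since,
        "rcprop": "title|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def export_url(api_url, titles):
    """URL returning the export XML for the current text of `titles`."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "export": "1",
        "exportnowrap": "1",       # return bare export XML, not an API wrapper
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def changed_titles(api_url, since):
    """Fetch the distinct titles changed since `since` (first batch only)."""
    with urllib.request.urlopen(recent_changes_url(api_url, since)) as resp:
        data = json.load(resp)
    return sorted({rc["title"] for rc in data["query"]["recentchanges"]})


def export_changed_pages(api_url, since, out_path):
    """Write export XML for recently changed pages; return the page count."""
    titles = changed_titles(api_url, since)
    if not titles:
        return 0
    with urllib.request.urlopen(export_url(api_url, titles)) as resp:
        xml = resp.read()
    with open(out_path, "wb") as out:
        out.write(xml)
    return len(titles)
```

<br>Run from a remote machine on a schedule, export_changed_pages would leave a file that can be loaded on another wiki through the import special page.<br>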

<br>We currently have a full backup of all of the images and attachments uploaded. None were lost last week. But the database records pointing to them are only current as of around 4:00 AM on the 28th. Since these files are in a set of well-known directories, rsync can work the same way the page collection works. I&#39;ll have this set up today on an MIT system. We can sync on an hourly basis to keep the images up to date. <br>
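<br>The hourly sync can be as small as one rsync call started from cron. Here is a minimal Python sketch; the function names are illustrative, and the source directory and destination host are placeholders, not our real paths.<br>

```python
import subprocess


def build_rsync_command(source_dir, destination):
    """Return an rsync argument list that mirrors source_dir to destination.

    -a preserves permissions, times and directory structure; -z compresses
    data in transit. The trailing slash on the source makes rsync copy the
    directory contents rather than the directory itself.
    """
    return ["rsync", "-az", source_dir.rstrip("/") + "/", destination]


def sync_uploads(source_dir, destination):
    """Run the sync and raise CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, destination), check=True)
```

<br>For example, sync_uploads("/path/to/wiki/images", "backup-host:/path/to/mirror/images") would copy only files that are new or changed since the last run.<br>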

<br>We have no complaints about Rackspace&#39;s performance in this situation. They responded to our request within 15 minutes. The MySQL InnoDB tables use a single large binary file shared by all databases, so we had to reload the databases for the private wikis along with the main wiki. I therefore took full backups of all of the other databases, loaded the 4:00 AM database image, then loaded those backups to return the other databases to their 10:00 PM state. We didn&#39;t lose data in any application except the main OpenWetWare wiki. <br>

<br>Because we use OpenID for access control on all of the private wikis, they were inaccessible until the main wiki came back up. <br><br>The blog was completely disconnected from the OpenWetWare database. Because of this, there was no effect on it at all except for the need to swap in the new database image and then refresh it with the 10:00 PM info when the database server was re-enabled. <br>

<br>Let&#39;s see what we can do to improve the system. As I mentioned, I&#39;m open to suggestions, but we&#39;re already at work hardening the system. I&#39;m not saying this to appear uninterested in outside help; we simply need to do right away what we already know we can do. <br>

<br>Thanks.<br><br>Bill Flanagan<br><br><br><div class="gmail_quote">On Sun, Mar 2, 2008 at 3:20 PM, Alexander Wait Zaranek &lt;<a href="mailto:await@genetics.med.harvard.edu">await@genetics.med.harvard.edu</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Tue, Feb 26, 2008 at 8:15 PM, Alexander Wait Zaranek<br>

<div class="Ih2E3d">&lt;<a href="mailto:await@genetics.med.harvard.edu">await@genetics.med.harvard.edu</a>&gt; wrote:<br>

</div><div class="Ih2E3d">&gt; On Tue, Feb 26, 2008 at 6:03 PM, Bryan Bishop &lt;<a href="mailto:kanzure@gmail.com">kanzure@gmail.com</a>&gt; wrote:<br>

&gt; &nbsp;&gt; &nbsp;Encourage people to set up servers and backups<br>

&gt; &nbsp;&gt; &nbsp;of the wiki all over the place, with central aggregation nodes to make<br>

&gt; &nbsp;&gt; &nbsp;sure all of the updates are propagated.<br>

&gt; &nbsp;&gt;<br>

&gt; &nbsp;actually, i wanted to offer to do this last steering committee<br>

&gt; &nbsp;meeting. &nbsp;Anyone else doing it already? &nbsp; We could also run a mysql<br>

&gt; &nbsp;slave so edits were up to the minute and not just a dump. &nbsp;Setup a<br>

&gt; &nbsp;dedicated virtual machine on one of our clusters? &nbsp; I&#39;d love to see it<br>

&gt; &nbsp;happen...<br>

&gt;<br>

</div>&quot;Feb. 28, 2008. OpenWetWare.org sustained a database failure on Feb.<br>

28, but is back online. We&#39;re deeply sorry for any inconvenience this<br>

may have caused. We&#39;ll update the community on what we&#39;ve done to<br>

recover and add more reliability to our procedures and infrastructure.<br>

Documents edited or created in <a href="http://www.OpenWetWare.org" target="_blank">www.OpenWetWare.org</a> on Feb 28 between<br>

4:00 AM EST to 7:00 PM EST will need to be updated or re-entered.&quot;<br>

<br>

So, it&#39;s never ideal to work on master-slave replication *after* a<br>

database failure but there&#39;s no time like the present. &nbsp; &nbsp;There&#39;s a<br>

bunch of talented freelance admins around the Church lab that could<br>

help with this. &nbsp;And we have the bandwidth/infrastructure too.<br>

<br>

How can we help?<br>

<div><div></div><div class="Wj3C7c">Sasha<br>

_______________________________________________<br>

OpenWetWare Discussion Mailing List<br>

<a href="mailto:discuss@openwetware.org">discuss@openwetware.org</a><br>

<a href="http://mailman.mit.edu/mailman/listinfo/oww-discuss" target="_blank">http://mailman.mit.edu/mailman/listinfo/oww-discuss</a><br>

</div></div></blockquote></div><br>