[PageOneX] [dev] Urgent: PageOneX is down

Rafael Porres Molina rporres at gmail.com
Thu Jan 23 14:37:54 EST 2014


Hi,

I don't know if I'm going to be able to make tonight.

Anyway this is what I saw talking with Pablo the other day.

- The instance where Rails and MySQL is running seems undersized
memory-wise.

numeroteca at li563-7:~$ head -1 /proc/meminfo
MemTotal:        1104816 kB

which is around 1 Gb of memory.

- Just before hitting the error in the video Pablo sent, you can see that
CPU I/O wait spiked, and the kernel swapping thread was most consuming
process that top was showing. That indicates that the system was trying to
page out memory pages to swap. As to why the browser returns a 500 error,
it could be that the ruby process dies or it hits a response timeout for
the Apache request (I see that Rails runs behind Apache, but I haven't had
the time to take a look to the configuration).

- As for the MySQL crashes, they also seem related to memory. Mysql error
logs show errors like the following:

140116  9:40:01 [Note] Plugin 'FEDERATED' is disabled.
140116  9:40:01 InnoDB: The InnoDB memory heap is disabled
140116  9:40:01 InnoDB: Mutexes and rw_locks use GCC atomic builtins
140116  9:40:01 InnoDB: Compressed tables use zlib 1.2.3.4
140116  9:40:01 InnoDB: Initializing buffer pool, size = 128.0M
InnoDB: mmap(137363456 bytes) failed; errno 12
140116  9:40:01 InnoDB: Completed initialization of buffer pool
140116  9:40:01 InnoDB: Fatal error: cannot allocate memory for the buffer
pool
140116  9:40:01 [ERROR] Plugin 'InnoDB' init function returned error.
140116  9:40:01 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE
failed.
140116  9:40:01 [ERROR] Unknown/unsupported storage engine: InnoDB
140116  9:40:01 [ERROR] Aborting

that indicates that it crashed while trying to allocate memory for the
innodb buffer pool. As it is not explicitly configured, in mysql 5.5 the
default is 128Mb, which is not much unless the instance where MySQL is
running is so short in memory.

http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size

The easiest thing to do is to try to get more memory for the instance where
pageonex is running. How much? Well, as much as you can afford :)

Regards,

Rafa



2014/1/23 pablo rey <pablo at basurama.org>

> Hi there,
> Reminder that we are meeting in 2 and a half hours (4.30pm EST) today!
> Send me your gmail account so that I can add you to the hangout.
>
> The idea is to brainstorm and understand why PageOneX is crashing when
> making long and short threads (500 internal server error) #179<https://github.com/numeroteca/pageonex/issues/179>+ why is the mySQL crashing as well.
>
> It would be great to understand how the server is built.
> After a short talk with Rafa Porres, we figured out that is probably
> related to a lack of memory (now we have 1Gb).
>
> Folks in Mexico made their own PageOnex deployment and research<http://estadodelperiodismo.articulo19.org/>and they also reported related issues, please, Alecs, post your problems in
> Github and share them through this list!
>
> See you in a while,
> pablo
>
>
> On Tue, Jan 21, 2014 at 6:27 PM, pablo rey <pablo at basurama.org> wrote:
>
>> Reminder for everyone interested: this Thursday Jan 23rd 4.30pm EST
>> (10.30pm CET) we have the session to try to solve different problems with
>> the server crashing at PageOneX.
>>
>> I made a video showing how it goes to "500 internal server error" when
>> editing a thread and adding one extra day, including logs and top:
>> http://www.youtube.com/watch?v=wuBMPu7Hty0
>>
>>
>> On Thu, Jan 9, 2014 at 12:08 AM, pablo rey <pablo at basurama.org> wrote:
>>
>>> Great Rafa, I'll write you off the list. If anyone is interested to join
>>> any of the meetings, shoot me an email!
>>>
>>>
>>> On Wed, Jan 8, 2014 at 11:55 PM, Rafael Porres Molina <rporres at gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> I'd be happy to join if I can be of any help.
>>>>
>>>> Pablo: We could arrange a previous hangout between you and me so that
>>>> you can give me some context to be more helpful.
>>>>
>>>> Regards,
>>>>
>>>> Rafa
>>>>
>>>>
>>>> 2014/1/8 Edward Platt <elplatt at media.mit.edu>
>>>>
>>>>> Works for me, talk then!
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 5:21 PM, pablo rey <pablo at basurama.org> wrote:
>>>>>
>>>>>> Given the 6 hours of difference Jan 23rd 4.30pm EST (10.30pm CET,
>>>>>> Spain) is good for me.
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 8, 2014 at 11:17 PM, Edward Platt <elplatt at media.mit.edu>wrote:
>>>>>>
>>>>>>> Woks for me, my day should be open so far.  What time's good for you?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 5:13 PM, pablo rey <pablo at basurama.org>wrote:
>>>>>>>
>>>>>>>> Let's set another date, then. Thursday Jan 23rd?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 8, 2014 at 9:05 PM, Edward Platt <elplatt at media.mit.edu
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Touching base sounds good.  After Jan 22 would be best.
>>>>>>>>>
>>>>>>>>> -Ed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 8, 2014 at 2:58 PM, pablo rey <pablo at basurama.org>wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Ed,
>>>>>>>>>> I had to restart mysql again minutes ago :-/
>>>>>>>>>> The problem with this issue and specially with #179<https://github.com/numeroteca/pageonex/issues/179>is that early adopters do not come back after a first failed attempt to
>>>>>>>>>> create a thread.
>>>>>>>>>>
>>>>>>>>>> I would suggest to set up a google hangout during IAP (=January
>>>>>>>>>> for non MIT folks!) to brainstorm together on this.
>>>>>>>>>> What do you think? anyone interested?
>>>>>>>>>> Which other sites are hosted in that server?
>>>>>>>>>>
>>>>>>>>>> How about next Monday Jan 13th at 4.30pm EST (10.30pm CET,
>>>>>>>>>> Spain)?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> p
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 26, 2013 at 2:31 AM, Edward Platt <
>>>>>>>>>> elplatt at media.mit.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the update Pablo.  I don't see anything telling in
>>>>>>>>>>> the logs you've sent.  The mysql problems may be unrelated to PageOneX as
>>>>>>>>>>> the server is shared.  It's also possible that the performance problems we
>>>>>>>>>>> were seeing before are somehow causing this.  I've been working out how to
>>>>>>>>>>> best fix those, but there doesn't seem to be a quick fix.  It's on our
>>>>>>>>>>> radar!
>>>>>>>>>>>
>>>>>>>>>>> -Ed
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 25, 2013 at 4:06 PM, pablo rey <pablo at basurama.org>wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I had to restart the mysql again today!
>>>>>>>>>>>>
>>>>>>>>>>>> This time it crashed after:
>>>>>>>>>>>> Started GET "/about" for 91.206.178.40 at 2013-12-24 23:20:01
>>>>>>>>>>>> +0000
>>>>>>>>>>>> Processing by HomeController#about as HTML
>>>>>>>>>>>>   Rendered home/about.html.erb within layouts/application
>>>>>>>>>>>> (39.1ms)
>>>>>>>>>>>> Completed 200 OK in 340ms (Views: 268.5ms | ActiveRecord: 0.0ms)
>>>>>>>>>>>> Started GET "/numeroteca/corrupcion-espana-diciembre-2013/" for
>>>>>>>>>>>> 91.206.178.40 at 2013-12-24 23:20:02 +0000
>>>>>>>>>>>> Processing by CodingController#display as HTML
>>>>>>>>>>>>   Parameters: {"username"=>"numeroteca",
>>>>>>>>>>>> "thread_name"=>"corrupcion-espana-diciembre-2013"}
>>>>>>>>>>>>   Rendered coding/_coding_footer.html.erb (0.3ms)
>>>>>>>>>>>>   Rendered coding/display.html.erb within layouts/application
>>>>>>>>>>>> (5064.0ms)
>>>>>>>>>>>> Completed 200 OK in 5614ms (Views: 2025.2ms | ActiveRecord:
>>>>>>>>>>>> 3365.2ms)
>>>>>>>>>>>> Started GET "/users/sign_up" for 91.206.178.40 at 2013-12-24
>>>>>>>>>>>> 23:20:09 +0000
>>>>>>>>>>>> Processing by Devise::RegistrationsController#new as HTML
>>>>>>>>>>>>   Rendered devise/_links.erb (9758.9ms)
>>>>>>>>>>>>   Rendered devise/registrations/new.html.erb within
>>>>>>>>>>>> layouts/application (18132.6ms)
>>>>>>>>>>>> Completed 200 OK in 18877ms (Views: 18569.0ms | ActiveRecord:
>>>>>>>>>>>> 0.0ms)
>>>>>>>>>>>> Started GET "/users/sign_in" for 91.206.178.40 at 2013-12-24
>>>>>>>>>>>> 23:20:29 +0000
>>>>>>>>>>>> Processing by Devise::SessionsController#new as HTML
>>>>>>>>>>>>   Rendered devise/_links.erb (1.2ms)
>>>>>>>>>>>>   Rendered devise/sessions/new.html.erb within
>>>>>>>>>>>> layouts/application (102.2ms)
>>>>>>>>>>>> Completed 200 OK in 477ms (Views: 272.0ms | ActiveRecord: 0.0ms)
>>>>>>>>>>>>
>>>>>>>>>>>> Connecting to database specified by database.yml
>>>>>>>>>>>> Connecting to database specified by database.yml
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Dec 22, 2013 at 11:57 PM, pablo rey <pablo at basurama.org
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I had to restart mysql again minutes ago.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the rails log before crashing:
>>>>>>>>>>>>> Started GET "/numeroteca/local-national-world-news/coding" for
>>>>>>>>>>>>> 5.10.83.101 at 2013-12-22 19:11:02 +0000
>>>>>>>>>>>>> Processing by CodingController#process_images as */*
>>>>>>>>>>>>>   Parameters: {"username"=>"numeroteca",
>>>>>>>>>>>>> "thread_name"=>"local-national-world-news"}
>>>>>>>>>>>>>   Rendered coding/_coding_footer.html.erb (0.1ms)
>>>>>>>>>>>>>   Rendered coding/process_images.html.erb within
>>>>>>>>>>>>> layouts/application (3323.2ms)
>>>>>>>>>>>>> Completed 200 OK in 3331ms (Views: 1658.8ms | ActiveRecord:
>>>>>>>>>>>>> 1668.8ms)
>>>>>>>>>>>>> Started GET "/" for 67.23.226.7 at 2013-12-22 19:11:10 +0000
>>>>>>>>>>>>> Processing by HomeController#index as HTML
>>>>>>>>>>>>>   Rendered home/index.html.erb within layouts/application
>>>>>>>>>>>>> (184.1ms)
>>>>>>>>>>>>> Completed 200 OK in 284ms (Views: 163.3ms | ActiveRecord:
>>>>>>>>>>>>> 120.0ms)
>>>>>>>>>>>>> Started GET "/assets/application.css" for 67.23.226.7 at
>>>>>>>>>>>>> 2013-12-22 19:11:16 +0000
>>>>>>>>>>>>> Served asset /application.css - 304 Not Modified (522ms)
>>>>>>>>>>>>> Started GET "/Donna/juicio-marita-veron-cronica/export.ods"
>>>>>>>>>>>>> for 5.10.83.105 at 2013-12-22 19:11:17 +0000
>>>>>>>>>>>>> Processing by ThreadsController#export as ODS
>>>>>>>>>>>>>   Parameters: {"username"=>"Donna",
>>>>>>>>>>>>> "thread_name"=>"juicio-marita-veron-cronica"}
>>>>>>>>>>>>> Sent file /tmp/export20131222-12134-1228f0a.ods (0.1ms)
>>>>>>>>>>>>> Completed 200 OK in 167ms (ActiveRecord: 95.3ms)
>>>>>>>>>>>>> Started GET "/davidmbusto/dragado-en-bahia-blanca/export.json"
>>>>>>>>>>>>> for 180.76.5.75 at 2013-12-22 19:11:17 +0000
>>>>>>>>>>>>> Processing by ThreadsController#export as JSON
>>>>>>>>>>>>>   Parameters: {"username"=>"davidmbusto",
>>>>>>>>>>>>> "thread_name"=>"davidmbusto"}
>>>>>>>>>>>>> Completed 200 OK in 93ms (Views: 0.2ms | ActiveRecord: 4.6ms)
>>>>>>>>>>>>> Started GET "/threads/" for 67.23.226.7 at 2013-12-22 19:11:19
>>>>>>>>>>>>> +0000
>>>>>>>>>>>>> Processing by ThreadsController#index as HTML
>>>>>>>>>>>>> Connecting to database specified by database.yml
>>>>>>>>>>>>> Connecting to database specified by database.yml
>>>>>>>>>>>>>
>>>>>>>>>>>>> After restarting pagination in the thread list  is not
>>>>>>>>>>>>> working again<https://github.com/numeroteca/pageonex/issues/182>.
>>>>>>>>>>>>> I mention it as pageonex crashed after a GET "/threads/" request. What
>>>>>>>>>>>>> could be the cause for making mysql die? Listing the threads doesn't seem
>>>>>>>>>>>>> very problematic.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Besides, I was trying to get a sense of the usage of PageOneX
>>>>>>>>>>>>> web site using the Piwik statistics. According to Piwik only 4 visits to
>>>>>>>>>>>>> the site today. According to the logs many many more. I assume many people
>>>>>>>>>>>>> have the stats blocked but the difference isway  too high... guesses?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy holidays!
>>>>>>>>>>>>> p
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Dec 15, 2013 at 1:53 AM, Edward Platt <
>>>>>>>>>>>>> elplatt at media.mit.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should be fixed now.  Mysql wasn't running, not sure why and
>>>>>>>>>>>>>> there are no relevant log entries.  If you ever feel the need to restart
>>>>>>>>>>>>>> mysql, please go ahead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Ed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Dec 14, 2013 at 3:52 PM, pablo rey <
>>>>>>>>>>>>>> pablo at basurama.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should I just restart the mysql server?
>>>>>>>>>>>>>>>   sudo service mysql restart
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Dec 14, 2013 at 9:09 PM, pablo rey <
>>>>>>>>>>>>>>> pablo at basurama.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Rahul and Ed!The home page returns a passenger message:
>>>>>>>>>>>>>>>> Ruby (Rack) application could not be started
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Error message: Can't connect to local MySQL server through
>>>>>>>>>>>>>>>> socket '/var/run/mysqld/mysqld.sock' (111) (Mysql2::Error)Exception
>>>>>>>>>>>>>>>> class: PhusionPassenger::UnknownError
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Which is the same message I was getting and reported at
>>>>>>>>>>>>>>>> https://github.com/numeroteca/pageonex/issues/179#issuecomment-30275654(but didn't make the app crashed).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It looks like it's been down since yesterday.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Pageonexdev mailing list
>>>>>>>>>>>>>>> Pageonexdev at mit.edu
>>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Edward L. Platt
>>>>>>>>>>>>>> Civic Technology Programmer
>>>>>>>>>>>>>> MIT Center for Civic Media
>>>>>>>>>>>>>> E15-348
>>>>>>>>>>>>>> http://civic.mit.edu
>>>>>>>>>>>>>> http://elplatt.com
>>>>>>>>>>>>>> @elplatt <http://twitter.com/elplatt>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Pageonexdev mailing list
>>>>>>>>>>>>>> Pageonexdev at mit.edu
>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Edward L. Platt
>>>>>>>>>>> Civic Technology Programmer
>>>>>>>>>>> MIT Center for Civic Media
>>>>>>>>>>> E15-348
>>>>>>>>>>> http://civic.mit.edu
>>>>>>>>>>> http://elplatt.com
>>>>>>>>>>> @elplatt <http://twitter.com/elplatt>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Pageonexdev mailing list
>>>>>>>>>>> Pageonexdev at mit.edu
>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Edward L. Platt
>>>>>>>>> Civic Technology Programmer
>>>>>>>>> MIT Center for Civic Media
>>>>>>>>> E15-348
>>>>>>>>> http://civic.mit.edu
>>>>>>>>> http://elplatt.com
>>>>>>>>> @elplatt <http://twitter.com/elplatt>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Edward L. Platt
>>>>>>> Civic Technology Programmer
>>>>>>> MIT Center for Civic Media
>>>>>>> E15-348
>>>>>>> http://civic.mit.edu
>>>>>>> http://elplatt.com
>>>>>>> @elplatt <http://twitter.com/elplatt>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Edward L. Platt
>>>>> Civic Technology Programmer
>>>>> MIT Center for Civic Media
>>>>> E15-348
>>>>> http://civic.mit.edu
>>>>> http://elplatt.com
>>>>> @elplatt <http://twitter.com/elplatt>
>>>>>
>>>>> _______________________________________________
>>>>> Pageonexdev mailing list
>>>>> Pageonexdev at mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Pageonexdev mailing list
>>>> Pageonexdev at mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Pageonexdev mailing list
> Pageonexdev at mit.edu
> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/pipermail/pageonexdev/attachments/20140123/32223d36/attachment-0001.htm


More information about the Pageonexdev mailing list