From aldavis at wi.mit.edu Fri Sep 15 11:12:29 2006
From: aldavis at wi.mit.edu (Al Davis)
Date: Fri, 15 Sep 2006 11:12:29 -0400
Subject: [CSBi-HPC] NE47 power outage - Sat Sept 23rd
Message-ID: <450AC2DD.7010300@wi.mit.edu>

As part of the ongoing NE47 building renovation for Novartis, there will
be a building-wide power outage on Saturday Sept 23rd. The power will be
out the whole day, which means there won't be any power in the computer
server room. I'll be turning off the CSBi IBM cluster here in NE47 Friday
night and turning it back on Sunday - plan accordingly.

MIT has agreed to pay for the installation of an electrical circuit to
connect the computer room UPS to the building emergency generator, which
means we'll be able to keep the lights (and the computers) on even when
the building power goes out. This is a fairly large project, so the ETA
is still several months away and it won't be installed in time for this
power outage.

Respond back if you have questions/concerns.

thanks,
al

--
Al Davis  aldavis at wi.mit.edu | aldavis at mit.edu
Systems Manager  617.324.0519
CSBi & WI/MIT BioImaging Center
NE47 Rm 311 (500 Technology Sq)

From aldavis at wi.mit.edu Sun Sep 24 18:29:03 2006
From: aldavis at wi.mit.edu (Al Davis)
Date: Sun, 24 Sep 2006 18:29:03 -0400
Subject: [CSBi-HPC] BIM cluster is not working
Message-ID: <451706AF.8090408@wi.mit.edu>

I'm having a problem getting the IBM cluster back up after the power
outage. There is a hardware problem with the system that needs to be
fixed. I'm working on several solutions and will keep you up to date.

al

--
Al Davis  aldavis at wi.mit.edu | aldavis at mit.edu
Systems Manager  617.324.0519
CSBi & WI/MIT BioImaging Center
NE47 Rm 311 (500 Technology Sq)

From aldavis at wi.mit.edu Wed Sep 27 12:44:42 2006
From: aldavis at wi.mit.edu (Al Davis)
Date: Wed, 27 Sep 2006 12:44:42 -0400
Subject: [CSBi-HPC] IBM cluster is back up
Message-ID: <451AAA7A.9020905@wi.mit.edu>

The IBM cluster is back up. Getting there meant repairing the cluster
management console system (replacing a dead system drive), then reloading
and reconfiguring all the software until it could talk to the cluster
nodes and tell them to turn on.

BIG thanks to IBM's Ed Geraghty, who helped throughout the whole process
and knew the last bit of magic that enabled the management console to
access the cluster.

Let me know if you find something that doesn't work, as I haven't tested
everything yet.

al

--
Al Davis  aldavis at wi.mit.edu | aldavis at mit.edu
Systems Manager  617.324.0519
CSBi & WI/MIT BioImaging Center
NE47 Rm 311 (500 Technology Sq)

From aldavis at wi.mit.edu Fri Sep 29 08:04:42 2006
From: aldavis at wi.mit.edu (Al Davis)
Date: Fri, 29 Sep 2006 08:04:42 -0400
Subject: [CSBi-HPC] IBM cluster home dir is full
Message-ID: <451D0BDA.10209@wi.mit.edu>

The filesystem used for home directories on the IBM cluster mitwilogin
is full (~650 GB). This is preventing any user jobs from being processed.
Please remove any unneeded files in your home directory so applications
can run again.

thanks,
al

--
Al Davis  aldavis at wi.mit.edu | aldavis at mit.edu
Systems Manager  617.324.0519
CSBi & WI/MIT BioImaging Center
NE47 Rm 311 (500 Technology Sq)