Sun Blogs: From the Editors
(Un)Scheduled Maintenance: Is This Thing On?
September 18, 2009 - 12:00am
Welcome back. If you're a regular reader of CornellSun.com, you may have noticed that earlier in the week, our Web site was unavailable for an embarrassingly long amount of time. For that, I am sincerely sorry, and you deserve an explanation.
Since 2006, when The Sun took its web operations in-house, we have relied on professionals to maintain the server equipment that stores and serves our online edition. This means that we're entirely responsible for what you see, but the computers running behind the scenes are housed in a secure datacenter that is monitored 24/7 by a professional staff of server technicians.
This past Sunday morning, we discovered that our server was not responding. We immediately contacted the support staff, who began diagnosing the problem. After a few hours they believed they had found the cause: the RAID controller in our server had failed, and as a result our filesystem, including all stored content, had been lost. Ironically, RAID stands for Redundant Array of Inexpensive Disks, and its purpose is to guard against hard disk failure. Unfortunately, there is no safeguard against a failure in the RAID controller itself.
Within a few hours, the technicians had replaced the bad hardware and we were left with a fresh operating system on our server – complete with that new server smell. What followed was an all-night session to recreate our Web site from the backups that we take weekly. By no small amount of luck, one of those backups had been completed Sunday morning, meaning data loss was minimal.
By Monday morning we were ready to go back online. Unfortunately, as soon as we did, the server went down again, and every attempt to troubleshoot the situation on our end failed. Once again we contacted the server support staff. It took the better part of Monday to determine the problem, and ultimately it was discovered that other hardware components were malfunctioning, and the solution would be to completely rebuild the server.
By this time, we felt that we could no longer wait for our primary host to fix the situation, so we decided to host our Web site on a secondary server, maintained by a different company, which we ordinarily use for other purposes. We once again set out to rebuild the site from backups while configuring that server to host cornellsun.com. We worked through Tuesday to do this, and on Wednesday we began publishing the content that had appeared in print on Monday, Tuesday, and Wednesday. Finally, on Wednesday night, we flipped the switch and went back online.
Despite some circumstances that were out of our control, certain steps could have been taken earlier on our part to prepare us for such emergencies. We're working to establish a better backup system for our Web site so that our response time and downtime can be greatly reduced. We're also now working with a hosting provider that uses much newer, more reliable server technologies.
Everything should be back to normal; however, it is a big site. If you notice anything that's out of place, missing, or just different from what you remember, please report it to us at web@cornellsun.com.
Thank you for your patience – and now, back to your regularly scheduled life.
