Explanation on Long Downtime (2012-1-29), Please read if you are interested
E86 |
|
Lead Administrator
Group: ADMINISTRATOR
Posts: 587
Member No.: 2
Joined: Dec 22nd 2006
Location: San Leandro, CA
|
As many of you have noticed, the forum was down for almost all of Sunday. It first went down at 3:30am Pacific Time on January 29th, 2012. I did not know until I woke up at 8:00am. I quickly sent several notices to the Facebook group to spread awareness of the problem. I then submitted a ticket to our host's tech support. They responded with the following within 15 minutes:
QUOTE (Reply from DreamHost (Jan 29 @ 2012 - 09:13:20 / #5393xxxx)) | Subject: Re: Site Down
First, I'd like to apologize for the use of a canned response to this support issue. We have identified and are actively working on a fix for the problems you are experiencing with FTP and Web services. This issue is affecting a large subset of our customers and as such our system administration, data center operations, and development teams are all working on resolving these issues as quickly as possible.
We are making this information available, along with updates, on the http://www.dreamhoststatus.com/ website. Please follow this url to keep up to date with any future updates.
http://www.dreamhoststatus.com/2012/01/29/...s-for-a-subset-of-vps-shared-and-dedicated-machines-down/
Again, we do apologize for the lack of a personal response to this support request and if you have any further questions please let us know.
Thank you for your understanding in this matter, JJ Galvez DreamHost Technical Support |
Basically, they were aware of the issue. As the day went on, many customers like me were getting angry at the long downtime. Finally, by 11:00pm Pacific Time, the site was back up. The entire downtime lasted 19.5 hours. Here is a message from the CEO of the host:
QUOTE | Update Jan 29th, 9:40pm PST:
From Simon Anderson, CEO, DreamHost: My sincere apologies for the downtime experienced today by many of our dedicated and VPS customers, plus some shared customers. I know that this has been a poor customer experience for you. Almost all services are back up after an intense effort from the DreamHost dev, admin, data center and support teams. I was involved in the coordination of our efforts today and now am able to share what happened, and what we’re going to do to reduce the risk that it happens again.
We run Debian OS and have used autoupdates to ensure security packages are installed as soon as they are available. We’ve had some breakage in the past from this approach, but nothing major. However last night’s autoupdate went badly wrong, removing essential packages from dedicated, VPS and some shared servers. Our monitoring and support team flagged the issue fast, and we scrambled our admin, dev and NOC teams to reinstall the packages that had been removed by autoupdate, reboot servers, fix package dependencies, and test that individual services were live. Given the number of services affected, this took a long time to complete. Rest assured we had all hands working on the issue, but I know it was still a frustrating experience for customers.
To mitigate the risk of anything like this happening again, we’re immediately switching off autoupdates, and moving to a manual process where we’ll only push out Debian updates after significant testing. There’s always a balance to be struck between speed, efficiency, security and issue prevention, but this event has shown us that we need to take a different approach. Again, my apologies for the downtime experienced today. We’re acutely focused on adjusting our processes and systems to ensure we do a better job going forward. – Simon |
In short, the outage was caused by a botched automatic update that removed essential packages from the servers, which took the service down shortly after 3:30am. I am currently trying to get some sort of compensation for the long downtime. If they refuse, I might be forced to switch to another host. This sort of downtime is simply unacceptable.
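For anyone who wants to check the 19.5-hour figure, here is a minimal sketch of the arithmetic in Python. It assumes the start and end times given in this post (3:30am and 11:00pm Pacific Time, both on January 29th, 2012); the variable names are only illustrative.

```python
from datetime import datetime

# Times taken from this post. Both are Pacific Time on the same day,
# so no time zone conversion is needed to get the difference.
went_down = datetime(2012, 1, 29, 3, 30)   # 3:30am
came_back = datetime(2012, 1, 29, 23, 0)   # 11:00pm

# Difference in hours between the two timestamps.
downtime_hours = (came_back - went_down).total_seconds() / 3600
print(downtime_hours)  # 19.5
```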
|
|
|
Geo |
|
all bow before me
Group: FORUM MODERATOR
Posts: 33,043
Member No.: 1,760
Joined: Dec 25th 2007
Location: Sydney, Australia
|
You guys are sort of misunderstanding the problem at hand. The reason we moved to the server we're on now is that the previous one was fairly unreliable, and the quality of service the current server provided was good until the recent downtime.
At the end of the day, it's up to E86 to choose what to do with the forum and where to take it.
|
|
|