Outage

Latest news from RouterTech
Locked
Kieran
RouterTech Team
Posts: 2675
Joined: Fri Jan 20, 2006 11:30 am
Location: London

Post by Kieran » Wed Jan 16, 2008 1:19 pm

I would like to apologise on behalf of the RouterTech team for the server outage that began at around 3am this morning. The server was down for a number of hours, and the RouterTech site did not return until early this afternoon.

The outage was caused by a serious power failure in the datacentre where our server resides. Failed air conditioning units in another area of the datacentre caused some servers in the vicinity to overheat, which tripped circuit breakers. This had a knock-on effect and blew a whole load of fuses and power supply units that supply our rack. Engineers were dispatched to the location to resolve the issue, but due to the extent of the power issues and the need to run a disk check after the power was cut so suddenly, things took a while to restore.

Apologies for any inconvenience caused.
Last edited by Kieran on Sun Nov 23, 2008 3:30 pm, edited 1 time in total.
Kieran
"Indeed!"
Invaluable links: Forum Rules | Networking Guides | FAQ | Site Search | Forum Search <-- Use it or feel my wrath!
No support via PM, please ask your questions in the forum!

Post by Kieran » Wed Jan 16, 2008 8:41 pm

There was another outage this evening at about 17:30 due to another blown fuse. It appears that the power outage damaged some of our power strips, which has made our power supply unstable. The fuse has now been replaced and the server brought up again, but we can't be sure if it will stay like that - we hope so though.

To combat this issue, the server will be going down tonight at around 1am so that these power strips can be replaced and some power cables re-routed to try and prevent a recurrence of the problem. This should hopefully see the end of all power issues for our server.

When it rains, it pours.

Post by Kieran » Fri Jan 18, 2008 1:02 pm

Just to update this thread and complete the downtime story as it were.

Two nights ago the replacement of the power strips and cables went to plan, and the rack in which our server resides was brought back online. Unfortunately, the repeated outages had taken their toll on the RAID array, which was corrupted badly enough that it had to be copied out to another set of disks and recovery experts called in to try and reconstruct it all. A normal rebuild was not sufficient and would not complete.

By about 3pm yesterday the recovery experts had succeeded in their remit and restored the data that had been copied from the array. We then set about copying the data back to the array, which took a fair bit of time. Finally, the array had to be brought back online and normal operation verified.

The server was then brought back up, and most things seem to be running fine. There are a few things that still need to be sorted out, but these are all minor teething troubles that will not affect the quality of service.

Profuse apologies for the amount of downtime that RouterTech has been subjected to. We very much hope such an incident will never occur again.