Today’s Service Interruption

Today we had an outage that affected all buyhttp.com sites, including our main site, account manager, and support system. They were down for approximately two hours.

We host our own sites in a different datacenter than the one we use for customer sites, precisely for problems like this one. When the server went down, we started bringing up our main site on a server at our customer datacenter so we could display a message letting everyone coming to the site know what was happening. The datacenter that was having problems communicated quickly and told us that the problem should be resolved within 2 hours, so we decided not to transfer the account manager and support systems and instead used our Gmail account for communication during the outage.

At the datacenter, the fire suppression system detected smoke and the fire alarms went off. The fire department was automatically called to the building and, upon arrival, had the datacenter turn off all power for the safety of the firefighters. It turned out there was no fire, and the fire extinguishers didn’t go off.

During this outage there were no issues with customer sites.

UPDATE:

This is the official explanation of events from the datacenter:

*****
At 3:24PM Central Standard Time (4:24PM ET), an incident occurred in the vicinity of our Arlington Heights data center. A power line went down as a result of an accident. This created massive power surge that was greatly mitigated by our breakers and surge protectors but resulted in one short, inciting some smoke. This was detected by our VESDA double pre-action smoke detection system and the fire department quickly arrived on site. This initial incident did not affect our service provision. However, the fire department instructed us to begin an EPO (emergency power off) procedure on our battery backup units as a safety precaution in accordance with the fire department review. The safety inspector and our electricians were required to be on site and ensure a safe power up to safeguard against equipment damage before we were able to move forward. The moment that we were cleared for a power up, we began turning customers back on in phases (as per recovery procedure, so as to not overload PDUs) and with haste.
*****
