Quick! The Data Center Just Burned Down, What Do You Do?

You get the call at 2am. The data center is on fire, and while the server room itself was protected by your high-tech fire-fighting gear, the rest of the building billowed out smoke and noxious gases that have contaminated your servers. Unless you have a sealed server room, this is a very real possibility. Another possibility is that the fire department had to spray a ton of liquid on your building to keep the fire from spreading. No sealed room means your servers might have taken a bath. And sealed rooms are a real rarity in data center design for a whole host of reasons, starting with cost. So you turn to your DR plan, and step one is to make certain the load was shifted to an alternate location. That will buy you time to assess the damage. It’s a good start, but little do you know that it’s probably not enough of a plan to get you back to normal quickly.

It still surprises me, when I talk to people about disaster recovery, how differently IT shops view what’s necessary to recover from a disaster. It surprises me because few of them actually have a Disaster Recovery Plan. They have a “Pain Alleviation Plan”. That may be sufficient, depending upon the nature of your organization, but it may not be. You are going to need buildings, servers, infrastructure, and the knowledge to put everything back together, including that system that has run for ten years since the team that implemented it moved on to new jobs. It wouldn’t still be running on NetWare/Windows NT/OS/2 if it weren’t critical and expensive to replace. If you’re like most of us, you moved that system to a VM years ago if at all possible, but you’ll still have to get it plugged into a network it can work on, and your wires? They’re all suspect. The plan to restore your ADS can be painful in and of itself, let alone reapplying the security settings on things like NAS and SAN devices, which have different settings for different LUNs or even folders and files.

The massive amount of planning required to truly restore normal function of your systems is daunting to most organizations, and some questions just can’t be answered today for a disaster that might happen in a year or even ten. Hopefully it never happens, but we do disaster planning so that we’re prepared if it does, so “never” isn’t a useful assumption while planning for the worst. While still at Network Computing, I looked at some great DR plans ranging from “send us VMs and we’ll ship you servers ready to rock the same day your disaster happens” to “We’ll drive a truck full of servers to your location and you can load them up with whatever you need and use our satellite connection to connect to the world”. The problem is that both of these require money from you every month while providing benefit only if you actually have a disaster. Insurance is a good thing, but increasing IT overhead is risky business. When budget time comes, the temptation to stop paying each month for something that isn’t immediately advancing business needs is strong.

And both of those solutions miss the ever-growing infrastructure part. Could you replace your BIG-IPs (or other ADC gear) tomorrow? You could get new ones from F5 pretty quickly, but do you have their configurations backed up so you can restore them? How about the dozens of other network devices, the NAS and SAN boxes, the network architecture itself? Yeah, it’s going to be a lot of work. But it is manageable. There is going to be a huge time investment, but this is disaster recovery, and the time investment is a response to an emergency. Even so, adequate planning can cut down the time you have to invest to return to business as usual, sometimes by huge amounts. Not having a plan is akin to setting the price for a product before you know what it costs to produce: you’ll regret it.
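To make the configuration question concrete, here is a minimal Python sketch of the kind of check a DR plan could include: walk an inventory of devices and flag any whose config backup is missing or stale. The `Device` record, the `stale_backups` helper, the device names, and the 30-day threshold are all hypothetical and for illustration only; a real check would read from wherever you actually keep your backups.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Device:
    """One inventory entry (hypothetical structure, not a vendor format)."""
    name: str
    kind: str
    last_config_backup: Optional[datetime]  # None means never backed up


def stale_backups(devices: List[Device], now: datetime,
                  max_age: timedelta = timedelta(days=30)) -> List[str]:
    """Return names of devices with no config backup, or one older than max_age."""
    flagged = []
    for device in devices:
        backup = device.last_config_backup
        if backup is None or now - backup > max_age:
            flagged.append(device.name)
    return flagged


if __name__ == "__main__":
    now = datetime(2011, 1, 6)
    # Made-up inventory; names and dates are placeholders.
    inventory = [
        Device("adc-1", "ADC", datetime(2011, 1, 2)),
        Device("san-1", "SAN", datetime(2010, 9, 15)),
        Device("core-switch-1", "switch", None),
    ]
    for name in stale_backups(inventory, now):
        print(f"{name}: config backup missing or older than 30 days")
```

A check like this only tells you a backup exists and is recent. Whether it actually restores is a separate question, and one worth testing before the fire.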

What do you need? Well, if you’re lucky, you have more than one data center, and all you need to do is slightly oversize them to make sure the others can pick up the slack if one goes down. If you’re not one of the lucky organizations, you’ll need a plan for getting a building with sufficient power, internet capability, and space; for replacing everything from power connections to racks to SAN and NAS boxes; for restoring from backups (seriously, test your backups or replication targets. There are horror stories…); and for giving your staff the time to turn all of these raw elements into a functional data center. It’s a tall order: you need backups of the configs of all appliances, and information from all of your vendors about replacement timelines. But should you ever need this plan, it is far better to have done the research than to wake up in the middle of the night and then, while you are down, spend time figuring it all out. The toughest bit is keeping it up to date. Implementing a DR plan is a discrete project, but regularly updating costs for space and lists of vendors and gear is drudgery that falls outside of project timelines. Still, it’s worth the effort as insurance.
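The “slightly oversize” arithmetic for multiple data centers can be sketched in a few lines of Python. This assumes load is spread evenly across sites and redistributes evenly across the survivors when one fails, which is a simplification; real traffic rarely behaves that neatly. The function names and the load figure are made up for illustration.

```python
def capacity_per_site(total_load: float, sites: int) -> float:
    """Capacity each site needs so the others can absorb the loss of one site.

    Assumes load is spread evenly and redistributes evenly after a failure.
    """
    if sites < 2:
        raise ValueError("need at least two sites to survive losing one")
    return total_load / (sites - 1)


def oversize_percent(sites: int) -> float:
    """Percent by which each site must exceed its normal share of the load."""
    if sites < 2:
        raise ValueError("need at least two sites to survive losing one")
    # Normal share is total/sites and required capacity is total/(sites - 1),
    # so the ratio is sites/(sites - 1) and the excess is 1/(sites - 1).
    return 100.0 / (sites - 1)


if __name__ == "__main__":
    total = 1200.0  # arbitrary load units, for illustration only
    for n in (2, 3, 4, 5):
        print(f"{n} sites: {capacity_per_site(total, n):.0f} units each, "
              f"{oversize_percent(n):.1f}% over normal share")
```

Under these assumptions, two sites must each be able to carry everything, so “slightly” only really applies once you have several sites to spread the slack across.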

And if your timeline is critical, look into one of those semi trailers, or the new thing (since 2005 or 2007 at least), containerized data centers, because when you need them, you need them. If you can’t afford to be down for more than a day or two, they’re a good stopgap while you rebuild.

SecurityProcedure.com has an aggregated list of free DR plans online. I’ve looked at a couple of the plans they list, and they’re not horrible, but make certain you customize them to your organization’s needs. No generic plan is complete for your needs, so make certain you cover all of your bases if you use one of these. The key is to have a plan that dissects all of your post-disaster needs. I’ve been through a disaster (The Great NWC Lab Flood), and there are always surprises, but having a plan to minimize them is a first step toward maintaining your sanity and restoring your data center to full function.

In the future (the not-too-distant future) you will likely have the cloud as a backup, assuming that you have a product like our GTM to enable cloud-bursting, and that the global load balancer isn’t taken out by the fire. But even if it is, replacing one device to get your entire data center emulated in the cloud would not be anywhere near as painful as the rush to reassemble physical equipment.

Marketing Image of an IBM/APC Container

Lori and I? No, we have backups and insurance and that’s about it. But though our network is complex, we don’t have any businesses hosted on it, so this is perfectly acceptable for our needs. No containerized data centers for us. Let’s hope we, and you, never need any of this.

Published Jan 06, 2011
