Anyone who's listened to "The Twelve Pains of Christmas" from Bob Rivers' Twisted Christmas can probably relate to the angry husband screaming, "When one light goes out they all go out!" because, yeah, we've all been there.

Imagine now, if you will, a data center. A data center filled with servers humming along, each running three or four applications in virtual machines a la VMware. Imagine now - it shouldn't be hard at all - that one of those servers suddenly just stops working. Let's say the drive crashes.

After the blue smoke dissipates and the screams of "When one component crashes they all crash" fade away, it's time to consider what just happened. (Yeah, the analogy here is a stretch, but try to go with it this morning, okay?)

Yes, OS virtualization has its benefits, there's no doubt about that. But like SOA and the challenges associated with composition-based application development, there are some interesting challenges associated with virtualizing the data center using OS virtualization. This one is particularly frustrating because, aside from ensuring that the underlying hardware and OS on which the virtualization solution is installed are stable and redundant, there isn't anything you can do to prevent a single failure from taking down not one but all of the applications running within virtual containers on that server.

Even if it isn't a crash that causes the downtime, maintenance, upgrades, and patches will require that the server be shut down or rebooted, which means all the applications running within virtual containers on that server are going down with it.

You may recall that I walked through a BEA TCO analysis of virtualization and ended up with some controversial results. This subject is no different in that regard.

You'll notice that the blue in the stacked graph here represents "unplanned downtime". Now, I'm not sure what the definition of unplanned downtime is, but I assume that it's downtime that wasn't planned. :-) Seriously, I assume this category includes downtime resulting from things like hardware component failure, emergency patches, and the like.

I find it interesting that the non-virtualized environment has more unplanned downtime than the virtualized OS. Presumably this is because consolidation means fewer hardware servers and thus fewer components that can fail and cause downtime in the first place.

What bothers me is that the cost of unplanned downtime must necessarily include things like lost revenue and productivity, and that cost is likely to be higher in a virtualized OS architecture because if one server crashes, multiple applications are affected. Imagine if it were SOA services and processes deployed across a virtualized OS architecture. Talk about a ripple effect.
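The arithmetic behind that worry is straightforward: consolidation multiplies the blast radius of a single outage. Here's a back-of-the-envelope sketch; all the figures (outage length, per-application cost, number of applications) are hypothetical, purely for illustration.

```python
# Why unplanned downtime costs more on a consolidated host: one hardware
# failure takes down every application running on it.
# All figures below are hypothetical, for illustration only.

def downtime_cost(hours_down, apps_affected, cost_per_app_hour):
    """Total cost of an outage across every affected application."""
    return hours_down * apps_affected * cost_per_app_hour

# One physical server running a single application.
standalone = downtime_cost(hours_down=2, apps_affected=1, cost_per_app_hour=5000)
print(standalone)    # 10000

# One virtualized server hosting four applications in virtual containers.
consolidated = downtime_cost(hours_down=2, apps_affected=4, cost_per_app_hour=5000)
print(consolidated)  # 40000
```

Same failure, same two hours of downtime, four times the cost - which is the ripple effect in a nutshell.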

Regardless of whether it's SOA or traditional web applications deployed on a non-virtualized or virtualized OS architecture, you need some kind of assurance of availability and resiliency. That kind of assurance comes from an application delivery network.

So if you're trying to mitigate the risk associated with downtime in either environment, an application delivery network is going to be your best option. Even though you can bring new instances of your applications on-line in the event of a failure - extremely quickly and painlessly in the case of a virtualized OS architecture - you still have to make sure clients and customers can actually reach the new instance. That means some sort of intelligent routing capability - the kind of intelligence you get with an application delivery controller as part of an application delivery network.
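To make that "intelligent routing" a little more concrete, here's a minimal sketch of the core idea: poll each instance's health endpoint and send traffic only to instances that actually answer. This is not how any particular application delivery controller is implemented - the instance URLs and the `/health` path are assumptions invented for the example.

```python
# Minimal sketch of health-check-based routing: only healthy instances
# receive traffic, so a replacement instance is reachable as soon as it
# starts answering its health check.
import urllib.request

# Hypothetical application instances; addresses are made up.
INSTANCES = [
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
]

def is_healthy(base_url, timeout=2):
    """Return True if the instance answers its (assumed) /health endpoint."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def pick_instance(instances, healthy=is_healthy):
    """Route to the first instance that passes its health check, else None."""
    for url in instances:
        if healthy(url):
            return url
    return None
```

A real application delivery controller layers far more on top of this - load-aware distribution, connection draining, persistence - but the principle is the same: the router, not the client, decides which instance is alive.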

Imbibing: Coffee