Just because you can, doesn't mean you should.

I'm going to start this one by quoting Hoff, who was quoting Andreas Antonopoulos of Nemertes Research Group, who was in turn paraphrasing a concept put forth by Doug Gourlay.

From Rational Survivability:

"How about using netflow information to re-balance servers in a data center"

Routing: Controlling the flow of network traffic to an optimal path between two nodes

Virtual-Routing or Anti-Routing: VMotioning nodes (servers) to optimize the flow of traffic on the network.

Using netflow information, identify those nodes (virtual servers) that have the highest traffic "affinity" from a volume perspective (or some other desired metric, like desired latency etc) and move (VMotion, XenMotion) the nodes around to re-balance the network. For example, bring the virtual servers exchanging the most traffic to hosts on the same switch or even to the same host to minimize traffic crossing multiple switches. Create a whole-data-center mapping of traffic flows, solve for least switch hops per flow and re-map all the servers in the data center to optimize network traffic.

My first reaction was, yup, that makes a lot of sense from a network point of view, and given who made the comment, it does make sense. Then I choked on my own tongue as the security weenie in me started in on the throttling process, reminding me that while this is fantastic from an autonomics perspective, it's missing some serious input variables.

Latency of the "network" and VM spin-up aside, the dirty little secret is that what's being described here is a realistic and necessary component of real time (or adaptive) infrastructure.  We need to get ultimately to the point where within context, we have the ability to do this, but I want to remind folks that availability is only one leg of the stool.  We've got the other nasty bits to concern ourselves with, too.

Yes, yes we do have other nasty bits to concern ourselves with, and they aren't all security related, though Hoff is, necessarily, more concerned with the security aspects than those of the application.

The concept is sound, I agree, and the dynamism of the network described is absolutely essential in a dynamic, virtual environment. But re-balancing the network based on traffic or network-oriented variables like latency misses some very important application-specific parameters.
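For the sake of concreteness, here's a rough sketch of the kind of flow-affinity rebalancing being described. To be clear, every name, number, and threshold in it is mine and purely illustrative; it's a thought experiment, not anyone's actual implementation:

```python
# Purely illustrative sketch of netflow-affinity rebalancing. It greedily
# co-locates the VM pairs that exchange the most traffic, optimizing only
# for switch hops, which is exactly the behavior critiqued next.

# bytes exchanged between VM pairs, as might be derived from netflow records
flow_volume = {
    ("web-01", "app-01"): 9_500_000,
    ("app-01", "db-01"): 7_200_000,
    ("web-02", "app-02"): 1_100_000,
}

# current VM -> host placement and switch hops between hosts (all made up)
placement = {"web-01": "host-a", "app-01": "host-c", "db-01": "host-c",
             "web-02": "host-b", "app-02": "host-b"}
hops = {("host-a", "host-b"): 2, ("host-a", "host-c"): 3, ("host-b", "host-c"): 2}

def hop_count(h1, h2):
    """Switch hops between two hosts (0 if co-located)."""
    if h1 == h2:
        return 0
    return hops.get((h1, h2), hops.get((h2, h1), 0))

def rebalance(placement, flow_volume):
    """Greedy pass: for each high-volume pair, move one VM next to its peer."""
    plan = dict(placement)
    for (vm_a, vm_b), _volume in sorted(flow_volume.items(),
                                        key=lambda kv: kv[1], reverse=True):
        if hop_count(plan[vm_a], plan[vm_b]) > 0:
            plan[vm_a] = plan[vm_b]   # "VMotion" vm_a onto vm_b's host
    return plan

print(rebalance(placement, flow_volume))
# Note what's missing: no check of the target host's CPU or memory headroom,
# and no look at the application's own performance.
```

Simple enough, and from a pure network perspective it does exactly what it promises.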

Optimizing network traffic is a good thing, but not when that optimization moves nodes (virtual servers) to servers in a way that degrades application performance while improving network-oriented performance. Sure, moving a virtual image of an application from one node to another may decrease hops per flow and the associated latency, but that assumes all compute resources (hardware) are created equal and that the new resource can provide the same level of application performance as the previous one. It also assumes that the performance of the application being moved (because that's what it's all about) will not be adversely affected by landing on a compute resource whose capacity is already limited by the other applications it's serving up.
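What a less network-myopic decision might look like, again with invented names and numbers: score the candidate host on its application-facing headroom as well as on hop reduction, so the "closer" host loses out when it's already saturated:

```python
# Illustrative only: a placement score that weighs host headroom alongside
# network proximity, instead of hop count alone. All figures are invented.

def placement_score(hop_reduction, host):
    """Higher is better. A host with little CPU/memory headroom is penalized
    even if moving there would shorten the network path."""
    headroom = min(1.0 - host["cpu_util"], 1.0 - host["mem_util"])
    if headroom < 0.2:          # arbitrary safety floor, for illustration
        return float("-inf")    # never move onto a nearly saturated host
    return hop_reduction + 5 * headroom

candidates = {
    "host-c": {"cpu_util": 0.92, "mem_util": 0.60},  # closest, but busy
    "host-d": {"cpu_util": 0.35, "mem_util": 0.40},
}
hop_reduction = {"host-c": 3, "host-d": 1}

best = max(candidates,
           key=lambda h: placement_score(hop_reduction[h], candidates[h]))
print(best)  # host-d: fewer hops saved, but far more application headroom
```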

Hoff gets it right when he mentions context, as there are many more variables to application performance than just network optimizations such as number of hops and bandwidth. There's also the little matter of server affinity, which isn't well addressed by moving virtual servers around while they're serving up applications, because server affinity (persistence) needs to be handled by application-aware infrastructure, which routers and switches generally are not.
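To illustrate the difference, an application-aware proxy pins persistence to something the application understands, like a session cookie, rather than to addresses the network can see. A minimal, made-up sketch:

```python
# Minimal, made-up sketch of application-layer persistence: requests are
# pinned to an application instance by session cookie, not by IP address,
# so the mapping survives even if the instance is VMotioned to another host.

persistence_table = {}   # session cookie -> application instance name

def pick_instance(session_cookie, healthy_instances):
    """Return the instance already serving this session, else assign one."""
    instance = persistence_table.get(session_cookie)
    if instance not in healthy_instances:
        instance = min(healthy_instances)   # trivial stand-in for real LB logic
        persistence_table[session_cookie] = instance
    return instance

instances = {"app-01", "app-02"}
print(pick_instance("JSESSIONID=abc123", instances))  # assigned once...
print(pick_instance("JSESSIONID=abc123", instances))  # ...and stuck thereafter
```

Because the key is the cookie and not an IP address or switch port, the mapping survives a move of the instance; a purely network-level device has no such handle to hang on to.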

There's also the need to consider the client side of the equation, as the context associated with a client (browser? iPhone? Blackberry? Laptop? DSL? LAN? Wireless? Contractor? Employee? Customer?) is equally important in determining how best to deliver an application. These are contextual parameters, not things that can necessarily be deduced from network-level data. Ignoring these factors can result in an application that performs poorly or outright doesn't function as expected.
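A toy illustration of the point: the same response might be compressed, resized, or cached differently depending on who's asking and over what link. Every rule below is invented, but none of it can be derived from a netflow record:

```python
# Invented, illustrative rules only: delivery choices driven by client
# context (device, link, user role) that no hop count or flow volume reveals.

def delivery_policy(client):
    policy = {"compress": False, "image_quality": "full", "cache_ttl": 60}
    if client.get("device") in ("iphone", "blackberry"):
        policy["image_quality"] = "reduced"   # small screen, small images
    if client.get("link") in ("dsl", "wireless"):
        policy["compress"] = True             # slow link, squeeze the bytes
    if client.get("role") == "contractor":
        policy["cache_ttl"] = 0               # don't cache restricted content
    return policy

print(delivery_policy({"device": "iphone", "link": "wireless", "role": "employee"}))
```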

Network-based load balancing doesn't adequately address the scalability issues of an application; it addresses the performance of the network. And while that's one piece of the overall application performance equation, it doesn't begin to address the more relevant factors that affect the performance of an application. It's like choosing the line at the bank that appears to be moving fastest at the moment you choose it. You can't know what transactions the people ahead of you are about to perform, so your choice is based on parameters that tell only part of the story, and, as we often lament (at least I do), the line we choose is often "the wrong one" because we don't have all the information we need to choose the line that will perform best for us.

The ability to dynamically adjust the network is, I agree, absolutely necessary. But the execution of that adjustment has to begin at the top, with the application. The application (or the management/monitoring solution responsible for overseeing the dynamic data center) needs to be the authoritative source for when an application should or should not move, not the network. A solution with a holistic view of the network, the application delivery infrastructure, and the application needs to determine when an application could benefit from a move, and then collaborate with the underlying infrastructure to determine how best to move it.
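Sketching that division of responsibility (all names and thresholds hypothetical): the application-level monitor decides whether a move is warranted based on application metrics, and only then consults the network layer on where the move should land:

```python
# Hypothetical sketch of the decision hierarchy argued for above: the
# application (or its monitoring solution) is authoritative on whether to
# move; the network layer is consulted only on where to move.

def application_wants_move(app_metrics):
    """Decide from application-level signals, not from netflow."""
    return app_metrics["p95_response_ms"] > 500 or app_metrics["error_rate"] > 0.01

def network_best_target(candidate_hosts):
    """The network layer's view: the host that minimizes switch hops."""
    return min(candidate_hosts, key=lambda h: candidate_hosts[h]["hops"])

def orchestrate(app_metrics, candidate_hosts):
    if not application_wants_move(app_metrics):
        return None   # the network might have moved it; the application says no
    return network_best_target(candidate_hosts)

print(orchestrate({"p95_response_ms": 820, "error_rate": 0.002},
                  {"host-c": {"hops": 1}, "host-d": {"hops": 2}}))   # host-c
print(orchestrate({"p95_response_ms": 120, "error_rate": 0.0},
                  {"host-c": {"hops": 1}, "host-d": {"hops": 2}}))   # None
```

The point of the ordering is that netflow and hop counts still matter; they just answer the second question, not the first.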
