A load balancing algorithm can make or break your application’s performance and availability

It is a (wrong) belief that “users” of cloud computing and before that “users” of corporate data center infrastructure didn’t need to understand any of that infrastructure. Caution: proceed with infrastructure ignorance at the (very real) risk of your application’s performance and availability. Think I’m kidding? Stefan’s SOA & Enterprise Architecture Blog has a detailed and very explanatory post on Load Balancing Strategies for SOA Infrastructures that may change your  mind. 

Confused This post grew, apparently, out of some (perceived) bad behavior on the part of a load balancer in a SOA infrastructure. Specifically, the load balancer configuration was overwhelming the very services it was supposed to be load balancing. Before we completely blame the load balancer, Stefan goes on to explain that the root of the problem lay in the load balancing algorithm used to distribute requests across the services. Specifically, the load balancer was configured to use a static round robin algorithm and to apply source IP address-based affinity (persistence) while doing so. The result is that one instance of the service was constantly sent requests while the others remained idle and available. Stefan explains how the load balancing algorithm was changed to utilize a dynamic ratio algorithm that takes into consideration the state of each service (CPU and memory available) and removed the server affinity requirement.

The problem wasn’t the load balancer, per se. The load balancer was acting exactly as it was configured to act. The problem lay deeper: in understanding the interaction between the network, the application network, and the services themselves. Services, particularly stateless services as offered by SOA and REST-based APIs today, do not generally require persistence. In cases where they do require persistence, that persistence needs to be based on application-layer information, such as an API key or user (usually available in a cookie).

But this problem isn’t unique to SOA. Consider, if you will, the effect that such an unaware distribution might have on any one of the popular social networking sites offering RESTful APIs for integration. Imagine that all Twitter API requests ended up distributed to one server in Twitter’s infrastructure. It would fall over quickly, no doubt about that, because the requests are distributed without any consideration for current load and almost, one could say, blindly.

Stefan points this out as he continues to examine the effect of load balancing algorithms on his SOA infrastructure:

“Secondly, the static round-robin algorithm does not take in effect, which state each cluster node has. So, for example if one cluster node is heavily under load, because it processes some complex orders, and this results in 100% cpu load, then the load balancer will not recognize this but route lots of other requests to this node causing overload and saturation.”

Load balancing algorithms that do not take into account the current state of the server and application, i.e. they are not context-aware, are not appropriate for today’s dynamic application architectures. Such algorithms are static, brittle, and blind when it comes to distributed load efficiently and will ultimately result in an uneven request load that is likely to drive an application to downtime.

It is imperative in a distributed application architecture like SOA or REST that the application network infrastructure, i.e. the load balancer, be able to take into consideration the current load on any given server before distributing a request. If one node in the (pool|farm|cluster) is processing a complex order that consumes most of the CPU resources available, the load balancer should not continue to send it requests.

This requires that the load balancer, the application delivery controller, be aware of the application, its environment, as collaborationwell as the network and the user. It must be able to make a decision, in real-time, about where to direct any given request based on all the variables available. That includes CPU resources, what the request is, and even who the user/application is.

For example, Twitter uses a system of inbound rate limiting on API calls to help manage the load on its infrastructure. Part of that equation could be the calling application. HTTP as a transport protocol contains a somewhat surprisingly rich array of information in its headers that can be parsed and inspected and made a part of the load balancing equation in any environment. This is particularly useful to sites like Twitter where multiple “applications” (clients) are making use of the API. Twitter can easily require the use of a custom HTTP header that includes the application name and utilize that as part of its decision making processes.

Like RESTful APIs, SOAP envelopes are full of application specifics that provide data to the load balancer, if it’s context-aware, that can be utilized to determine how best to distribute a request. The name of the operation being invoked, for example, can be used to not only load balance at the service level, but at the operation level. That granularity can be important when operations vary in their consumption of resources.

This application layer information, in conjunction with current load and connections on the server provide a wealth of information as to how best, i.e. most efficiently, to distribute any given request. But if the folks in charge of configuring the load balancer aren’t aware of the impact of algorithms on the application and its infrastructure, you can end up in a situation much like that described in Stefan’s blog on the subject.

Cloud computing won’t solve this problem and, in fact, it will probably make it worse. The belief that the infrastructure should be “hidden” from the user (that’s you) means that configuration options – like the load balancing algorithm – aren’t available to you as a user/deployer of cloud-based applications. Even though load balancing is going to be used to scale your application, you have no clue or control over how that’s going to occur.

That’s why it’s important that you ask questions of your provider on this subject. You need to know what algorithm is being used and how requests are distributed so you can determine how that’s going to impact your application and its performance once its deployed. You can’t – or shouldn’t – assume that the load balancing provided is going to magically distribute requests perfectly across your scaled application because it wasn’t configured with your application in mind.

If you deploy an application – particularly a SOA or RESTful one – you may find that with scalability comes poor performance or even unavailable applications because of the configuration of that infrastructure you “aren’t supposed to worry about.”

Applications are not islands; they aren’t deployed stand-alone even though the virtualization of applications is making it seem like that’s the case. The delivery of applications requires collaboration between a growing number of components in the data center and load balancing is one of the key components that can make or break your application’s performance and availability.

Follow me on Twitter View Lori's profile on SlideShare friendfeedicon_facebook AddThis Feed Button Bookmark and Share