One of the benefits of Infrastructure 2.0 is connectedness: the ability to collect and share pertinent data regarding the health and performance of applications and infrastructure services. Based on that data a dynamic infrastructure can adapt on-demand and make decisions that respect real capacity limits, not artificial ones.

Randy Hayes writes “The CapCal Blog”, and describes CapCal as being about “measuring the performance and scalability of web apps using real, production level workloads.” In A Very Delicate Load Balancing Act he discusses the impact of load balancing configurations on the capacity and performance of applications.

Everyone knows what a load balancer is but exactly what it does and how it goes about doing it are often mysterious. Since every single page request goes through the load balancer, how it is configured and what its capacity is can have everything to do with how well an application performs under load. For example, the maximum number of connections is a configurable parameter that is often very low in its default setting.

Randy goes on to discuss how the artificial connection limits on the load balancer in his scenario negatively affected the ability of the application to scale and perform up to expectations, and conversely how increasing that limit improved performance. This makes sense; the load balancer/application delivery controller is often the “first point of contact” for the user as it is the device – virtual or hardware – that brokers requests between client and server.

But it’s not just a matter of artificial limits on the load balancer itself, it’s a matter of artificial connection limits throughout the load balancer’s configuration. You can, for example, limit the total number of connections available to the “virtual network server”, i.e. the public facing server, as well as limiting on a per-node basis, i.e. server or application instance. So if you’re in a situation such as described by Randy, you may have to tweak multiple configuration settings across the load balancer/application delivery controller in order to increase connection capacity across the entire system.
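To see why raising a single setting may not be enough, note that effective capacity is bounded by whichever layer is most restrictive. The sketch below is purely illustrative; the function and numbers are hypothetical and do not reflect any particular vendor's configuration API:

```python
def effective_capacity(virtual_server_limit, node_limits):
    """Concurrent-connection capacity of the whole system is capped by
    both the virtual server's limit and the sum of the per-node limits."""
    return min(virtual_server_limit, sum(node_limits))

# Raising only the per-node limits doesn't help while the virtual
# server's limit remains the bottleneck:
print(effective_capacity(500, [500, 500]))    # 500: virtual server is the cap
print(effective_capacity(2000, [500, 500]))   # 1000: now the nodes are the cap
```

Raising real capacity therefore means walking every layer of the configuration, which is exactly the multi-setting tweaking described above.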

Or you could just trust the system and let it determine real-time capacity instead.

While the option to establish hard upper limits on performance-related configurations has been available in application delivery controllers for quite some time now (years and years), you’ve also had the ability to set no limits at all and simply let the application delivery controller determine limitations per application instance, on demand.
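One way such on-demand limit determination could work, conceptually, is to adapt the limit from observed response times rather than fixing it in advance. Below is a minimal AIMD-style sketch (additive increase, multiplicative decrease, as in TCP congestion control); the thresholds and step sizes are assumptions for illustration, not how any particular controller is actually implemented:

```python
def adapt_limit(current_limit, avg_response_ms, target_ms=200,
                min_limit=10, increase_step=5, decrease_factor=0.8):
    """Adjust a per-node connection limit based on observed response time.

    Healthy responses -> probe for more capacity (additive increase).
    Slow responses    -> back off (multiplicative decrease).
    All parameters here are illustrative assumptions.
    """
    if avg_response_ms <= target_ms:
        return current_limit + increase_step
    return max(min_limit, int(current_limit * decrease_factor))

limit = 100
limit = adapt_limit(limit, 150)   # healthy: 100 -> 105
limit = adapt_limit(limit, 450)   # overloaded: 105 -> 84
print(limit)
```

The point is that the "limit" becomes an output of observed conditions rather than an input guessed at configuration time.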

Because application requests are not all equal, at least not in the compute resources an application needs to respond to them, it is very hard to set any meaningful connection limit on a given “node”. Consider an application with two kinds of requests: “Request A” is very light in terms of compute resources, while “Request B” is very heavy, requiring twice as much compute as “Request A” to fulfill. Depending on the load balancing algorithm in use, a given node could end up processing only requests of type “A” for an hour, and its capacity to handle those requests is very much affected by the speed with which the application can respond.

Let’s say it can handle 500 “Request A” requests in an hour. But the same node can only handle 250 “Request B” requests in an hour because it takes twice as much computing power to fulfill that request. Let’s further assume that there are at least two “nodes” in this scenario because otherwise you might not consider using a load balancer/application delivery controller in the first place, and both “nodes” are equally capable (let’s pretend they’re Amazon EC2 provisioned instances that are exactly the same in terms of CPU and RAM available).

Both nodes need to be able to participate in the process; that is, both need to be able to respond to requests. The typical modus operandi for ensuring a server isn’t overwhelmed by requests is to artificially limit the total number of connections that can be made to each server, but you have to limit by the lowest common denominator in order to enforce capacity constraints. In other words, because both servers could potentially answer Request B queries, you have to limit both nodes to the minimum request processing capability: 250. Enforcing such limits is not a technical problem; application delivery controllers are more than capable of enforcing “hard limits” on the number of connections per node as well as limiting the number of incoming connections that can be made to any given virtual network server. But doing so is a waste of resources, because real traffic will be a mix of both request types and any given node can handle twice as many Request A requests.
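The waste is easy to quantify with the numbers from this example. Treating Request B as costing two “A-units” of compute gives each node a budget of 500 units per hour; the cost model below is an assumption made for illustration:

```python
# From the example above: a node serves 500 Request A/hour or 250 Request
# B/hour, so Request B costs two "A-units" and each node has a 500-unit
# hourly budget.
A_COST, B_COST = 1, 2
NODE_BUDGET = 500

def requests_served(a_fraction, budget=NODE_BUDGET):
    """Requests/hour one node can serve for a given mix of A and B requests."""
    avg_cost = a_fraction * A_COST + (1 - a_fraction) * B_COST
    return budget / avg_cost

print(requests_served(1.0))  # 500.0 (all Request A)
print(requests_served(0.0))  # 250.0 (all Request B)
print(requests_served(0.5))  # ~333 for a 50/50 mix
# A hard cap of 250 (the worst case) would strand ~83 requests/hour of
# real capacity per node under a 50/50 mix.
```

A fixed worst-case cap is only accurate when every request is of the heavy type, which is exactly the condition the mixed-traffic scenario violates.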

Application delivery controllers today are flexible; they’re able to adapt on demand to myriad situations given the opportunity, and they allow you to determine how best to deal with this situation according to your application’s performance needs and your budget.

For example, you could decide that you’re going to provision (whether in the cloud or in your own data center) two unequal servers and that one will be designated for Request A requests and the other for Request B requests. Then you can use network-side scripting or the native application switching capabilities of an application delivery controller to direct requests on-demand based on the type of request. Or you could use a more advanced load balancing algorithm that takes into account not only the number of connections currently open on each node (least connections) but the available resources (CPU, RAM) on each server to determine how best to route any given request.
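As a sketch of that second option, a resource-aware algorithm scores each node by its open connections plus current CPU and RAM utilization, rather than by connection count alone. The node stats and weights below are hypothetical; a real ADC would gather such metrics through health monitoring or agents on the servers:

```python
# Hypothetical per-node stats (cpu and ram as utilization fractions).
nodes = [
    {"name": "node1", "connections": 40, "cpu": 0.90, "ram": 0.70},
    {"name": "node2", "connections": 55, "cpu": 0.30, "ram": 0.40},
]

def least_connections(nodes):
    """Classic least-connections: pick the node with fewest open connections."""
    return min(nodes, key=lambda n: n["connections"])

def resource_aware(nodes, w_conn=1.0, w_cpu=100.0, w_ram=50.0):
    """Pick the node with the lowest weighted load score.
    The weights are illustrative assumptions, not a standard formula."""
    def score(n):
        return w_conn * n["connections"] + w_cpu * n["cpu"] + w_ram * n["ram"]
    return min(nodes, key=score)

print(least_connections(nodes)["name"])  # node1 (fewer open connections)
print(resource_aware(nodes)["name"])     # node2 (far less loaded overall)
```

The two algorithms disagree here precisely because connection count alone ignores that node1 is nearly CPU-saturated, which is the scenario unequal request costs create.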

The real capacity of any given application is highly variable and context-dependent. Today it may be X, tomorrow it may be X+1, the day after it may be X-1. That’s because of the variance in request types, volume of requests, and the conditions of the network and operational environment. As long as these parameters are volatile, there is no absolute method of determining the true capacity of a given application instance except through intelligent integration that effectively creates a 360-degree feedback loop across the infrastructure.

All too often what happens is that artificial limits based on worst-case scenarios are placed upon the infrastructure to ensure capacity and performance. This wastes resources, not only on web and application servers but on their supporting infrastructure. But when the infrastructure is dynamic, when it’s based on the intelligence inherent in Infrastructure 2.0 capable solutions, then these limitations can be removed and the resources available to applications leveraged to their full potential.
