When it comes to availability, coding a solution is just delaying the inevitable

Jonathan Howell, in Five Things That Will Kill Your Site – an excellent read, by the way, for all web application developers – asserts that there are several ways to avoid web application death that do not require the implementation of “expensive redundant hardware with top of the line load balancers and an enterprise class SAN.” In general he’s got some good advice to which application developers should pay attention, but I have to disagree with his assertion that graceful degradation of performance requires coding. In fact, I’d say this is not a good design choice, because it tightly couples potentially volatile parameters to the application: if the environment or capacity changes, so too must the application.

Degrade gracefully. It is better to give full service to a percentage of your users, and a helpful static message to the rest, than for your entire site to be entirely unresponsive. You’ll need to specifically code for this.

In general a production web application will be constrained on the number of requests that can be effectively processed at any one time – and you should have a good idea of what the upper limit is from following the “know your capacity” advice above.

For a fixed number of incoming requests per second, the number of requests in progress at any one time depends on how long they take to process: 10 requests a second that take 1 second each to process means that you will have 10 requests in progress at any one time. If it starts to take 2 seconds to process each, then you will have 20 requests in progress at any one time, which will probably make your requests take slightly longer still. This slowdown continues until you reach a tipping point where your service grinds to a halt, and any responses that do get returned take much longer than the browser timeout.
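The arithmetic in that passage is what queueing folks call Little's law: average requests in progress equals arrival rate multiplied by average time per request. A minimal sketch of it follows; the function name is mine, not Howell's, and it assumes a steady arrival rate, which is exactly the assumption I take issue with below.

```python
def requests_in_progress(arrival_rate_per_s: float, seconds_per_request: float) -> float:
    """Average number of requests in flight at a steady arrival rate.

    This is Little's law (L = lambda * W); it only holds for averages
    in a steady state.
    """
    return arrival_rate_per_s * seconds_per_request


# The figures from the quoted passage: 10 requests per second,
# first at 1 second each and then at 2 seconds each.
for seconds in (1.0, 2.0):
    print(seconds, requests_in_progress(10, seconds))
```

At the same 10 requests per second, doubling the processing time doubles the number of requests the server must hold open at once.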

The basic logic is right on the money, of course, and what is being suggested is essentially inbound rate limiting. The solution will work, to a point, but it is difficult for an application to determine the total number of requests in progress from within a single, bounded request, and doing so consumes resources, which itself impacts performance. Web applications implementing internal rate limiting as suggested will still end up degrading, and perhaps not as “gracefully” as promised. In fact, such an approach is likely to result in what he subsequently describes as “Slow death,” i.e. the gradual consumption of resources over time that leads to availability issues.


The first problem is that when an application is executing the logic required to handle a single request it doesn’t have the context necessary to reach out and grab information about other requests. You simply don’t have the information necessary to make an informed decision about whether a request should be answered now or later. The fact is that execution time in a production environment is dynamic, and the formula suggested above for determining capacity and execution times assumes a static environment. In reality, so many other factors go into the response time equation that assuming a static execution time for application logic, and basing capacity constraints upon it, is going to end up backfiring on the developer. Certain requests, e.g. those requiring back-end data source access, are necessarily going to take longer to execute than those that do not. Requests that make callouts to other services, internal or external, also take varying amounts of time, and that time depends on variables such as how bogged down those services are with their own requests. As resources become limited, execution time increases. A simple X seconds per request may be acceptable for determining an average request execution time, but that’s theory. In practice, it’s just not going to cut it.
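To make that concrete, here is a small sketch of the same calculation applied to a mix of request types. Every name and number in it is hypothetical, chosen only for illustration; the point is that the arrival rate stays at 10 requests per second while the number of requests in progress changes with back-end conditions.

```python
from dataclasses import dataclass


@dataclass
class RequestClass:
    name: str
    rate_per_s: float    # arrivals per second (hypothetical figure)
    seconds_each: float  # current execution time (hypothetical figure)


def in_flight(mix):
    """Sum of rate * time across request classes (Little's law per class)."""
    return sum(c.rate_per_s * c.seconds_each for c in mix)


# Same 10 requests per second in both cases; only execution times differ.
normal = [
    RequestClass("static", 6, 0.05),
    RequestClass("database", 3, 0.5),
    RequestClass("external", 1, 2.0),
]
slow_backend = [
    RequestClass("static", 6, 0.05),
    RequestClass("database", 3, 2.0),  # data source is bogged down
    RequestClass("external", 1, 5.0),  # external service is slow
]

print(f"normal: {in_flight(normal):.1f}")
print(f"slow back end: {in_flight(slow_backend):.1f}")
```

With these made-up figures the requests in progress go from roughly 3.8 to roughly 11.3 without a single additional user showing up, which is why a limit derived from one static "X seconds per request" tends to be wrong right when it matters.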

But let’s assume for a moment that you’re able to do this and you code up a solution either using some interesting code tricks or by developing a shim/handler that’s inserted higher in the call chain on an application server. Totally doable – this is how many agent-based monitoring and management systems implement functionality deployed on such servers.

Now what needs to be considered is that it isn’t just the execution of requests that consumes resources. Merely opening and maintaining a TCP connection – which is required in the code-based solution – also consumes resources, specifically memory. With a code-based solution for graceful degradation you have to open a connection to the server first. All those connections must be maintained in what we network-focused people like to call a session-table, and that requires memory. Every connection you open – whether the application processes the request immediately or not – requires an entry in the session-table until the request has been processed. Eventually the application is not going to respond at all; not even to a “percentage of your users.” All available resources will have been consumed, dedicated to just handling TCP connections, and no users will be able to connect at all. They’ll just get a “timed out” error and not the “helpful static message” you carefully prepared for them.
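A toy model of that failure mode might look like the following. It is a deliberate simplification with hypothetical names and numbers: it assumes every connection in a burst arrives at once and holds its session-table entry until it is answered, and it ignores everything else a real TCP stack does.

```python
def outcome_of_burst(burst_size: int, app_capacity: int, session_table_size: int):
    """Split a burst into (full service, static message, timed out).

    A connection must get a session-table entry before application code
    can run at all, even if that code only returns the static message.
    """
    accepted = min(burst_size, session_table_size)
    full_service = min(accepted, app_capacity)
    static_message = accepted - full_service
    timed_out = burst_size - accepted
    return full_service, static_message, timed_out


# Hypothetical server: 20 requests at full service, 100 session-table entries.
print(outcome_of_burst(50, 20, 100))
print(outcome_of_burst(500, 20, 100))
```

In the smaller burst everyone gets either full service or the static message. In the larger one, most users never reach the code that would have served the message in the first place.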

There is no way to avoid this using a code-based solution deployed on the same server as the application. It’s inevitable that you will run out of resources because you can’t avoid opening a connection, which is going to consume resources. There is a limit to how far you can degrade gracefully and eventually you’ll hit it, regardless of how carefully you’ve written the code. Which leaves you right back at square one: a completely unresponsive site. Isn’t that what you were trying to avoid in the first place? 


The ability to rate limit inbound requests based on response time and capacity is one of the primary roles of application delivery services. I realize that the article referenced was trying to show application developers how to implement certain availability-related functions without investing in application delivery and data services or load balancers, but the reality is that “degrading gracefully” isn’t one of the functions that can be reliably implemented without an external solution dedicated to providing that functionality – a solution with a much higher capacity for handling TCP connections than a web or application server.

Furthermore, application delivery services are context-aware; they have the ability to look at both the rate of incoming requests and the capacity of applications and determine dynamically whether a specific request will degrade performance. Application delivery services are able to see “the big picture” because they’re designed from the ground up to monitor variables like total number of connections to any given application and response times on a per application basis. All the mathematics that goes into trying to figure out how long it will take any given request to execute and then trying to determine capacity from that is unnecessary with an application delivery service because it can be computed dynamically. That results in a much more efficient use of resources than a statically defined formula that can’t account for environmental conditions that may affect response time and thus capacity.
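As a rough illustration of the kind of decision being described, the sketch below tracks active connections and a rolling average of response times for one application and admits a request only when both are within limits. It is not any product's API; the class, its thresholds, and the window size are all assumptions made for the example.

```python
from collections import deque


class AppStats:
    """Rolling view of one application, as seen from in front of it."""

    def __init__(self, max_connections: int, max_avg_response_s: float, window: int = 50):
        self.max_connections = max_connections
        self.max_avg_response_s = max_avg_response_s
        self.active = 0
        self.samples = deque(maxlen=window)  # most recent response times

    def avg_response_s(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def admit(self) -> bool:
        """Decide whether to forward a new request to the application."""
        if self.active >= self.max_connections:
            return False
        if self.avg_response_s() > self.max_avg_response_s:
            return False
        self.active += 1
        return True

    def complete(self, response_s: float) -> None:
        """Record a finished request and its observed response time."""
        self.active -= 1
        self.samples.append(response_s)


stats = AppStats(max_connections=2, max_avg_response_s=1.0)
print(stats.admit(), stats.admit(), stats.admit())  # third is refused: at the limit
stats.complete(0.2)
print(stats.admit())  # a slot is free and responses are fast
stats.complete(5.0)
print(stats.admit())  # refused: rolling average is now above the limit
```

Because the limits are checked against what is actually observed, nothing here depends on a precomputed seconds-per-request figure. A real service would also need some way to recover after refusing traffic, such as aging out old samples or probing the application, which this sketch omits.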

Application delivery services are capable of maintaining many times the number of connections to clients while carefully distributing requests to applications in a way that maintains performance without over-subscribing any given application instance. Their ability to offload connection management from applications actually increases the performance and capacity of those applications, which results in no real need to worry about implementing inbound rate limiting in the application in the first place.

Maybe if you’re just writing an application for fun you can get away with using a code-based solution for availability and graceful degradation of service. But if users rely on your application, or pay for that service, or you’re hoping you’re the next Twitter or Facebook, then you should not rely on a solution that is inherently unable to provide graceful degradation of service because it simply won’t work reliably in the long run.

There are plenty of functions that should always be implemented in the application. But there are just as many functions that should not be implemented in an application and should instead be offloaded to application delivery services. Rate limiting and pure availability services are of the latter kind.

