Are you scaling applications or servers?

 vaderAuto-scaling cloud brokerages appear to be popping up left and right. Following in the footsteps of folks like RightScale, these startups provide automated monitoring and scalability services for cloud computing customers. That’s all well and good because the flexibility and control over scalability in many cloud computing environments is, shall we say, somewhat lacking the mechanisms necessary to efficiently make use of the “elastic scalability” offered by cloud computing providers.

The problem is (and you knew there was a problem, didn’t you?) that most of these companies are still scaling servers, not applications. Their offerings still focus on minutia that’s germane to server-based architectures, not application-centric architectures like cloud computing. For example, the following statement was found on one of the aforementioned auto-scaling cloud brokerage service startups:

"Company X monitors basic statistics like CPU usage, memory usage or load average on every instance."

Wrong. Wrong. More Wrong. Right, at least, in that it very clearly confesses these are basic statistics, but wrong in the sense that it’s just not the right combination of statistics necessary to efficiently scale an application. It’s server / virtual machine layer monitoring, not application monitoring. This method of determining capacity results in scalability based on virtual machine capacity rather than application capacity, which is a whole different ball of wax. Application behavior and complexity of logic cannot be necessarily be tied directly to memory or CPU usage (and certainly not to load average). Application capacity is about concurrent users and response time, and the intimate relationship between the two. An application may be overwhelmed while not utilizing a majority of its container’s allocated memory and CPU. An application may exhibit increasingly poor performance (response times) as the number of concurrent connections increase toward its limits while still maintaining less than 70,80, or 90% of its CPU resources. An application may exhibit slower response time or lower connection capacity due to the latency incurred by exorbitantly computational expensive application logic. An application may have all the CPU and memory resources it needs but still be unable to perform and scale properly due to the limitations imposed by database or service dependencies.

There are so many variables involved in the definition of “application capacity” that it sometimes boggles the mind. One thing is for certain: CPU and memory usage is not accurate enough indicator of the need to scale out an application.