posted on Thursday, February 18, 2010 3:47 AM
Surprised? I was, but I shouldn’t have been.
While working on other topics I ran across an interesting slide in a presentation given by Microsoft at TechEd Europe 2009 on virtualization and Exchange. Specifically, the presenter called out the average 12% overhead incurred from the hypervisor on systems in internal testing. Intuitively it seems obvious that a hypervisor will incur overhead; it is, after all, an application that is executing and thus requires CPU, I/O, and RAM to perform its tasks. That led me to wonder if there was more data on the overhead from other virtualization vendors.
I ended up reading an enlightening white paper from VMware on consolidation of web applications and virtualization, which observes that configurations with multiple virtual machines actually outperformed, in both capacity and performance, a single native server configured with a similar number of CPUs. Note that this is specifically for web applications, though I suspect that any TCP-heavy application would likely exhibit similar performance characteristics.
Although virtualization overhead varies depending on the workload, the observed 16 percent performance degradation is an expected result when running the highly I/O‐intensive SPECweb2005 workload. But when we added the second processor, the performance difference between the two‐CPU native configuration and the virtual configuration that consisted of two virtual machines running in parallel quickly diminished to 9 percent. As we further increased the number of processors, the configuration using multiple virtual machines did not exhibit the scalability bottlenecks observed on the single native node, and the cumulative performance of the configuration with multiple virtual machines well exceeded the performance of a single native node.
-- “Consolidating Web Applications Using VMware Infrastructure” [PDF, VMware]
We know there’s overhead associated with the hypervisor. Fact. But what’s interesting here is that the overhead turns out to be irrelevant – at least in the case of web applications. What’s important is the initial degradation of performance and its subsequent improvement as additional virtual instances are added. We need to understand why that’s the case, because it has – or should have – an impact on our overall architectural strategy.
SCALE OUT VIRTUALLY for BEST RESULTS
So why would multiple instances of a web server – virtual, no less – scale better for performance than simply scaling up, i.e. adding more CPUs? If we look at typical performance patterns from really any TCP connection-oriented device or application, we see very similar behavior. Capacity of the device or application tends to have a steep growth curve that plateaus rather quickly and then remains somewhat constant. The associated performance pattern tends to begin with very low latency and good response times, but latency gradually increases as load approaches the point at which capacity plateaus.
This pattern, with few aberrations, should be fairly recognizable to anyone who’s performed any kind of load or performance testing on a connection-oriented (TCP-based) solution. In fact, an obvious deviation from the pattern often indicates some sort of problem in the network or solution that needs to be addressed. Garbage collection processes in JavaEE application servers, for example, have traditionally been seen as regular inverse spikes in the overall number of TCP connections and CPU utilization on the host server, coupled with an increase in response time as the CPU is completely utilized for a brief moment while the process completes.
The reason this is consistent across connection-oriented devices and applications is that they are connection oriented. Connections must be tracked, i.e. stored in memory, and subsequently accessed as messages flow across the connection. This requires RAM and, in some cases, I/O resources. As the number of connections grows, the “table” in which they are stored grows, thus increasing the amount of time necessary to “find” the connection as well as the associated resources. In addition, the more connections there are, the more serialization and locking occurs, and that serialization is another primary bottleneck for the web server.
Hence, the more connections made to a given solution, the more its performance tends to degrade. 
Virtualization appears to address this issue: by limiting the resources available to each instance, it limits that instance’s connection capacity. Adding more CPU and RAM to a single server, on the other hand, leads to higher connection capacity and thus larger connection tables, which in turn lead to greater performance degradation due to the increase in serialization. Rather than simply adding CPUs it would probably be a better option, from a performance standpoint, to add another virtual instance – and another, as CPUs increase – to maintain consistent capacity and a predictable performance pattern.
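To make the trade-off concrete, here is a rough sketch that compares the two approaches. Note the assumptions: the serialization penalty is modeled with the Universal Scalability Law, which is my choice of model and not something taken from the VMware paper, and the contention and coherency coefficients are made-up illustrative values. Only the 16 percent per-instance overhead comes from the excerpt quoted above, and the sketch optimistically assumes virtual machines do not interfere with one another.

```python
def scale_up_capacity(cpus: int, contention: float, coherency: float) -> float:
    """Relative capacity of ONE native node with `cpus` processors.

    Universal Scalability Law: contention models time spent waiting on
    serialized work (locks), coherency models the cost of keeping shared
    state such as a connection table consistent across processors.
    """
    n = cpus
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def scale_out_capacity(instances: int, hypervisor_overhead: float) -> float:
    """Relative capacity of `instances` single-CPU virtual machines.

    Assumption: instances share nothing, so capacity grows linearly and
    each instance only pays a fixed hypervisor overhead.
    """
    return instances * (1 - hypervisor_overhead)


if __name__ == "__main__":
    CONTENTION = 0.05   # hypothetical value, not measured
    COHERENCY = 0.01    # hypothetical value, not measured
    OVERHEAD = 0.16     # single-CPU degradation quoted from the white paper

    for n in (1, 2, 4, 8):
        up = scale_up_capacity(n, CONTENTION, COHERENCY)
        out = scale_out_capacity(n, OVERHEAD)
        print(f"{n} CPUs: native node {up:.2f}, {n} virtual machines {out:.2f}")
```

With these illustrative coefficients the native node wins at one CPU and falls behind as CPUs are added, which is the shape of the result the white paper describes. Real coefficients would have to come from your own load tests.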
You need to scale up the hardware capacity, but should scale out at the virtual and application layers to optimize efficiency of the resources and maintain the end-user experience. By load balancing across multiple, smaller, homogeneous server instances you also make capacity planning much simpler because you know exactly what the capacity for a given instance will be and can use that information to prepare in advance a plan for increasing capacity on-demand. Scaling up does not offer the same consistency because capacity will be highly dependent upon the CPU and RAM provisioned as well as load.
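The capacity planning arithmetic this enables is simple enough to sketch. The function name, the headroom parameter, and the numbers in the usage note are hypothetical; the only idea taken from the discussion above is that every instance has the same, known capacity.

```python
def instances_needed(peak_connections: int,
                     per_instance_capacity: int,
                     headroom_percent: int = 20) -> int:
    """Number of identical instances required to serve a given peak.

    headroom_percent is the share of each instance's measured capacity
    kept in reserve (an assumed planning margin, not a recommendation).
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in the range 0-99")

    # Capacity we are willing to plan against, per instance.
    usable = per_instance_capacity * (100 - headroom_percent) // 100
    if usable == 0:
        raise ValueError("headroom leaves no usable capacity")

    # Ceiling division using integers only.
    return -(-peak_connections // usable)
```

For example, if testing shows an instance handles 1,000 concurrent connections and you plan for a peak of 10,000 with 20 percent headroom, `instances_needed(10000, 1000)` returns 13. Because the instances are homogeneous, the same calculation works for any future peak.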
OPTIMAL STRATEGY for ADDRESSING SCALABILITY
When it comes time to scale an application, keep in mind that the decision to scale out or up has a direct impact on your ability to perform capacity planning and on performance. Predictable capacity with predictable performance is optimal as a baseline, and thus what’s required is a strategy that employs homogeneous (in terms of capacity) virtual servers as well as load balancing. As a bonus, when it comes to optimizing the operational costs associated with a scale-out strategy, scaling out with smaller, focused virtual servers will likely afford you the best return on investment.
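As an illustration of that strategy, here is a minimal sketch of a least-connections distribution across identical instances with a fixed per-instance connection cap. The class, its method names, and the instance names are hypothetical and do not reflect any particular load balancer’s API; a real application delivery controller does far more than this.

```python
class LeastConnectionsBalancer:
    """Toy balancer for homogeneous instances with a known connection cap."""

    def __init__(self, instance_names, capacity_per_instance):
        self.capacity = capacity_per_instance
        # Active connection count per instance, in the order given.
        self.active = {name: 0 for name in instance_names}

    def acquire(self):
        """Pick the instance with the fewest active connections.

        Returns None when every instance is at its cap, which is the
        signal that another instance should be provisioned.
        """
        name = min(self.active, key=self.active.get)
        if self.active[name] >= self.capacity:
            return None
        self.active[name] += 1
        return name

    def release(self, name):
        """Record that a connection to `name` has closed."""
        if self.active[name] > 0:
            self.active[name] -= 1
```

Because every instance has the same cap, a `None` from `acquire()` means exactly one thing: total capacity is exhausted and it is time to add an instance.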
IT operations has to first trust the ability of cloud computing models to scale up, on-demand, as per the literature. In order to maximize the benefits of cloud computing IT actually has to provision resources based on the lowest common denominator rather than trying to provision for highest peak demand, which runs contrary to everything IT operations knows as truth about provisioning a data center to ensure availability of applications around the clock.
-- To Take Advantage of Cloud Computing You Must Unlearn, Luke.
Even if you’re using only one physical server, you’ll probably want to employ a smaller, homogeneous virtualized approach to scaling out web applications. Test your application until you find the apex of its performance, then capture the CPU and RAM required at that point. Use these values to standardize your web application virtual machine specifications. Evaluate your current infrastructure, too, and determine if there are performance and efficiency tuning enhancements you can make to configurations, such as simply changing the load balancing algorithm on your load balancer/application delivery controller.
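Finding that apex from load test results might look something like the sketch below. The record fields, the latency ceiling, and the sample figures are all hypothetical; substitute whatever your load testing tool actually reports.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """One step of a load test (all fields are assumed measurements)."""
    connections: int
    requests_per_sec: float
    latency_ms: float
    cpu_percent: float
    ram_mb: int


def find_apex(samples, max_latency_ms):
    """Return the sample with the highest throughput whose latency is
    still within the acceptable ceiling."""
    acceptable = [s for s in samples if s.latency_ms <= max_latency_ms]
    if not acceptable:
        raise ValueError("no sample meets the latency ceiling")
    return max(acceptable, key=lambda s: s.requests_per_sec)


if __name__ == "__main__":
    # Made-up results: throughput plateaus while latency keeps climbing.
    results = [
        LoadSample(250, 900.0, 12.0, 35.0, 1100),
        LoadSample(500, 1700.0, 18.0, 60.0, 1500),
        LoadSample(1000, 2100.0, 45.0, 85.0, 2000),
        LoadSample(2000, 2150.0, 240.0, 99.0, 2900),
    ]
    apex = find_apex(results, max_latency_ms=100.0)
    print(f"standardize on ~{apex.cpu_percent}% CPU and {apex.ram_mb} MB RAM "
          f"at {apex.connections} connections")
```

The CPU and RAM recorded at the apex become the specification for each virtual machine, and the connection count at that point becomes the per-instance capacity used for planning.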
Virtualization apparently has, in addition to making the life of system administrators a whole lot easier, shall we say, some hidden benefits for web applications that make combining a strong application delivery strategy and architecture with virtualization a win-win for users and administrators alike. But just as you’ll need to “unlearn” to really take advantage of cloud computing, you’re probably going to have to “unlearn” to take advantage of virtualization in your own data center, too.
Technorati Tags: MacVittie, F5, virtualization, TCP, web applications, VMware, Microsoft, performance, capacity, load balancing, scalability