posted on Monday, February 15, 2010 4:06 AM
Or more apropos, it’s in the complex and intimate relationship between applications and their infrastructure.
What’s the difference between a highly virtualized corporate data center and a cloud computing environment? There are probably many, but the most important distinction – and the one that earns the latter a “cloud computing” tag – is certainly that the former lacks a comprehensive orchestration system and was likely not architected using a rapid, infrastructure-inclusive scalability strategy.
Mitch Garnaat, “The Elastician”, recently managed to sum up what should be every modern data center’s motto in a single, six-word tweet: “Scale is not just about servers.” In fact, I am hereby dubbing this “Garnaat’s Theorem”, as this fundamental data center truth has been established and proven more times than some of us would care to remember.
Mitch could not be more right. Scale is not just about servers, and for corporate data centers and cloud computing providers looking to realize the benefits of rapid elasticity and on-demand provisioning, scale simply must be one of the foundational premises upon which a dynamic data center is built. And that includes the infrastructure.
SCALE of SERVERS has a RIPPLE EFFECT throughout the DATA CENTER
It’s a fine thing to be able to replicate and deploy many instances of an application to achieve higher capacity, better performance, and fault tolerance. But if the underlying infrastructure is not also prepared to handle the higher volume of traffic and requests, such a strategy will ultimately fail. Large scale data centers already know this; the network and application network infrastructure over which high volume web applications are delivered must also scale to meet application demand or risk becoming the bottleneck and cause of poor performance.
BANDWIDTH
Bandwidth seems a no-brainer; after all, we’re running data centers built on 10Gbps backbones today. But individual NICs on servers and ports on switches and other network infrastructure solutions are often still only capable of handling 1Gbps. A very large server (think 8, 16, or more cores) may have multiple NICs, but each may be only 1Gbps. Scaling bandwidth, then, becomes a matter of architecting the network such that more bandwidth is available to the server. Teaming of NICs is often used as a means to combine multiple network connections into a “virtual” network connection that can handle many more Gbps of traffic. But simply teaming NICs or deploying a vSwitch on the server will only help distribute application responses, because the software responsible for binding those NICs together only has control over the server-side traffic. To scale network traffic bidirectionally, it’s necessary to also configure the network solution on the other end of that link (a switch, a load balancing device, and so on) to bind together the appropriate network connections and form a single, logical link between the server and the network device.
Consider what that means in relation to scalability. Port densities on switches and application network solutions have increased, yes, but in organizations other than the largest it is likely many of those devices do not have more than 48 ports available, i.e. there are only 48 network connections available for connecting servers. If each large server is teaming four NICs together, that means four ports on the network infrastructure should also be “teamed” together, which reduces the number of servers able to be physically connected to that switch to 12. If the data center has more than 12 “servers” then it’s likely that additional network infrastructure components will be necessary. More NICs teamed together means fewer servers physically connected, which can dramatically change the physical (and logical) network and application network infrastructure architecture.
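The port arithmetic above is simple enough to sketch in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are mine, and it assumes every port on the switch is available for servers (no uplinks, no management ports), which is rarely true in practice.

```python
import math


def servers_per_switch(switch_ports: int, nics_per_team: int) -> int:
    """Servers one switch can connect when each server uses nics_per_team ports.

    Assumes every port is usable for servers (no uplink or management ports).
    """
    if switch_ports < 0 or nics_per_team < 1:
        raise ValueError("switch_ports must be >= 0 and nics_per_team >= 1")
    return switch_ports // nics_per_team


def switches_needed(server_count: int, switch_ports: int, nics_per_team: int) -> int:
    """Minimum number of identical switches needed to connect server_count servers."""
    per_switch = servers_per_switch(switch_ports, nics_per_team)
    if per_switch == 0:
        raise ValueError("a single server's team does not fit on one switch")
    return math.ceil(server_count / per_switch)


# The example from the text: a 48-port switch, four teamed NICs per server.
print(servers_per_switch(48, 4))   # 12
# One server past that limit already requires a second switch.
print(switches_needed(13, 48, 4))  # 2
```

Reserving even a handful of ports for uplinks lowers the per-switch number further, which is exactly why teaming decisions ripple into the physical architecture.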
“UPSTREAM” SOLUTIONS
Most organizations have other network, security, and application network infrastructure that architecturally sits “in front of” the applications. Each of these components must be capable of scaling in tandem with the servers and applications being scaled. If it is expected that an application can scale from 10,000 concurrent users to 50,000 concurrent users, so must the infrastructure be capable of scaling. Some solutions, such as IPS and IDS in the security world – and even many WAFs (Web Application Firewalls) – are often configured to transparently monitor and subsequently act on incoming requests and outgoing responses. These solutions require ports on infrastructure switching solutions. If it is necessary to scale them to support higher volumes of traffic, they, too, will require additional ports, whether because network connections are aggregated to provide the necessary bandwidth and mitigate bottlenecks or because multiple instances (physical or virtual) of the solution are being deployed. Every solution needs its own network connection, after all.
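To make the “scale in tandem” point concrete, here is a rough Python sketch that checks every tier in front of (and including) the application against a target number of concurrent users. The tier names, per-instance capacities, and instance counts are entirely hypothetical numbers chosen for illustration, not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    users_per_instance: int  # concurrent users one instance handles (hypothetical)
    instances: int

    @property
    def capacity(self) -> int:
        """Total concurrent users this tier can handle today."""
        return self.users_per_instance * self.instances


def additional_instances(tiers: list[Tier], target_users: int) -> dict[str, int]:
    """Map each under-provisioned tier to the extra instances it needs."""
    shortfalls = {}
    for tier in tiers:
        if tier.capacity < target_users:
            needed = -(-target_users // tier.users_per_instance)  # ceiling division
            shortfalls[tier.name] = needed - tier.instances
    return shortfalls


# Hypothetical deployment sized for roughly 10,000 concurrent users.
tiers = [
    Tier("firewall", 60_000, 1),
    Tier("ips", 20_000, 1),
    Tier("load_balancer", 25_000, 1),
    Tier("app_server", 2_500, 4),
]

print(additional_instances(tiers, 10_000))  # {}
print(additional_instances(tiers, 50_000))
# {'ips': 2, 'load_balancer': 1, 'app_server': 16}
```

Under these made-up numbers, growing to 50,000 users means new instances in three tiers, not just more application servers, and every one of those instances needs ports of its own.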
NETWORK CONNECTIONS
Speaking of network connections needed by infrastructure solutions, “data center scalability” also requires careful consideration of the management network. Most infrastructure solutions utilize separate management ports as a means to enable out-of-band management. This mitigates the risk of being unable to access and manage a given infrastructure component in the face of overwhelming traffic, such as a DDoS scenario or failed network connection. Whether this is accomplished via a physically or logically separate network is a matter of budget and design, but in either case requires careful attention to the architecture of the data center to allow both the “work” network and the “management” network to scale along with the application.
This should also trigger concern, or at least a line item on the scalability to-do list, to inquire as to the scalability of the network management systems (NMS) used to manage these applications and their infrastructure. More infrastructure, more complex architectures, and more applications and servers will certainly add stressors on the NMS that may not have been present before. It may be necessary to scale that solution as well, which impacts the network and application network infrastructure as well as the applications, depending on the solution used.
VIRTUALIZATION doesn’t CHANGE core IMPACT
When folks start getting excited about the idea of Virtual Network Appliances (VNAs) as a means to address some of the scalability challenges I’ve noted above, they are thinking about scalability of infrastructure (which is good) but forget that the VNA is going to incur the same scalability challenges that a virtualized application would incur (which is not so good). The “Spin Strategy”, i.e. spin up a new instance of X to address increasing demand, is not a strategy at all; it’s a tactical measure for addressing a specific task within what should be an architectural strategy for scalability. We’ve barely touched on the challenges associated with infrastructure scalability in this post; there are many other factors to consider. What should be clear, however, is that Garnaat’s Theorem holds true: scale is not just about servers. There are many, many more pieces of the architectural puzzle that go into scalability, and it behooves data center architects to consider the impact of scaling approaches – up, out, virtual, physical – before jumping into an implementation.
This isn’t the first time I’ve touched upon this subject, but it’s a concept that needs to be reiterated – especially with so many pundits and analysts looking for the next big virtualization wave to crash onto the infrastructure beachhead. I’m here to tell you, though, that the devil is in the details. The architectural details.