You may have heard the term “full-proxy architecture” or “dual stacks” thrown around in the context of infrastructure; here’s why that distinction is important. 

When the terms “acceleration” and “optimization” are used in relation to application delivery, they often evoke images of compression, caching, and similar technologies. Sometimes they even bring up a discussion of protocol optimization, which is really where things get interesting. 

You see, caching and compression techniques are mostly about the content – the data – being transferred. Whether it’s making it smaller (and thus faster) or delivering it from somewhere closer to the user (which also makes it faster) the focus of these acceleration techniques is really the content. But many of the most beneficial optimizations happen below the application data layer, at the transport level and below.

It’s all about the stack, baby. A good one is fast, a bad one, well, isn’t.

But it isn’t just about an optimized stack. Face it, there are like a gazillion different tricks, tips, and cheats for optimizing the stack on every operating system, but all of them are peculiar to a specific operating environment. Which is great if you’re an end-user trying to trick out your desktop to download that ginormous file even faster. It’s not so great when you’re a web server or a piece of application delivery infrastructure.

ONE SIZE DOES not FIT ALL

So here’s the thing – when you tweak out a single-stack piece of infrastructure for a specific environment you’re necessarily ignoring every other environment. You have to pick and choose what set of optimizations you’re going to use, and you’re stuck with it. If eighty percent of your user-base is accessing an application over “link A” then the other twenty percent are probably going to experience poor performance – and you’ll be lucky if they don’t experience time-outs or resets as well.

This problem (which has been solved by full-proxy, dynamic dual-stack infrastructure for a long time) has reared its ugly head yet again recently with the excitement over virtual network appliances (VNA). You know, a virtual image of your infrastructure components, deployed in the same flexible, rapid manner as your applications. The problem with this is that just slapping a network component into a virtual image results in a less than optimal integration. The component necessarily leverages the networking stack of the hypervisor, which means it is optimized to communicate over a LAN: a low-latency, high-throughput, high-capacity network connection without a lot of congestion. You know, without the kinds of things that make WAN-delivered applications slow, sluggish, and unresponsive.

For the same reasons that a web/application server – regardless of form-factor – can’t be optimized for both LAN and WAN at the same time, neither can a VNA. It has a single stack because that’s what’s underlying the entire system and what’s being interfaced with. It cannot simultaneously address pain points with WAN-connected communications and LAN-connected communications.

So not only is a single-stack infrastructure incapable of optimizing and accelerating on a per-connection basis; when you deploy an infrastructure component in virtualized form (or any other form that results in a single network stack architecture), you are also incapable of optimizing and accelerating on a per-network-connection basis. It’s LAN or WAN, baby. Those are your choices.
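To make that limitation concrete, here is a minimal Python sketch, not any product's actual implementation. The `StackProfile` type, the two profiles, their buffer sizes, and the 10 ms round-trip threshold are all hypothetical values chosen for illustration. A single stack applies whatever was tuned at deployment to everyone; a per-connection approach picks settings from what it observes about each connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackProfile:
    """A bundle of stack settings (hypothetical, illustrative values only)."""
    name: str
    send_buffer_bytes: int


# Assumed profiles: small buffers for a fast LAN, larger ones for a slow WAN.
LAN_PROFILE = StackProfile("lan", 64 * 1024)
WAN_PROFILE = StackProfile("wan", 512 * 1024)


def pick_profile(rtt_ms: float, threshold_ms: float = 10.0) -> StackProfile:
    """Choose a profile from the measured round-trip time of one connection."""
    return LAN_PROFILE if rtt_ms < threshold_ms else WAN_PROFILE


def single_stack_profiles(rtts_ms):
    """Single stack: the profile tuned at deployment applies to every connection."""
    return [LAN_PROFILE for _ in rtts_ms]


def per_connection_profiles(rtts_ms):
    """Per-connection tuning: each connection gets the profile that fits it."""
    return [pick_profile(rtt) for rtt in rtts_ms]


if __name__ == "__main__":
    observed = [0.4, 85.0, 2.0, 210.0]  # made-up round-trip times in milliseconds
    print([p.name for p in single_stack_profiles(observed)])
    print([p.name for p in per_connection_profiles(observed)])
```

In the single-stack case, the two high-latency clients are stuck with settings tuned for the LAN, which is exactly the “other twenty percent” problem described above.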

TRANSLATORS and TRAFFIC COPS

An intermediary is defined as a “mediator: a negotiator who acts as a link between parties”. The analogy of a “translator” is often used to describe the technical functionality of an intermediary, and it’s a good one as long as one remembers that a translator actually does some work – they translate one language to another. They terminate the conversation with one person and initiate and manage conversations with another simultaneously. They are “dual” stacked, if you will, and necessarily must be in order to perform the process of translation.

This is in stark contrast to previous analogies where load balancers and other application delivery focused infrastructure were likened to “traffic cops.” Traffic cops, when directing traffic, do not interact or otherwise interrupt the flow of traffic very often. They are not endpoints; they are not involved in the conversation except to point out where and when cars are allowed to go. They do not interact with the traffic in the way that a translator does. In fact they use nearly universal hand signals to direct traffic (think transport protocol layer and below) because they are primarily concerned with speed and performance. Their job is to get that car (packet) moving in the right direction and get it out of the way. They don’t care where it’s going or what it’s going to do there; traffic cops only care about making sure the car (packet) is on its way.

Translators, intermediaries, care about what is being said and they are adept at ensuring that the conversation is managed properly. Speed and performance are important, but making sure the conversation is accurate and translated correctly is as important to the translator as doing so quickly.

Traffic cops are single-stacks; translators are dual-stacks.
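The translator model can be sketched with two ordinary TCP sockets in Python. This is a simplified illustration under stated assumptions, not how any particular product works: `handle_client`, `configure`, and the buffer sizes are hypothetical names and values, and the handler deals with a single small request. What matters is that the client-side connection is terminated here and a completely separate server-side connection is opened, so each side can carry its own settings.

```python
import socket

# Illustrative buffer sizes (assumptions); the operating system may adjust them.
CLIENT_SIDE_SNDBUF = 256 * 1024  # client side, possibly a slow WAN link
SERVER_SIDE_SNDBUF = 64 * 1024   # server side, typically a fast LAN link


def configure(sock, send_buffer_bytes):
    """Apply settings to one side only; the other side is a different socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)


def handle_client(client_sock, backend_addr):
    """Terminate the client's connection and hold a separate one to the backend."""
    with client_sock, socket.create_connection(backend_addr) as backend_sock:
        configure(client_sock, CLIENT_SIDE_SNDBUF)
        configure(backend_sock, SERVER_SIDE_SNDBUF)

        # Conversation 1: listen to the client (one small request, for brevity).
        request = client_sock.recv(65536)

        # Conversation 2: speak to the backend on its own connection.
        backend_sock.sendall(request)
        backend_sock.shutdown(socket.SHUT_WR)  # signal end of request

        chunks = []
        while True:
            data = backend_sock.recv(65536)
            if not data:
                break
            chunks.append(data)

        # Relay the complete response back on the client-side connection.
        client_sock.sendall(b"".join(chunks))
```

A traffic cop, by contrast, would forward packets between the two parties without ever owning either end of the conversation, so there would be no second socket to configure independently.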

DIALECTS and DIFFERENCES

When you have exactly the same connection type on both sides of the conversation, a traffic cop is okay. But this is almost never the case, because even when two clients access an application over the generic “WAN”, there are still variances in speed, latency, and client capabilities. Sure, they’re both speaking Chinese, but they’re speaking different dialects of Chinese that each have their own nuances, idioms, and particular pronunciation, and each requires slightly different handling by the infrastructure. Optimizing and accelerating those connections requires careful attention to each individual conversation, and may further require tweaks and tuning on-demand for that specific conversation over and above the generic WAN-focused tweaks and tuning performed to enhance WAN communication.

A dual-stack infrastructure component is an intermediary. It can perform the function of a traffic-cop if that’s all you need but it is almost certainly the case that you need more, because users and partners and integrated applications are accessing your applications from a variety of client-types and a broad set of network connections. Dual-stack infrastructure separates, completely, the client communication from the server-communication, and enables the application and enforcement of policies that enhance security, performance, and availability by adapting in real-time to the conditions that exist peculiar to the client and the application.

Single-stack infrastructure simply cannot adapt to the volatile environment of today’s deployment architectures, e.g. cloud computing, highly virtualized, multi-site, and highly distributed. Single-stack infrastructure – whether network or server – is unable to optimize that single network stack in a way that can simultaneously serve up applications over WAN and LAN, and do so for both mobile and desktop clients such that both are happy with the performance.

Consider the queues on your web server – that’s where data collects on a per-connection basis, waiting to be transferred to the client. There are only so many queues that can be in use at any given time – it’s part of the capacity equation. The ability of clients to “pull” that data out of queues is directly related to the speed and capacity of their network connection and to the configuration and available resources of their client. If they pull it too slowly, that queue is tied up and the resources assigned to it can’t be used by other waiting users. Slow-moving queues necessarily decrease the concurrent user and connection capacity of a server (virtual or iron), and the result necessitates more hardware and servers as a means to increase capacity. A single-stack infrastructure really can’t address this common problem well. A dual-stack infrastructure can, by leveraging its buffering capacity to quickly empty those queues so the server can re-use the resources for other connections and users. In the meantime, it’s doling out the data to the client as quickly or slowly as the client can consume it, with negligible impact on the infrastructure’s resource availability.
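A bit of back-of-the-envelope arithmetic shows why this matters. The Python sketch below is a deliberately crude model: it ignores latency, protocol overhead, and TCP ramp-up, and it assumes the intermediary has enough memory to buffer the whole response. The function names and the example numbers (a 1 MB response, a 2 Mbps client, a 1 Gbps LAN) are made up for illustration.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Time to move size_bytes over a link of the given speed (idealized)."""
    return size_bytes * 8 / bits_per_second


def server_hold_seconds(size_bytes, client_bps, lan_bps, buffered):
    """How long one response ties up a server-side queue.

    buffered=False: the server feeds the client directly, so the queue
    drains at the speed of the slowest link on the path.
    buffered=True: an intermediary pulls the response over the LAN and
    paces it out to the client on its own, separate connection.
    """
    if buffered:
        return transfer_seconds(size_bytes, lan_bps)
    return transfer_seconds(size_bytes, min(client_bps, lan_bps))


if __name__ == "__main__":
    size = 1_000_000           # 1 MB response (assumed)
    client = 2_000_000         # 2 Mbps client link (assumed)
    lan = 1_000_000_000        # 1 Gbps LAN between proxy and server (assumed)

    direct = server_hold_seconds(size, client, lan, buffered=False)
    offloaded = server_hold_seconds(size, client, lan, buffered=True)
    print(f"direct: {direct:.3f} s, buffered: {offloaded:.3f} s")
    # prints "direct: 4.000 s, buffered: 0.008 s"
```

Under these assumptions the server's queue is free again after 8 milliseconds instead of 4 seconds; the client still waits about 4 seconds for its data, but the waiting happens on the intermediary rather than on the server.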

Dual-stack infrastructure can be tweaked and tuned, and it adapts at execution time. It’s agile in its ability to integrate and collaborate with the rest of the infrastructure as well as in its ability to apply the right policies at the right time based on conditions that are present right now, as opposed to when it was first deployed. It can be a strategic point of control because it intercepts, inspects, and can act based on context to secure and accelerate applications.