A lack of ability in the cloud to distinguish illegitimate from legitimate requests could lead to unanticipated costs in the wake of an attack. How do you put a price on uptime and more importantly, who should pay for it?

A “Perfect Cloud”, in my opinion, would be one in which the cloud provider’s infrastructure intelligently manages availability and performance such that when it’s necessary new instances of an application are launched to ensure meeting the customer’s defined performance and availability thresholds. You know, on-demand scalability that requires no manual intervention. It just “happens” the way it should.

Several providers have all the components necessary to achieve a “perfect cloud” implementation, though at the nonce it may require that customers specifically subscribe to one or more services necessary. For example, if you combine Amazon EC2 with Amazon ELB, Cloud Watch, and Auto Scaling, you’ve pretty much got the components necessary for a perfect cloud environment: automated scalability based on real-time performance and availability of your EC2 deployed application.

Cool, right?

Absolutely. Except when something nasty happens and your application automatically scales itself up to serve…no one.


AUTOMATIC REACTIONS CAN BE GOOD – AND BAD

BitBucket’s recent experience with DDoS shows that no security infrastructure is perfect; there’s always a chance that something will sneak by the layers of defense put into place by IT whether that’s in the local data center or in a cloud environment. The difference is in how the infrastructure reacts, and what it costs the customer.

Now, a DDoS such as the one that apparently targeted BitBucket was a UDP-based attack, meaning it was designed to flood the network and infrastructure and not the application. It was trying to interrupt service by chewing up bandwidth and resources on the infrastructure. Other types of DDoS, like a Layer 7 DDoS, specifically attack the application, which could potentially consume its resources which in turn triggers the automatic scaling processes which could result in a whole lot of money being thrown out the nearest window.

Consider the scenario:

  1. An application is deployed in the cloud. The cloud is configured to automatically scale up (launch additional instances) based on response time thresholds.
  2. A Layer 7 DDoS is launched against the application. Layer 7 DDoS is difficult to detect and prevent, and without the proper infrastructure in place it is unlikely to be detected by the infrastructure and even less likely to be detected by the application.
  3. The DDoS consumes all the resources on the application instance, degrading response time, so the infrastructure launches a second instance, and requests are load balanced across both application instances.
  4. The DDoS attack now automatically targets two application instances, and continues to consume resources until the infrastructure detects degradation beyond specified thresholds and automatically triggers the launch of another instance.
  5. Wash. Rinse. Repeat.

How many instances would need to be launched before it was noticed by a human being and it was realized that the “users” were really miscreants?

More importantly for the customer, how much would such an attack cost them?


THIS SOUNDS LIKE A JOB FOR CONTEXTUALLY-AWARE INFRASTRUCTURE

The reason the perfect cloud is potentially a danger to the customer’s budget is that it currently lacks the context necessary to distinguish good requests from bad requests. Cloud today, and most environments if we’re honest, lack the ability to examine requests in the context of the big picture. That is, it doesn’t look at a single request as part of a larger set of requests, it treats each one individually as a unique request requiring service by an application.

Without the awareness of the context in which such requests are made, the cloud infrastructure is incapable of detecting and preventing attacks that could context potentially lead to customer’s incurring costs well beyond what they expected to incur. The cost of an attack in the local data center might be a loss of availability, an application might crash and require the poor guy on call to come in and deal with the situation, but in terms of monetary costs it is virtually “free” to the organization, excepting the potential loss of revenue from customers unable to buy widgets who refuse to return later.

But in the cloud, this lack of context could be financially devastating. An attack moves at the speed of the Internet, and a perfect cloud is hopefully designed to react just as quickly. Just how many instances would be launch – incurring costs to the customer – before such an attack was detected? For all the monitoring offered by providers today it’s not clear whether any of them can discern and attack scenario from a seasonal rush of traffic, and it’s further not clear what the infrastructure would do about it if it could.

And once we add in the concept of intercloud, this situation could get downright ugly. The premise is that if an application is unavailable at cloud provider X according to the customer’s defined thresholds, that requests would be directed to another instance of the application in another cloud, and maybe even a third cloud. How many cloud deployed versions of an application could potentially be affected by a single, well-executed attack? The costs and reach of such a scenario boggle the mind.

My definition of a perfect cloud, methinks, needs to be adjusted slightly. A perfect cloud, therefore, in addition to its ability to automatically scale an application to meet demand must also be able to discern between illegitimate and legitimate users and provide the means by which illegitimate requests are ignored while legitimate requests are processed and only scaling when legitimate volumes of requests require such.


PUTTING A PRICE ON UPTIME
The question I think many people have, I know I certainly do, is who pays for the resulting cost of such an attack?

It’s often been said that it’s difficult if not impossible to put a price on downtime, but what about uptime? What about the cost incurred by the launch of additional instances of an application in the face of an attack? An attack that cannot be reasonably detected by an application? An attack that is clearly the responsibility of the infrastructure to detect and prevent; the infrastructure over which the customer, by definition and design, has no control?

Who should pay for that? The customer, as a price of deploying applications in the cloud, or the provider, as a penalty for failing to provide a robust enough infrastructure to prevent it?

Follow me on Twitter    View Lori's profile on SlideShare  friendfeed icon_facebook

AddThis Feed Button Bookmark and Share