Lori MacVittie - Two Different Socks

posted on Friday, October 16, 2009 3:15 AM

The cloud’s inability to distinguish illegitimate from legitimate requests could lead to unanticipated costs in the wake of an attack. How do you put a price on uptime and, more importantly, who should pay for it?

A “Perfect Cloud”, in my opinion, would be one in which the cloud provider’s infrastructure intelligently manages availability and performance so that, when necessary, new instances of an application are launched to meet the customer’s defined performance and availability thresholds. You know, on-demand scalability that requires no manual intervention. It just “happens” the way it should.

Several providers have all the components necessary to achieve a “perfect cloud” implementation, though for now customers may have to subscribe specifically to each of the services involved. For example, if you combine Amazon EC2 with Amazon ELB, CloudWatch, and Auto Scaling, you’ve pretty much got the components necessary for a perfect cloud environment: automated scalability based on real-time performance and availability of your EC2-deployed application.

Cool, right?

Absolutely. Except when something nasty happens and your application automatically scales itself up to serve…no one.


AUTOMATIC REACTIONS CAN BE GOOD – AND BAD

BitBucket’s recent experience with DDoS shows that no security infrastructure is perfect; there’s always a chance that something will sneak by the layers of defense put into place by IT whether that’s in the local data center or in a cloud environment. The difference is in how the infrastructure reacts, and what it costs the customer.

Now, the DDoS that apparently targeted BitBucket was a UDP-based attack, meaning it was designed to flood the network and infrastructure and not the application. It was trying to interrupt service by chewing up bandwidth and resources on the infrastructure. Other types of DDoS, like a Layer 7 DDoS, specifically attack the application. Such an attack consumes the application’s resources, which in turn triggers the automatic scaling processes, which could result in a whole lot of money being thrown out the nearest window.

Consider the scenario:

  1. An application is deployed in the cloud. The cloud is configured to automatically scale up (launch additional instances) based on response time thresholds.
  2. A Layer 7 DDoS is launched against the application. Layer 7 DDoS is difficult to detect and prevent, and without the proper infrastructure in place it is unlikely to be detected by the infrastructure and even less likely to be detected by the application.
  3. The DDoS consumes all the resources on the application instance, degrading response time, so the infrastructure launches a second instance, and requests are load balanced across both application instances.
  4. The DDoS attack now automatically targets two application instances, and continues to consume resources until the infrastructure detects degradation beyond specified thresholds and automatically triggers the launch of another instance.
  5. Wash. Rinse. Repeat.

How many instances would need to be launched before it was noticed by a human being and it was realized that the “users” were really miscreants?

More importantly for the customer, how much would such an attack cost them?
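
To get a feel for how quickly this adds up, here is a minimal Python sketch of the scenario above. Every number in it is an assumption made up for illustration: the per-instance capacity, the hourly price, the five-minute check interval, and the function name `simulate_attack` itself. It does not model any real provider's scaling policy or pricing; it simply adds one instance each time the per-instance load is over the threshold.

```python
def simulate_attack(attack_rps, capacity_rps=500, price_per_hour=0.10,
                    check_interval_min=5, duration_hours=6):
    """Toy model of threshold-based auto scaling under a sustained flood.

    All defaults are hypothetical values chosen for illustration only.
    Returns (instances running at the end, total instance cost in dollars).
    """
    instances = 1
    cost = 0.0
    checks = int(duration_hours * 60 / check_interval_min)
    for _ in range(checks):
        # Pay for everything that ran during this interval.
        cost += instances * price_per_hour * check_interval_min / 60
        # The infrastructure only sees "too much load per instance";
        # it cannot tell attackers from customers, so it scales up.
        if attack_rps / instances > capacity_rps:
            instances += 1
    return instances, cost


if __name__ == "__main__":
    for label, rps in (("normal traffic", 400), ("Layer 7 flood", 5000)):
        count, dollars = simulate_attack(rps)
        print(f"{label}: {count} instances, ${dollars:.2f}")
```

With these made-up numbers the flood settles at ten instances where normal traffic needed one. Raise the attack rate and nothing in the loop pushes back: the instance count and the bill both follow the attacker.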


THIS SOUNDS LIKE A JOB FOR CONTEXTUALLY-AWARE INFRASTRUCTURE

The reason the perfect cloud is potentially a danger to the customer’s budget is that it currently lacks the context necessary to distinguish good requests from bad requests. Cloud today, and most environments if we’re honest, lacks the ability to examine requests in the context of the big picture. That is, it doesn’t look at a single request as part of a larger set of requests; it treats each one individually as a unique request requiring service by an application.

Without awareness of the context in which such requests are made, the cloud infrastructure is incapable of detecting and preventing attacks that could potentially lead to customers incurring costs well beyond what they expected to incur. The cost of an attack in the local data center might be a loss of availability (an application might crash and require the poor guy on call to come in and deal with the situation), but in terms of monetary costs it is virtually “free” to the organization, excepting the potential loss of revenue from customers unable to buy widgets who refuse to return later.

But in the cloud, this lack of context could be financially devastating. An attack moves at the speed of the Internet, and a perfect cloud is hopefully designed to react just as quickly. Just how many instances would be launched, incurring costs to the customer, before such an attack was detected? For all the monitoring offered by providers today, it’s not clear whether any of them can discern an attack scenario from a seasonal rush of traffic, and it’s further not clear what the infrastructure would do about it if it could.

And once we add in the concept of intercloud, this situation could get downright ugly. The premise is that if an application is unavailable at cloud provider X according to the customer’s defined thresholds, requests would be directed to another instance of the application in another cloud, and maybe even a third cloud. How many cloud-deployed versions of an application could potentially be affected by a single, well-executed attack? The costs and reach of such a scenario boggle the mind.

My definition of a perfect cloud, methinks, needs to be adjusted slightly. A perfect cloud, in addition to automatically scaling an application to meet demand, must also be able to distinguish legitimate from illegitimate users and provide the means by which illegitimate requests are ignored while legitimate requests are processed, scaling only when legitimate request volumes require it.
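
As a rough illustration of what "context" could mean here, the Python sketch below judges each request against the recent history of the client that sent it, and feeds only the legitimate volume into the scaling decision. The class `RequestContext`, the function `should_scale_up`, and all thresholds are hypothetical names and values invented for this example. A per-client rate over a sliding window is only one simplistic signal; real Layer 7 attacks are much harder to spot than this.

```python
from collections import defaultdict, deque


class RequestContext:
    """Tracks recent requests per client so each one is judged in context."""

    def __init__(self, window_seconds=10.0, max_per_window=20):
        # Both limits are assumed values, not recommendations.
        self.window_seconds = window_seconds
        self.max_per_window = max_per_window
        self._recent = defaultdict(deque)  # client id -> timestamps in window

    def is_legitimate(self, client_id, now):
        """Record a request at time `now`; return True if it should be served."""
        stamps = self._recent[client_id]
        # Forget requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        # Rejected requests are recorded too, so a client that keeps
        # flooding stays over the limit.
        stamps.append(now)
        return len(stamps) <= self.max_per_window


def should_scale_up(legit_requests_per_sec, instances, capacity_per_instance=500):
    """Scale only when legitimate volume exceeds what is already running."""
    return legit_requests_per_sec > instances * capacity_per_instance
```

The design choice that matters is the second function: the scaling trigger is driven by the count of requests that passed the filter, not by raw response time or raw request volume.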


PUTTING A PRICE ON UPTIME

The question I think many people have, I know I certainly do, is who pays for the resulting cost of such an attack?

It’s often been said that it’s difficult if not impossible to put a price on downtime, but what about uptime? What about the cost incurred by the launch of additional instances of an application in the face of an attack? An attack that cannot be reasonably detected by an application? An attack that is clearly the responsibility of the infrastructure to detect and prevent; the infrastructure over which the customer, by definition and design, has no control?

Who should pay for that? The customer, as a price of deploying applications in the cloud, or the provider, as a penalty for failing to provide a robust enough infrastructure to prevent it?


Feedback

10/16/2009 5:11 AM
The party best equipped to deal with a risk should be the party bearing the costs. The provider is the only one able to do anything about it and worse, externalising it to the customer provides an incentive for the provider to turn a blind eye.

Sam
Sam Johnston
10/16/2009 5:20 AM
Great post, Lori.

I was looking at moving some web sites to a cloud service and while I found a few that were cost effective from a computing and storage point of view, I had a persistent voice in my head "What about a DDoS attack?" Autoscaling is certainly a big deal, but even without autoscaling, paying metered access for network traffic can be costly depending on how long the DDoS goes for. Autoscaling just makes it worse.
Mike Fratto
10/16/2009 7:56 AM
As a Service Provider, I would dread having to explain to the client why instead of their customary $3400 bill for December, they got a $250,000 invoice. Ouch!! Talk about a lump of coal in your stocking ;-)

I think the need for some kind of a governor on how much you are permitted to flex your cloud infrastructure is essential. I frankly rate the probability of a malicious attack as being much lower than a simple coding error in the application or the orchestration software. Many of the orchestration engines have scripting capabilities and event triggers that can be as complex as application code in some circumstances. I have seen far more runaway processes than I have denial of service attacks in my career.

I think an obvious measure to minimize the impact and increase the likelihood of early detection would be for cloud service providers to allow an organization to assign quotas on how many resources can be consumed. For example, you may indicate a hard upper limit of instances that can ever be spawned, or a "not to exceed" charge for services consumed in an hour. Alerting mechanisms could be simple and standardized (SMTP, SNMP, etc.) so that you got alerts at 75%, 90% and a hard fail at 100% of quota.

At the risk of shamelessly pushing CSC's Cloud Orchestration vision, I think this level of vision and transparency is ultimately going to prove essential to widespread cloud computing adoption that preserves economic value for the client. Really, instance sprawl is just a much slower-moving example of the kind of negative financial impact that can be incurred than the attack scenario you outlined. You need visibility into the costs in the cloud, but also into the rate of change of the costs, if you are to be able to properly manage the IT infrastructure of the future.

A great posting and a great summary of a problem that needs to be confronted and effectively addressed by consumers and service providers alike.

~Randy
Randy Arthur
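
The quota idea in the comment above (a hard upper limit on instances, with alerts at 75% and 90% and a hard fail at 100%) can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `InstanceQuota` and the `notify` callback are hypothetical, and the callback merely stands in for whatever SMTP or SNMP alerting a provider might actually offer.

```python
class InstanceQuota:
    """Caps how many instances may run and warns as the cap approaches."""

    ALERT_LEVELS = (0.75, 0.90)  # alert thresholds taken from the comment

    def __init__(self, max_instances, notify=print):
        self.max_instances = max_instances
        self.running = 0
        self.notify = notify      # hypothetical hook for SMTP/SNMP alerts
        self._alerted = set()     # levels already reported, to avoid repeats

    def launch(self):
        """Try to launch one instance; return False once the quota is hit."""
        if self.running >= self.max_instances:
            self.notify("quota reached (100%): launch refused")
            return False
        self.running += 1
        usage = self.running / self.max_instances
        for level in self.ALERT_LEVELS:
            if usage >= level and level not in self._alerted:
                self._alerted.add(level)
                self.notify(f"warning: {level:.0%} of instance quota in use")
        return True


if __name__ == "__main__":
    quota = InstanceQuota(max_instances=20)
    while quota.launch():
        pass  # keeps scaling until the hard limit refuses the next launch
```

A "not to exceed" hourly charge could be handled the same way by counting dollars instead of instances.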
10/16/2009 8:08 AM
@Randy

Thanks and you bring up a good point - the service provider isn't going to be happy with the situation *either*.

Maybe this is a place where controls are needed to cap costs/instances as well as alerting based on trending/thresholds? Something that allows customers to say the typical use of this application is about X requests / sec (minute/hour) and if that doubles, do not launch more instances but instead alert (provider, me, twitter, the press, whatever)?

The more I think about it the more I see that this is another one of those "requires manual intervention" situations that, despite all our technological capabilities for automation and codification of intelligence, are still necessary in many different aspects of IT.

Lori
macvittie
10/16/2009 9:50 AM
These concepts marry up real well with my concept for domains in an SOA, where each domain represents a set of attributes, such as a particular security level or quality-of-service, and services in that domain are guaranteed to adhere to those attributes.
JP Morgenthal
10/16/2009 4:32 PM
Randy--

I read your comment and the words "rollover minutes" sprang into my head. I imagine that providers with some flexibility in service provisioning (along the lines of Lori's thoughts) could include some upper-maximum in resources in a given month, with unused "service-minutes" rolling over to add to the ceiling in the next month, with some reasonable expiration policy to prevent people from building up impossible amounts of credit. Plus, there would be headroom in the system so the user could be warned when they started eating into their stash, so they could be prompted to provide more money, should they wish to do so.

~tom
Thomas Maufer
11/23/2009 3:02 AM
Interesting,

Keep up the good work...

Anyway, thanks for the post
Software companies UK
1/11/2010 3:21 AM
When Did Specialized Hardware Become a Dirty Word?
Lori MacVittie
6/9/2010 8:16 AM
Love your article. Well written and nicely put
Windows tips
9/21/2010 8:19 AM
When I'm trying to convince a client to use cloud services I would like to tell them the actual uptime in the absence of having a guarantee from Amazon ... or being able to say that S3 guarantees 99.9% but has actually had 99.99% over the past year.
muebles dormitorio
11/11/2010 5:39 AM
Thanks. Good post. I definitely think it is the provider who should bear the cost of an attack rather than the customer.
Online photo printing
1/2/2011 5:27 AM
The perfect cloud in my opinion is one where you are not 100% dependent on it, where you have an offline backup system which can be operational the minute the cloud solution gets into problems. And that is really the only solution for companies where an hour of downtime can mean a lot of lost business. But thanks for the nice article, especially the section on putting a price on uptime. We use cloud solutions in our company and thought highly of them until a downtime occurred. It was not material to our business but enough to see that there is always a risk in cloud computing. Best regards.
Technische, Medizinische
1/27/2011 5:24 AM
DDOS is really bad. I suffered an attack, too. The infrastructure has to be prepared for this. Thanks for the article.
Pizza Woman
2/9/2011 2:22 AM
"When I'm trying to convince a client to use cloud services I would like to tell them the actual uptime in the absence of having a guarantee from Amazon ... or being able to say that S3 guarantees 99.9% but has actually had 99.99% over the past year."
what does it mean????
IT consultant
3/9/2011 8:17 AM
Great and informative post. Thanks for your article.
Kopiertechniker
8/15/2011 1:38 AM
thanks very good work...

Anyway, thanks for the post
Thomas
