posted on Monday, January 25, 2010 3:55 AM
Cloud computing and content delivery networks (CDN) are both good ways to assist in improving capacity in the face of sudden, high demand for specific content but require preparation and incur operational and often capital expenditures. How about an option that’s free, instead?
While it’s certainly in the best interests of every organization to have a well-thought-out application delivery strategy for addressing the various events that can result in downtime for web applications, it may be that once in a while a simple, tactical solution will suffice. Even if you’re load balancing already (and you are, of course, aren’t you?) and employing optimization techniques like TCP multiplexing, you may find that there are sudden spikes in traffic or maintenance windows during which you simply can’t keep your site available without making a capital investment in more hardware.
Yes, you could certainly use cloud computing to solve the problem, but though it may not be a capital investment it’s still an operational expenditure and thus it incurs costs. Those costs are incurred not only when you actually need the capacity, but also in the time and effort required to prepare and deploy the application(s) in question for that environment.
Consider that you generally serve a fairly consistent patronage, such as would be the case for a local media outlet. No doubt you’ve got the infrastructure in place to handle the thousands of local visitors you receive on a daily basis, but what happens if a blog or editorial or news story is posted that catches someone’s eye? Often it’s relayed to Slashdot, or Digg, or Fark. And if it garners interest there, well, you may be in real trouble and have a difficult time maintaining availability. You need a solution that can reliably handle just such a situation, but you can’t predict when that situation may arise. After all, “odd” or breaking news doesn’t often happen with any amount of notice. The budget to build out a larger infrastructure to handle a “could happen, might happen, can’t guarantee will happen” scenario is impossible to justify.
What you need is a down and dirty, inexpensive (as in free) solution as an “insurance” plan against losing availability of your site. If that’s the case, perhaps what you need is to leverage the Coral Content Distribution Network.
WHAT IS THIS CORAL THING?
I could describe it myself, but really the description offered up by the best source (the creators) says it far better than I could:
CoralCDN is a decentralized, self-organizing, peer-to-peer web-content distribution network. CoralCDN leverages the aggregate bandwidth of volunteers running the software to absorb and dissipate most of the traffic for web sites using the system. In so doing, CoralCDN replicates content in proportion to the content's popularity, regardless of the publisher's resources---in effect democratizing content publication.
-- Coral Content Distribution Network | Overview
According to its Wikipedia entry, it is simplicity itself to take advantage of Coral Cache:
A website can be accessed through the Coral Cache by adding .nyud.net to the hostname in the site's URL, resulting in what is known as a 'coralized link'. So, for example, http://example.com becomes http://example.com.nyud.net. For websites that use a non-standard port for example, http://example.com:8080 becomes http://example.com.8080.nyud.net.
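The rewrite rule quoted above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the rule as described; the function name coralize and the choice to leave the scheme, path, and query untouched are my own assumptions, not anything Coral itself specifies.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix taken from the description quoted above.
CORAL_SUFFIX = ".nyud.net"


def coralize(url: str) -> str:
    """Return the 'coralized' form of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already coralized, leave it alone
    # A non-standard port moves into the hostname:
    # example.com:8080 -> example.com.8080
    if parts.port is not None and parts.port != 80:
        host = f"{host}.{parts.port}"
    return urlunsplit(
        (parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment)
    )


print(coralize("http://example.com"))       # http://example.com.nyud.net
print(coralize("http://example.com:8080"))  # http://example.com.8080.nyud.net
```

The path and query string are carried over unchanged, so a link to a specific story still points at that story once it has been coralized.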
Basically you can leverage Coral to mirror a given host such that your site remains available in the face of an onslaught of traffic, and it’s free. What is not explained is how to get users to access your site via Coral Cache in an on-demand way, such as when a sudden spike in traffic would otherwise make your site inaccessible. Think of Coral as an on-demand, instantly provisioned content distribution network that will mirror your site and keep it available. All you need to do is take advantage of it.
Certainly if you know ahead of time you can create a link as described above and use it instead of your normal link, but it’s not always evident ahead of time that you’ll need the extra bandwidth/capacity. What would be nice is a way to invoke these external services on-demand, not unlike the way in which caching solutions rewrite URLs to take advantage of commercial content delivery networks (CDN).
HOW DO I DO THAT?
There are quite a few ways to leverage such a service on-demand, but all require that you have some amount of visibility into the current operational state of your site and infrastructure. You can’t execute the logic necessary to take advantage of Coral if you don’t know you need it, after all. I’ll offer up three different ways in which you could integrate Coral into your availability strategy; there are many more, I’m sure. The methods included here require that you have a solution capable of network-side scripting at your disposal. If you’ve already got a load balancing solution, check with the vendor; it’s possible that you have the capability. If you don’t, you may want to consider using something like mod_rewrite, which gives you similar capabilities, though you’d need to deploy the rules on every server unless you create a proxy for your web servers and implement the rules there. That’s one of the advantages of a load balancer/application delivery controller: it by nature virtualizes multiple servers and acts as a proxy for them, providing a single, centralized location in which to implement these kinds of solutions.
- Maintenance Window Redirect
Use case: During specific times of the week/day you’d like the ability to “take down” your servers for maintenance and you’d like to take them all down at the same time to reduce the time required to update/patch them all. In this case you’ll want to codify the times during which your servers will be unavailable and create a redirect (HTTP 302) to the Coral Content Distribution Network as specified above, e.g. www.example.com.nyud.net.
- Referrer Based Redirect
Use case: Generally speaking the chances of quickly being overwhelmed by traffic are directly related to where the requests are coming from, i.e. Slashdot, Fark, Digg. Thus to handle this scenario you’ll want to create a network-side scripting rule that examines the HTTP Referer header (yes, the header name really is spelled with a single “r”) and, if it matches one of the “oh-lord-we’re-about-to-get-hammered” sites, redirects to the Coral Content Distribution Network.
- Connection/Request Limit Redirect
Use case: If you have a good idea what the total capacity of your servers is (and you do, because you’ve tested it under load, right?) then you can monitor current load on the load balancer/application delivery controller and upon nearing* those limits begin to redirect subsequent users to Coral. This solution requires a bit more intelligence and flexibility in the network-side scripting capabilities as you’ll need to track statistics, execute redirects based on variables, and end the redirection as requests slow down/decrease.
*The way that Coral works requires that it be able to access your site at least once to mirror it. Thus you cannot simply begin redirecting all requests to Coral without first allowing it to mirror the site by processing a request. This limitation necessarily requires that the network-side scripting solution you employ to implement such a solution be capable of allowing you to codify some amount of logic to allow this process to happen.
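To make the three redirect rules a little more concrete, here is a rough Python sketch of the decision logic a network-side script might apply to each request. Nearly everything in it is an assumption made for illustration: the Request class, the maintenance window, the capacity numbers, and the referrer hostnames are all hypothetical. The CoralWebPrx user-agent token is the one commonly cited for Coral’s proxies, but verify it before relying on it. A real load balancer would express this in its own scripting language.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional
from urllib.parse import urlsplit

# All values below are illustrative assumptions, not recommendations.
HOT_REFERRERS = ("slashdot.org", "digg.com", "fark.com")
CORAL_AGENT_TOKEN = "CoralWebPrx"        # assumed User-Agent token of Coral proxies
MAINTENANCE = (time(2, 0), time(4, 0))   # hypothetical nightly window, 02:00-04:00
MAX_ACTIVE = 1000                        # hypothetical tested capacity
THRESHOLD = 0.9                          # start redirecting at 90% of capacity


@dataclass
class Request:
    """Minimal stand-in for whatever request object your platform exposes."""
    host: str
    path: str
    referer: str = ""
    user_agent: str = ""


def coral_redirect(req: Request, now: datetime, active: int) -> Optional[str]:
    """Return a Location value for an HTTP 302, or None to serve from origin."""
    # Coral must be able to fetch the origin to mirror it, so its own
    # requests are never redirected (that would also create a loop).
    if CORAL_AGENT_TOKEN in req.user_agent:
        return None
    target = f"http://{req.host}.nyud.net{req.path}"
    # Rule 1: maintenance window.
    start, end = MAINTENANCE
    if start <= now.time() < end:
        return target
    # Rule 2: referrer is one of the sites likely to send a flood.
    ref_host = urlsplit(req.referer).hostname or ""
    if any(ref_host == h or ref_host.endswith("." + h) for h in HOT_REFERRERS):
        return target
    # Rule 3: current load is nearing tested capacity.
    if active >= MAX_ACTIVE * THRESHOLD:
        return target
    return None
```

Because the third rule is evaluated per request against the current connection count, redirection ends on its own as load drops back below the threshold. Note that the pass-through for Coral’s own requests only helps during a maintenance window if Coral has already mirrored the content beforehand.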
Okay, I lied – I’ll offer up a fourth option that requires no scripting and can be utilized without a load balancer:
- Publish Coralized URI
Use case: If you’re publishing social media quick links on a story/blog/site, use the Coral-enabled URL instead of the origin content as the “link” to share. This won’t stop people from cutting and pasting from the address bar in their browser, but it will make sure that any “sharing” of the content immediately leverages the CoralCDN to distribute.
The reason I was leery of offering up the fourth option is that you lose visibility into statistics when users are sent directly to the CoralCDN. The other three options will be “counted” in logs and in statistics because users first connect to your site (the load balancer/application delivery controller) and then connect to the CoralCDN. Because the load balancer/application delivery controller is almost guaranteed to be able to handle more traffic than your servers, it can easily respond to requests with a redirect. And because it is responding, it is counting the connections, and has all the relevant information about the client you might be aggregating, so you don’t lose visibility.
If visibility isn’t an issue, then encouraging users to access the content directly via CoralCDN will certainly be one way to achieve the goal of keeping your content available.
There it is then; a free content distribution network that can be leveraged on-demand. Using CoralCDN is not a panacea and has limitations, of course, in that it’s not as flexible as cloud computing; it essentially mirrors your site, it doesn’t distribute it. But if it’s specific content that’s experiencing high demand and it’s not a normal occurrence, then a limited, tactical solution like CoralCDN may be just what you need to keep your site available and enjoy your 15 megabytes of fame.