Lori MacVittie - Two Different Socks

posted on Wednesday, June 16, 2010 4:23 AM

Like most architectural decisions, the two goals do not require mutually exclusive choices.

The difference between fault isolation and fault tolerance is not necessarily intuitive. The differences, though subtle, are profound and have a substantial impact on data center architecture.

Fault tolerance is an attribute of systems and architectures that allows them to continue performing their tasks in the event of a component failure. Fault tolerance of servers, for example, is achieved through the use of redundancy in power supplies, in hard drives, and in network cards. In an architecture, fault tolerance is also achieved through redundancy by deploying two of everything: two servers, two load balancers, two switches, two firewalls, two Internet connections. The fault tolerant architecture includes no single point of failure; no component that can fail and cause a disruption in service. Load balancing, for example, is a fault tolerance-based strategy that leverages multiple application instances to ensure that failure of one instance does not impact the availability of the application.
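
The load balancing case can be made concrete with a minimal sketch. It assumes a hypothetical `Instance` class standing in for an application instance and a simple round-robin policy; neither is taken from any particular product. The point is only that a request still succeeds when one instance has failed.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    """One application instance behind the load balancer (hypothetical)."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class RoundRobinBalancer:
    """Round-robin over instances, skipping any that fail."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._ring = cycle(self.instances)

    def dispatch(self, request: str) -> str:
        # Try each instance at most once per request.
        for _ in range(len(self.instances)):
            instance = next(self._ring)
            try:
                return instance.handle(request)
            except ConnectionError:
                continue  # failed instance: move on to the next one
        raise RuntimeError("no healthy instances available")


pool = [Instance("app-1"), Instance("app-2")]
balancer = RoundRobinBalancer(pool)
pool[0].healthy = False            # simulate a component failure
print(balancer.dispatch("GET /"))  # app-2 served GET /
```

The client never sees the failure of `app-1`. Only when every instance is down does the service itself become unavailable.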

Fault isolation, on the other hand, is an attribute of systems and architectures that isolates the impact of a failure such that only a single system, application, or component is impacted. Fault isolation allows that a component may fail as long as it does not impact the overall system. That sounds like a paradox, but it’s not. Many intermediary devices employ a “fail open” strategy as a method of fault isolation. When a network device is required to intercept data in order to perform its task – a common web application firewall configuration – it becomes a single point of failure in the data path. To mitigate a potential failure of the device, if something causes the system to crash it “fails open” and acts like a simple network bridge, forwarding packets on to the next device in the chain without performing any processing. If the same component were deployed in a fault-tolerant architecture, two devices would be deployed, ideally leveraging non-network-based failover mechanisms.
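
A “fail open” device can be sketched in a few lines. The `inspect` function below is a hypothetical stand-in for a web application firewall's processing step, and the `CRASH` marker is just a way to simulate an internal fault; a real device implements this behavior in hardware or in its network driver rather than in application code.

```python
class InspectionError(Exception):
    """Raised when the inspection engine itself fails (hypothetical)."""


def inspect(packet: bytes) -> bytes:
    """Stand-in for a web application firewall's processing step."""
    if b"CRASH" in packet:
        # Simulate an internal fault in the device, not a policy decision.
        raise InspectionError("inspection engine failed")
    return packet.replace(b"<script>", b"")


def forward_fail_open(packet: bytes) -> bytes:
    """Process the packet, or pass it through untouched if the device fails."""
    try:
        return inspect(packet)
    except InspectionError:
        # Fail open: behave like a simple bridge and forward unmodified.
        return packet


print(forward_fail_open(b"a<script>b"))     # b'ab'
print(forward_fail_open(b"CRASH<script>"))  # b'CRASH<script>'
```

The second call shows the trade-off: traffic keeps flowing, but it flows without the processing the device was deployed to perform.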

Similarly, application infrastructure components are often isolated through a contained deployment model (like sandboxes) that prevents a failure – whether an outright crash or sudden massive consumption of resources – from impacting other applications. Fault isolation is of increasing interest as it relates to cloud computing environments as part of a strategy to minimize the perceived negative impact of shared network, application delivery network, and server infrastructure.


SIMILARITIES and DIFFERENCES

It may sound at first as though designing for fault tolerance is not very much different from designing for fault isolation. On the surface this is true. But the importance assigned to fault tolerance is generally higher, and it is often the case that in fault tolerant architectures the “secondary” or “fallback” component always remains in “standby” in the event it is needed. Sometimes IT management decides that, since the secondary components have not been needed for as long as anyone can remember, they should be engaged and leveraged as additional resources. After all, idle resources are the devil’s playground and a source of inefficiency that cannot be tolerated in today’s increasingly Maxwell House “to the last drop” paradigm. At issue with this approach is that the MTBF (Mean Time Between Failures) for a component is based on its use, and the more it is used the closer it comes to experiencing a failure. Thus leveraging what appear to be “idle” resources actually increases the possibility that in the event of a primary component failure the secondary, too, will fail. In a truly fault tolerant architecture or system this is unacceptable. A truly fault tolerant architecture will not allow for secondary components to be utilized on a day-to-day basis.
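
The standby arrangement described above can be sketched as an active/standby pair. The `Unit` and `ActiveStandbyPair` classes are hypothetical illustrations, not any vendor's failover mechanism. The sketch shows only that the standby does no day-to-day work and takes over when the primary fails.

```python
class Unit:
    """A redundant component such as a load balancer (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.requests_served = 0

    def serve(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} has failed")
        self.requests_served += 1
        return f"{self.name}: {request}"


class ActiveStandbyPair:
    """Send all traffic to the primary; use the standby only on failure."""

    def __init__(self, primary: Unit, standby: Unit):
        self.active = primary
        self.standby = standby

    def serve(self, request: str) -> str:
        try:
            return self.active.serve(request)
        except ConnectionError:
            # Failover: promote the standby and retry the request once.
            self.active, self.standby = self.standby, self.active
            return self.active.serve(request)


primary, standby = Unit("lb-a"), Unit("lb-b")
pair = ActiveStandbyPair(primary, standby)
for i in range(3):
    pair.serve(f"req-{i}")
print(standby.requests_served)  # 0: the standby has done no work so far
primary.alive = False
print(pair.serve("req-3"))      # lb-b: req-3
```

Turning the pair into an active/active deployment would mean both units accumulate use every day, which is exactly the practice the paragraph above warns against.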

A fault isolation strategy is about designing an architecture in which a failure on the part of a component does not impact other applications. For example, an architecture that employs fault isolation will ensure that a rogue or runaway process in an application does not negatively impact other applications also deployed on that same server. This is one of the biggest benefits of virtualization and one that is rarely discussed. Virtualization, like sandboxes in a browser, can isolate individual applications and ensure rogue or runaway processes/applications cannot impact the overall system or other applications. Virtualization, however, is better at fault isolation than sandboxes because it can constrain the compute resources that can be consumed by a given application or process, while browsers more often than not do not and cannot impose this restriction.
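
The effect of constraining compute resources can be illustrated with a toy model. The `Host` class, the application names, and the CPU figures below are all assumptions made for the example; real hypervisors enforce limits through their own schedulers, not through anything resembling this code.

```python
class Host:
    """A server whose CPU is shared by several applications (toy model)."""

    def __init__(self, total_cpu: int, caps=None):
        self.total_cpu = total_cpu
        self.caps = caps or {}  # per-application CPU cap; empty means no caps

    def allocate(self, demand: dict) -> dict:
        """Grant CPU in arrival order, honoring per-application caps."""
        remaining = self.total_cpu
        granted = {}
        for app, wanted in demand.items():
            limit = self.caps.get(app, wanted)
            share = min(wanted, limit, remaining)
            granted[app] = share
            remaining -= share
        return granted


# "runaway" asks for far more than its share and happens to be served first.
demand = {"runaway": 100, "billing": 30, "search": 30}

uncapped = Host(total_cpu=100)
capped = Host(total_cpu=100, caps={"runaway": 40, "billing": 30, "search": 30})

print(uncapped.allocate(demand))  # {'runaway': 100, 'billing': 0, 'search': 0}
print(capped.allocate(demand))    # {'runaway': 40, 'billing': 30, 'search': 30}
```

Without caps the runaway process starves its neighbors. With caps the fault is contained to the application that caused it.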


HERE COMES the FENG SHUI

Data center Feng Shui is about the right solution in the right place in the right form factor. So when we look at application delivery controllers (a.k.a. load balancers) we need to look at both the physical (pADC) and the virtual (vADC) and how each one might – or might not – meet the needs for each of these fault-based architectures.

In general, when designing an architecture for fault tolerance, provisions need to be made to address any single component-level failure. Hence the architecture is redundant, comprising two of everything. The mechanisms through which fault tolerance is achieved are failover and finely grained monitoring capabilities from the application layer through the networking stack down to the hardware components that make up the physical servers. pADC hardware designs are carrier-hardened for rapid failover and reliability. Redundant components (power, fans, RAID, and hardware watchdogs) and serial-based failover make for extremely high uptimes and MTBF numbers.

vADCs are generally deployed on commodity hardware and will lack the redundancy, serial-based failover, and finely grained hardware watchdogs, as these types of components are costly and would negate much of the savings achieved through standardization on commodity hardware for virtualization-based architectures. Thus if you are designing specifically for fault tolerance, a physical (hardware) ADC should be employed.

Conversely, vADC more naturally allows for isolation of application-specific configurations a la architectural multi-tenancy. This means fault isolation can be readily achieved by deploying a virtualized application delivery controller on a per-application or per-customer basis. This level of fault isolation cannot be achieved on hardware-based application delivery controllers (nor on most hardware network infrastructure today) because the internal architecture of these systems is not designed to completely isolate configuration in a multi-tenant fashion. Thus if fault isolation is your primary concern, a vADC will be the logical choice.

It follows, then, if you are designing for both fault tolerance and fault isolation, that a hybrid virtualized infrastructure architecture will be best suited to implementing such a strategy. An architectural multi-tenant approach in which the pADC is used to aggregate and distribute requests to individual vADC instances serving specific applications or customers will allow for fault tolerance at the aggregation layer while ensuring fault isolation by segregating application or customer-specific ADC functions and configuration.
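
The hybrid arrangement might look something like the following sketch. The class names, tenant names, and backend addresses are hypothetical, and the redundant pADC pair that would provide fault tolerance at the aggregation layer is omitted for brevity. What the sketch does show is the isolation half: a failed vADC affects only its own tenant.

```python
class VirtualADC:
    """A per-tenant vADC holding only that tenant's configuration (sketch)."""

    def __init__(self, tenant: str, backend: str):
        self.tenant = tenant
        self.backend = backend  # hypothetical backend address
        self.up = True

    def route(self, path: str) -> str:
        if not self.up:
            raise ConnectionError(f"vADC for {self.tenant} is down")
        return f"{self.tenant}{path} -> {self.backend}"


class PhysicalADC:
    """Aggregation layer: maps each tenant's traffic to its own vADC."""

    def __init__(self, vadcs):
        self.vadcs = {v.tenant: v for v in vadcs}

    def dispatch(self, tenant: str, path: str) -> str:
        try:
            return self.vadcs[tenant].route(path)
        except ConnectionError as exc:
            # The fault stays inside one tenant; others are unaffected.
            return f"503 for {tenant}: {exc}"


edge = PhysicalADC([
    VirtualADC("shop", "10.0.1.10"),
    VirtualADC("blog", "10.0.2.10"),
])
edge.vadcs["shop"].up = False           # one tenant's vADC fails
print(edge.dispatch("shop", "/cart"))   # 503 for shop: vADC for shop is down
print(edge.dispatch("blog", "/feed"))   # blog/feed -> 10.0.2.10
```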

