F5 Friday: Killing Two Birds with One (Solid State) Stone

Sometimes mitigating operational risk is all about the hardware.

MTBF. Mean Time Between Failure.

An important piece of this often-used but rarely examined acronym is the definition of “mean”:

The quotient of the sum of several quantities and their number; an average

An average, not a guarantee. That means some folks experienced failure later than the value, and plenty experienced it earlier. And it is the earlier that is particularly troublesome when it comes to the data center.
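It is worth being careful here: the value that splits a population evenly is the median, and when lifetimes are lopsided, more than half of the failures can land before the mean. Here is a minimal Python sketch of that effect. The lifetimes are made-up numbers chosen purely for illustration, not data from any study or vendor.

```python
from statistics import mean, median

# Hypothetical drive lifetimes in years (invented for illustration only).
# One long-lived drive pulls the average well above the typical lifetime.
lifetimes_years = [1, 1, 2, 2, 3, 21]

average = mean(lifetimes_years)    # sum of the values divided by their count
middle = median(lifetimes_years)   # the value that splits the sample in half
failed_before_mean = sum(1 for t in lifetimes_years if t < average)

print(f"mean lifetime:   {average} years")
print(f"median lifetime: {middle} years")
print(f"{failed_before_mean} of {len(lifetimes_years)} drives failed before the mean")
```

In this toy sample, five of the six drives are gone before the "mean" lifetime arrives, which is exactly the kind of early failure that hurts in the data center.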

Customers replace disk drives at rates far higher than those suggested by the estimated mean time between failure (MTBF) supplied by drive vendors, according to a study of about 100,000 drives conducted by Carnegie Mellon University. (PCWorld, “Study: Hard Drive Failure Rates Much Higher Than Makers Estimate”)

An eWeek commentary referencing the same study, which tracked drives heavily used for storage and web servers, found “annual disk replacements rates were more in the range of 2 to 4 percent and were as high as 13 percent for some sites.” It is likely that the more intense, volatile access associated with storage and web servers increases the strain on components, which leads to earlier and more frequent failures than the MTBF figures suggest. Fast forward to today’s use of virtualization, particularly for shared services such as storage and compute, and volatile access certainly fits the bill.
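To put those replacement rates next to an MTBF figure, the two can be converted into each other. The sketch below assumes a constant failure rate (an exponential lifetime model), which is a simplification, and it treats a replacement as a failure, which is not strictly the same thing. The 1,000,000-hour MTBF is a hypothetical datasheet value picked for illustration and does not come from the study; the function names are likewise made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """MTBF (in hours) implied by an annualized failure rate, same assumption."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


# Hypothetical vendor MTBF (an assumption, not a figure from the study).
datasheet_mtbf_hours = 1_000_000
print(f"Implied annual failure rate: {afr_from_mtbf(datasheet_mtbf_hours):.2%}")

# Replacement rates quoted from the study: 2%, 4%, and 13%.
for observed_rate in (0.02, 0.04, 0.13):
    implied = mtbf_from_afr(observed_rate)
    print(f"{observed_rate:.0%} per year implies an MTBF of about {implied:,.0f} hours")
```

Under these assumptions, a million-hour MTBF works out to an annual failure rate below one percent, so observed replacement rates of 2 to 13 percent correspond to a much shorter effective MTBF.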

So it makes sense that network components relying on disk drives that process highly volatile data – such as caches and WAN optimization controllers – would also be more likely to experience higher rates of failure than if the same drive were in, say, your mom’s PC. If you’re the geek of the family, then a failure in mom’s drive is not exactly a pleasant experience, but if you’re an admin in an IT organization and the drive in a network component fails, well, I think we can all agree that’s an even less pleasant situation.

But failure rates aren’t the only issue with network components and disk drives. Performance, too, is a factor.

The slowest function on any computing machine today is I/O. Whether that I/O is for graphics or storage makes little difference (unless you happen to have a graphics accelerator card, in which case storage is almost certainly your worst-performing component). The latency for reading and writing data is often dismissed by consumers as negligible, but that’s because latency doesn’t cost them anything. Performance for IT organizations and their businesses is critical, with delays of mere seconds incurring losses for some industries at a rate of thousands or more per second. If a network component in the data flow is causing performance problems – especially one whose value proposition is performance enhancement – then it’s a net negative for IT and the business, and the component may not survive a strategy audit.

So sometimes it really is about the hardware.

Now, F5 is almost always strategically in the data path. Whether it’s BIG-IP LTM (Local Traffic Manager) providing load balancing, BIG-IP WebAccelerator performing dynamic caching duties, or BIG-IP WOM (WAN Optimization Manager) optimizing and accelerating data transfers over WAN links, we are in the data path. Anything we can do to increase reliability and performance is a Very Good Thing™ both for us and for our customers. Because we own the hardware, we can make choices about which components to leverage to ensure the fastest, most reliable platform we can.

So we’ve made some of those decisions lately, and the result is a new hardware platform – the 11000. The primary benefit of the new platform? You got it, SSD (Solid State Disk) as an alternative to traditional hard drives. The 11000 handles up to 20 Gbps of LAN-side throughput and 16 Gbps of hardware compression, and it offers optional solid-state drives, which greatly reduce the risk of failure (availability/reliability) while simultaneously improving performance. That’s two of the three components of operational risk. The third, security, is not directly addressed by SSDs, although the performance improvement when encrypting data at rest could be a definite plus.

I could go on, but my cohort Don MacVittie has already posted not one, but two excellent overviews of the new platform and, as a bonus just for you, he’s also penned a post on a related announcement around our new FIPS-compliant platforms.