Many networking and application development folks just don’t get it – why you would want or need virtualized storage, what benefits it offers, and whether it’s really worth the cost.

That’s what this document is for. While you likely know some of what I’m covering here, unless you’re a storage professional you will also likely learn something from this article. Hopefully you’ll get more learning and less repetition; I’ll certainly do my best.

Remote storage has been around forever – arrays of disks were the standard on mainframes, and they are common enough in networked environments. Whether it’s called a File Server, a Disk Array, or a NAS, in the end it is all storage for your server to access that is not housed in the server itself.

The differences between the storage options available to you are huge, so we’ll take a walk through them and give you an overview of the types of storage virtualization out there.

 
SAN Virtualization

Oh what a nasty phrase SAN Virtualization has become! On a SAN, blocks of data are managed by your server just as if they were being stored on a local hard disk. Except they’re not. Those blocks are going through a SAN switch to be stashed on one of the many spindles of the mighty SAN. You address a Logical Unit Number (LUN) and a sector, and the switch translates that combination into a physical disk and sector. The thing is that most modern SANs use the same system to do the translation, so if your SAN switch dies or you change vendors, in theory (though not generally in practice) you can get to your data without copying it all off the SAN.
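If it helps to picture that translation, here’s a quick Python sketch. The table contents are invented, and a real switch maps extents rather than individual sectors, but the shape of the lookup is the point:

    # Toy illustration: block-level address translation on a SAN.
    # The host addresses a (LUN, sector) pair; the switch resolves it
    # to a physical (disk, sector). All values here are invented.
    LUN_MAP = {
        (7, 0):   ("disk-03", 81920),
        (7, 512): ("disk-05", 4096),
    }

    def translate(lun, sector):
        """Resolve a host-visible LUN/sector to a physical location."""
        return LUN_MAP[(lun, sector)]

    print(translate(7, 0))   # -> ('disk-03', 81920)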

Enter SAN Virtualization, which says “Why should you care how big LUN seven is? You should save your data and let us worry about details like that!” And it does, amazingly well. The problem is that there is no accepted standard (though undoubtedly some vendor has submitted their solution as the standard and muscled it through) and no way to find your data if the Virtualization appliance dies. The map of which chunks of a file got stored where is gone if the appliance goes.
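To make that risk concrete, here’s an invented sketch of the kind of map such an appliance holds. The chunks it points at survive an appliance failure; the knowledge of what they belong to does not:

    # Invented sketch: a SAN Virtualization appliance's extent map.
    # The host sees one big virtual LUN; its chunks are scattered
    # across whichever arrays had free space at write time.
    extent_map = {
        # (virtual LUN, extent) -> (array, physical LUN, extent)
        ("vlun-1", 0): ("array-A", 7, 12),
        ("vlun-1", 1): ("array-C", 3, 99),
        ("vlun-1", 2): ("array-A", 2, 5),
    }
    # If the appliance dies, the chunks still sit on the arrays, but
    # nothing else records which chunk belongs to which virtual LUN.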

And that is exactly why SAN Virtualization is a dirty word in most IT shops. It’s bad enough that you have to go without most (or all) of your data while you wait for service to come out and replace a faulty SAN switch, but to have to coordinate getting the Virtualization appliance back in there, then find the newest backup of the Virtualization config and cross your fingers that nothing important was saved after that backup was taken… Ugly stuff.

Where SAN Virtualization has gotten some traction is in on-switch solutions… Those built into the switch itself, so that when you have to replace the switch you are automatically replacing the Virtualization tables along with it. In an environment that runs redundant switches, this is a perfectly acceptable solution that some have taken advantage of. The bad press about SAN Virtualization has limited even this legitimate usage, though.

 

NAS Virtualization (aka File Virtualization)

In the case of a NAS, blocks aren’t even part of the equation. The NAS OS (be it Windows, Linux, a proprietary embedded OS, or whatever) handles block allocation and exposes files and shares to the world. You can’t say “disk seven, offset 10000” to a NAS; it expects a mount point and filename instead, and it worries about where and how that file is actually stored, returning it to you in a format you expect via either the CIFS (Common Internet File System) or NFS (Network File System) interface.
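A rough Python sketch of the contrast between the two access models (paths and numbers invented):

    # Block access (SAN style): the host addresses raw blocks itself.
    san_request = {"lun": 7, "offset": 10000, "length": 512}

    # File access (NAS style): the host names a share and a file, and
    # the NAS OS decides which blocks on which disks hold the data.
    def nas_read(path):
        with open(path, "rb") as f:
            return f.read()

    # nas_read("/mnt/nas-share/reports/q3.xls")   # invented mount point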

NAS is much more common than SAN, if only because all servers support either CIFS or NFS (and many support both). That makes it far easier to use in day-to-day life than a SAN, which requires dedicated interface cards and configuration of both the disks and how your server accesses them.

NAS Virtualization has suffered somewhat from the negative view of SAN Virtualization, but slowly IT staffers are figuring out that there is a distinct difference. In NAS Virtualization, a unified directory is presented to the network, so that instead of having Machine:MountPoint combinations scattered all over the place, you have a single name for the Machine portion and a variety of mount points. Inside the NAS Virtualization appliance there is a mapping from this “Global Directory” to the actual Machine:MountPoint combinations.
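A hedged Python sketch of that mapping, with invented names; the point is the single namespace in front and the per-prefix lookup behind it:

    # Invented sketch of a NAS Virtualization "Global Directory": one
    # virtual server name with many mount points, each backed by a
    # real Machine:MountPoint pair somewhere on the network.
    GLOBAL_DIRECTORY = {
        "//storage/engineering": "//filer-02/vol/eng",
        "//storage/finance":     "//filer-07/vol/fin",
        "//storage/archive":     "//oldnas-01/share1",
    }

    def resolve(virtual_path):
        """Map a unified-namespace path to its physical home."""
        for prefix, target in GLOBAL_DIRECTORY.items():
            if virtual_path.startswith(prefix):
                return virtual_path.replace(prefix, target, 1)
        raise FileNotFoundError(virtual_path)

    print(resolve("//storage/finance/2009/budget.xls"))
    # -> //filer-07/vol/fin/2009/budget.xls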

The key reason why NAS Virtualization is different from SAN Virtualization? It leaves your data in a usable state. You may not know where every file is, but you’ll have a good idea, and since most data is stored by directory, you’ll know exactly where to find the bulk of it.

The problem SAN Virtualization has of putting your data where you can’t find it just isn’t an issue, because NAS Virtualization maps mount points, not blocks of disk. Even if the Virtualization appliance moves your file, it moves the entire file with its name intact, so you can still find it.

Some early NAS Virtualization products weren’t designed so intelligently; they would change the name, move the file, and keep a map of what the actual filename was. When the appliance went down, the data was lost, just as with SAN Virtualization. Needless to say, those products have been redesigned or have fallen by the wayside.

NAS Virtualization products can and do move data – our ARX product line will let you set policies on where to store files and will move a file if it meets the criteria you’ve set – but they keep the file name, and you have to enable the policies, so you’ll know roughly where to look if a file is not where you expected it.

In point of fact, the ability to move data around based upon rules is the single largest strength of NAS Virtualization. You can make rules that say “move infrequently updated or accessed files to slower, cheaper storage,” then back up only the stuff that is frequently updated. This shortens your backup window and saves on tape costs.
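As a hedged illustration of what such a rule does – plain Python, not the actual policy syntax of ARX or any other product – moving files untouched for 90 days to a cheaper tier might look like this:

    # Illustrative tiering rule: files not accessed in 90 days move
    # to cheaper storage, keeping their names and relative paths.
    # The tier paths are invented.
    import os, shutil, time

    FAST_TIER = "/mnt/tier1"
    SLOW_TIER = "/mnt/tier2"
    AGE_LIMIT = 90 * 24 * 3600   # 90 days, in seconds

    def apply_policy():
        now = time.time()
        for root, _dirs, files in os.walk(FAST_TIER):
            for name in files:
                src = os.path.join(root, name)
                if now - os.path.getatime(src) > AGE_LIMIT:
                    dst = src.replace(FAST_TIER, SLOW_TIER, 1)
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.move(src, dst)   # same name, same relative path

    # apply_policy()   # run against the invented paths above

Because the name and relative path survive the move, the backup job can simply skip the slow tier while users can still find everything.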

A unified directory structure is nice too – a single mount point with many subdirectories – but in some older versions of Windows this kind of functionality can backfire: the mount point is the default entry point, so you end up with a bunch of shares that are all the same. Since this is only a problem with really old versions of Windows, it’s not worth wasting brainshare on – if you’re running XP or newer on the desktop, or Win2K or newer on your servers, it’s just not an issue. And if you’re running a version of Windows older than that on more than one or two machines in your building, you probably need to reconsider your IT priorities.

 

Virtual Tape Libraries (VTL)

Virtual Tape Libraries are, in this writer’s humble opinion, one of the cooler underutilized solutions to come out of storage in forever. While everyone was chasing after Information Lifecycle Management, they should have been throwing in a VTL instead. At least VTL is a viable solution with a viable market.

The concept is simple… All of the backup software out there is designed to work with tape libraries, but disk has gotten cheap. So how do we take advantage of all that really cheap disk to handle backups while keeping our favorite tape software without major changes?

Enter the VTL – you can make that disk appear to be tape; many, many tapes. With the read and write speed of disk, the seemingly continuous growth of disk sizes, and the ability to be accessed by backup software as if it were tape, a VTL allows for fast backups that can later be streamed from the VTL to actual tape, or just left on the VTL until they expire. VTLs do all of the things a high-end tape jukebox will do, but they take up a whole lot less space. Backups are massively sped up, but that’s not the primary reason to use a VTL. The primary reason is speed of restore, which improves by a huge factor, simply because you don’t have to search for the right tape and then seek to the right location… You merely tell it what to restore and from which date, and it does the restore at disk speed.
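A minimal sketch of the idea, assuming a completely invented interface: present tape-like, sequential semantics to the backup software while backing each “cartridge” with an ordinary disk file:

    # Invented sketch: a virtual tape cartridge backed by a disk file.
    # Writes append like a streaming tape; "rewind" is an instant seek.
    class VirtualTape:
        def __init__(self, path):
            self.f = open(path, "a+b")   # one file per virtual cartridge

        def write_block(self, data):
            self.f.write(data)           # appends, as a tape drive would

        def rewind(self):
            self.f.seek(0)               # instant, unlike physical tape

        def read_block(self, size):
            return self.f.read(size)

    tape = VirtualTape("cartridge-0001.img")
    tape.write_block(b"backup data...")
    tape.rewind()
    print(tape.read_block(14))   # -> b'backup data...'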

Is there risk to a VTL? Some, but not significantly more than with tape, particularly if the disk the VTL is using is RAIDed. Tape backups go bad; we all know that, but keeping track of tape failure rates has never been a strong point of the industry, if only because a failed restore isn’t reported anywhere unless it is a huge deal, such as those caused by hurricanes. Disk, at least, can be RAIDed to reduce the number of actual failures. Multiple disk failures or un-intercepted power surges can still cause all of your backups to disappear, but if you stream even one backup a week out to physical tape, you’ll have something to fall back on even in the unlikely event that you do have a huge emergency.

One caveat to consider with a VTL is that if you are moving from local backups to VTL backups, you’ll be sending a lot of data over the network, and you should be aware of the impact that will have. Of course, if you’re putting a VTL into an already centralized backup, this changes nothing transport-wise.

 
 
iSCSI Virtualization

The concept of iSCSI is another under-utilized phenomenon in IT infrastructure, in this author’s opinion. The idea is another simple one; perhaps that’s why it is resisted so fiercely. Simplified to a level that will offend iSCSI vendors, the concept is that you don’t need a specialized network to handle reading and writing blocks of data in the manner a SAN does – you have Ethernet. So iSCSI uses Ethernet to communicate read and write commands to storage.
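Purely as a toy illustration of the shape of the idea – real iSCSI defines a proper PDU format (RFC 3720), and this invented framing is not it – a block read carried over an ordinary TCP connection might look like this:

    # Toy only: an invented block-read request over plain TCP/Ethernet.
    # Port 3260 is the registered iSCSI port, but the header layout
    # here is made up and NOT the real iSCSI PDU format.
    import socket, struct

    def read_blocks(host, lun, lba, count):
        with socket.create_connection((host, 3260)) as s:
            s.sendall(struct.pack("!BQI", lun, lba, count))
            return s.recv(count * 512)   # 512-byte blocks, toy framing

    # data = read_blocks("iscsi-array.example.com", 7, 10000, 8)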

One of the downsides is that you either have to install a separate network to handle your storage needs or make certain that your existing network has enough spare bandwidth to handle the additional storage traffic.

One of the major upsides of this technology is that it overcomes a shortcoming of NAS, the other Ethernet storage technology: the inability to read and write blocks. Until very recently, not a single major database or email system worked with NAS, because they require read and write access at a lower level than NAS can offer. That is changing, but meanwhile iSCSI gives you the no-new-network convenience of NAS with the block capabilities of a SAN.

I’ve never seen the phrase “iSCSI Virtualization” used before; I admit to making it up. The concept is simple though… You have an iSCSI array on your network and it is running out of space. Drop another iSCSI array anywhere on your network, tell it that it belongs to the first one, and it automatically adds its storage to the storage on the “primary” array. Dell EqualLogic was the first vendor I ever touched that implemented this technology, but others have hopped on the bandwagon since then.

The point is that you don’t have to move a bunch of people’s disks to the new array, or repartition the old one to make it more useful… Your pool of available disk space just grows, and the volumes you have configured will grow too if you have the arrays set up correctly, so that as a volume runs low, it automatically gets more available space.
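A minimal sketch of that pooling behavior, with invented names and a deliberately simplified model:

    # Invented sketch of iSCSI pooling: adding an array to the group
    # just grows the shared pool that volumes draw from.
    class Pool:
        def __init__(self):
            self.arrays = {}   # array name -> capacity in GB

        def add_array(self, name, capacity_gb):
            self.arrays[name] = capacity_gb   # "it belongs to the group"

        @property
        def capacity_gb(self):
            return sum(self.arrays.values())

    pool = Pool()
    pool.add_array("array-1", 4000)
    pool.add_array("array-2", 8000)   # drop in a second array...
    print(pool.capacity_gb)           # ...and the pool is now 12000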

As with most cool new technologies, you can’t yet just drop in any old iSCSI-compliant device and have it pool up with the others on your network – your vendor has to support it, and you have to buy everything from that vendor – but it’s still a cool concept with a bright future.

 

Conclusions

Which of these are for you? That all depends on your environment, but there are some guidelines to help you decide.

If you’re doing backups, seriously look into the concept of VTLs. If you have looked and rejected the idea, look again. I’ve used a few, but since it’s been a while and F5 isn’t in the space, I won’t make any vendor recommendations; they’re easy enough to find. I know that Data Domain and FalconStor both make VTLs, but I’ve never had the pleasure of using either vendor’s solution.

If you have a SAN, you can look into SAN Virtualization, but honestly most companies come to the conclusion that it is too risky. Vendors – big name vendors like EMC and IBM – have been working to resolve these issues, so it’s possible there are better solutions and I’m just not aware of them.

If you have a NAS, you can save time and money by implementing Directory or File Virtualization; most implementations pay for themselves without breaking a sweat (though of course that’s dependent upon your current architecture and how much of it you’re moving into Virtualization). It is definitely worth a second look. Or even a third. If you already have tiered storage, there aren’t many better ways to manage it than a virtualized directory with rules to move files, like you get from our ARX product line.

If you’re not using iSCSI for anything, it’s worth a look too. Some higher-end appliances have performance (and prices) that rival traditional SAN implementations – except you don’t have to buy an outrageously expensive SAN switch to use them, just a plain old Ethernet switch. As long as you’re out looking, check into their virtualization capabilities. You won’t regret it, particularly if your rate of data growth is more than a couple of percent a year – basically enough that you’ll run out of space in five years or less.

All in all, storage doesn’t turn over at the rate that other IT/high-tech products do, mostly because of inertia. It is no small project to migrate from one storage architecture to another, so IT doesn’t put itself in that boat unless it must. These solutions are largely additive, not requiring you to replace your arrays, filers, file servers, or whatever else you have at your organization… They merely help you make better use of the architecture you have. Depending on your storage vendor, they may not even require new hardware, though NAS Virtualization generally does.