Secure, optimized tunnels to a remote site, e.g. the cloud. Haven’t we been here before?

In the continuing discussion around Business Intelligence in the cloud comes a more better (yes I did, in fact, say that) discussion of the reasons why you’d want to put BI in the cloud and, appropriately, some of the challenges. As previously mentioned, BI data sets are, as a rule, huge. Big. Bigger than big. Ginormous, even. One of the considerations, then, if you’re going to leverage a cloud-based business intelligence offering – or any offering that requires very, very large data sets or files – is how the heck you’re going to transfer all that data to the cloud in a timely fashion?

The answer is, apparently, to ship a disk to the provider. Seriously. I couldn’t make this stuff up if I tried.

Ann All, in “Pros and Cons of Business Intelligence in the Cloud,” paraphrases a post on the subject by “two guys from Persistent Software, Mukund Deshpande and Shreekanth Joshi.”

Moving large data sets to the cloud could get costly. They recommend shipping disks, an approach they say is often recommended by cloud providers like Amazon.

The original authors cite network costs as a reason to choose that “never gets old sneaker network” option.

Moving data to the cloud – Large data sets in silos sitting on premises need to get to the cloud before they can be completely used. This is an expensive proposition due to the network costs. The cheapest option is to ship disks, and this is often recommended by cloud providers like Amazon. Although this introduces latencies, it is often acceptable for most BI options.

It’s not just latency that’s the problem – though latency measured in days is certainly not a good thing – you’re also talking about shipping off disks and taking the risk that they will get “lost” en route, damaged, or misplaced at some point.

Think I’m kidding about that risk?


WHY ARE WE REINVENTING THE WHEEL?

Actually, the basic problem is with our perception of “the cloud” as an external, completely separate entity – a perception cloud providers share. The cloud is “over here” and your data is “over there” and ne’er the twain shall meet. That makes it costly and time-consuming to transfer extremely large files to the cloud. Hence the suggestion by cloud providers and bloggers alike, apparently, to “ship some disks” instead.

But as Christofer Hoff has pointed out in the past, why can’t there be “private, external clouds” or, at a minimum, “private, secure access” to external clouds that, as a benefit of such a point-to-point connection, also employs WAN optimization and application acceleration techniques to improve the performance of large data transfers across the Internet?

Haven’t we been here before? Haven’t we already addressed this exact scenario when organizations realized that using the Internet to connect with partners and remote offices was less expensive than dedicated lines? Didn’t we go through the whole PKI / SSL VPN / WAN optimization controller thing and solve this particular problem already?

Seems we did, and there’s no reason that we can’t apply what we’ve already learned and figured out to this problem. After all, if you treat the cloud as the “headquarters” and its customers as “remote offices” you have a scenario that is not unlike those dealt with by organizations across the world every day.

In fact, Amazon just announced its version of a cloud VPN, and Google’s SDC (Secure Data Connector) has been around for quite some time, providing essentially the same kind of secure tunnel/access-to-the-cloud functionality.
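For what it’s worth, standing up Amazon’s flavor of that tunnel is just a handful of API calls. Here’s a minimal sketch using the (much newer) boto3 SDK – the region, VPC ID, and on-premises IP below are placeholders I made up, and I’m assuming Amazon’s announcement maps to the IPsec VPN connectivity that ships with Amazon VPC:

# Hedged sketch: provisioning an on-demand IPsec tunnel into Amazon's cloud
# with the modern boto3 SDK. The gateway IDs, VPC ID, and public IP are
# placeholders, not values taken from the article.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Describe the customer (on-premises) side of the tunnel.
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="198.51.100.10",   # placeholder: your edge device's public IP
    BgpAsn=65000,
)["CustomerGateway"]

# Create and attach the provider-side gateway.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(
    VpnGatewayId=vgw["VpnGatewayId"],
    VpcId="vpc-0123456789abcdef0",   # placeholder VPC
)

# Stand up the tunnel itself; tear it down with delete_vpn_connection()
# when the bulk transfer finishes, keeping the connection on-demand.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Options={"StaticRoutesOnly": True},
)["VpnConnection"]
print(vpn["VpnConnectionId"])

The particular calls aren’t the point; the point is that the tunnel can be created – and torn down – programmatically rather than by filing a ticket.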


BUT IT’S NOT THE SAME

The scenario is almost identical, and yet in important ways it isn’t. First, organizations have control – physical control – over remote offices. They can determine what products/solutions/services will be implemented and where, and they can deploy them as they see fit. This is not true with the cloud, from either the cloud provider’s perspective or the customer’s.

[Diagram: iSession to the cloud]

After all, the solution to the pain point of hefty data transfer from organization to cloud is a combination of WAN optimization and secure remote access, but what we don’t want is the traditional “always-on point-to-point secure tunnel.” A cloud provider has (or will have, hopefully) more customers than it can possibly support with such a model. And the use of such secure tunnels is certainly sporadic; there’s no need for an “always on” connection between organizations and their cloud provider.

What’s needed is a dynamic, secure, optimized tunnel that can be created and torn down on demand. The cloud provider needs to ensure that only authorized organizations are allowed to create such a tunnel, and it needs to be deployed on a platform that can be integrated into the provisioning process so that managing such external connectivity and access doesn’t end up consuming human operational cycles.
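To make “integrated into the provisioning process” a little more concrete, here’s a minimal sketch of what that orchestration could look like. The CloudTunnelAPI class and the transfer_data_set() call are entirely hypothetical – invented for illustration, not any particular provider’s or vendor’s interface:

# Hypothetical sketch: an on-demand, authorized tunnel wrapped around a bulk
# transfer. CloudTunnelAPI and transfer_data_set() are invented for
# illustration only; no real provider exposes this exact interface.
from contextlib import contextmanager

class CloudTunnelAPI:
    """Stand-in for a cloud provider's provisioning interface (hypothetical)."""
    def authorize(self, customer_id, credential):
        return f"token-for-{customer_id}"      # provider verifies authorization
    def create_tunnel(self, token, wan_optimization=True):
        return "tunnel-001"                    # secure, optimized tunnel comes up
    def destroy_tunnel(self, tunnel_id):
        pass                                   # tunnel torn down immediately

@contextmanager
def on_demand_tunnel(api, customer_id, credential):
    # Only authorized customers may bring a tunnel up, and it exists only for
    # the duration of the transfer -- on-demand, not always-on.
    tunnel_id = api.create_tunnel(api.authorize(customer_id, credential))
    try:
        yield tunnel_id
    finally:
        api.destroy_tunnel(tunnel_id)

# Usage: provisioning and teardown happen automatically around the transfer,
# with no human operational cycles spent managing the connection.
# with on_demand_tunnel(CloudTunnelAPI(), "acme-corp", "credential") as tid:
#     transfer_data_set(tid, "/data/warehouse_extract")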

But Lori, you say, looking at the eye-candy diagram, it looks like such a solution requires hardware at both ends of the connection.

Yes, yes it does. Or at least it requires a solution at the cloud provider and a solution at each customer site. Maybe that’s hardware, maybe it’s not. But you aren’t going to get around the fact that a secure, encrypted, accelerated on-demand session – one that enables a more efficient and secure transfer of large data sets or virtual machine images across the Internet – is going to require some network- and application-level optimization and acceleration. We’ve been down this road before; we know where it ends: point-to-point encrypted, optimized, and accelerated tunnels.

You’ll note that Amazon and Google already figured this one out, and yes, it’s going to be proprietary and it’s going to require software/hardware/something on both ends of the connection.

The difference with cloud – and it is a big difference, make no mistake about that – is that a cloud provider needs to support hundreds or perhaps thousands of periodic sessions with remote sites. That means it needs to be on-demand and not always-on as most site-to-site tunnels are today.

Your next observation will be that if this is going to require a solution on both sides of the relationship, then who gets to decide what that solution is? Good question. Probably the cloud provider. Again, you’ll note that Amazon and Google have already decided. At least in the short term. There are no standards around point-to-point connectivity of this kind. There are IPsec VPNs and SSL VPNs, of course, but there are no standards around WAN optimization and no way to connect product A to product B, so the solution in the short term is a single-vendor solution. The long-term solution would be to adopt some standards around WAN optimization, but before that can happen WAN optimization controllers need to support an on-demand model. Most, unfortunately, do not.

You can, of course, continue to use a sneakernet. It’s an option. You can also continue to transfer data over the Internet in the clear, which may or may not be an acceptable method of data transfer for your organization. But there is another solution out there, and it’s not nearly as difficult to implement as you might think – as long as you have a solution capable of providing such a service in the first place and your cloud provider is willing to offer it or, apparently, reinvent it.

 
