I was working for a mid-sized enterprise as an IT manager on a project that was on the cutting edge of technology at the time. Because it was on the cutting edge, we were using a whole slew of different embedded applications and their masters to collect data. Those masters were written on every platform imaginable, from Novell Netware to Windows to Linux to Solaris, and in every language that was common on each of the platforms. Our job was to make sense of it all. The information these systems collected was billing data. They all collected similar datasets, but in different manners, and they all stored the data in their databases in different ways. And they all used different RDBMSs. We had Oracle, MS SQL Server, Sybase, MySQL, IBM UDB, and a few you’ve not likely heard of. We had our own datacenter, and it was a non-stop flurry of activity just trying to consolidate the data, get it into a consistent format, and centralize it for billing on a single DBMS. We had custom code, Extract Transform and Load (ETL) systems, and extraction systems whose output we then loaded into our central database, all just to get the data in one place.

That’s the worst case I’ve ever been involved in, but seriously, every place I’ve worked has had multiple database vendors, because we live in the age of purchased applications. Even when a vendor says “oh yeah, we support X, Y, and Z”, smart IT folks immediately ask which one they develop for primarily, because that’s the one that will get the first attention when updates occur, and it is the one most likely to be stable. So while you theoretically could standardize on a single database, and every enterprise I’ve ever worked at has either wanted to or said they did, purchased applications make it highly unlikely that they ever will.

 

Image Courtesy of www.servermachine.net

Still, you need a way to communicate that data back and forth, and when the enterprise shifted to “buy before build”, that’s where the programmers went: to integration duties, to try and straighten out communications. Your purchased (or service) shipping system needs to update inventory, which is a different system on a different database, and so on. We’ve had about a decade of this, and most IT shops have a relatively stable environment that transfers data back and forth as needed but is high maintenance, since every release that changes tables or columns triggers a new round of integration work. And unless you’re terribly lucky, no two purchased packages are on the same update cycle.

It is not my habit to plug specific products in this blog, even F5 products. I like to keep it useful to you and figure that if you find it useful, F5 indirectly gets the name recognition. F5 has thus far allowed me the freedom to do just that, and this blog is not a sign of some major shift. While I am going to plug a specific product, it is not an F5 product. I’m going to tell you how all of the pain caused by the above issues can be alleviated using Oracle Goldengate. Oracle is a partner of F5, and our uber-smart Business Development and Product Management Engineering teams have been working with Oracle on the Goldengate product and how it fits into our partnership. I was brought in to produce some collateral, and after reading up on Goldengate, I fell in love.

It is not often that I, after more than a decade working in IT and several years as a Technology Editor, get excited about a product, but Goldengate fits the bill. It solves a problem that other solutions (like ETL engines) could be hacked to solve, but it does it directly and simply.

Oracle acquired Goldengate in mid-2009, and because it is not my job to pay attention to this stuff, the importance of the announcement flew under my radar. That being the case, I figure it might well have flown under your radar also. The architecture of Goldengate is, like most technology, simple to understand at the 50,000-foot level, and I’ll direct you to Oracle’s Goldengate website if you need more info. You purchase two copies of Goldengate, one to be the source and one to be the destination. The source reads the database’s log files and generates a binary representation of the changes called a trail file. Another process on the source, called the data pump, then sends this data out across the network to the destination. There, a piece of software called the Collector picks up the incoming stream and writes it out to a new trail file, and a final process called Replicat reads this binary trail file and creates transactions from it to submit to the database.
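For those who think better in code, here is a toy sketch of that flow in Python. To be clear, this is not Goldengate and uses none of its interfaces: the log format, the JSON "trail", the function names, and the `invoices` table are all assumptions made up for illustration, and SQLite stands in for the destination database. It only mirrors the four hops described above: capture from a log into a trail, pump across the network, collect into a second trail, and replay as transactions.

```python
import json
import sqlite3

# Hypothetical stand-in for a source database's transaction log.
SOURCE_LOG = [
    {"op": "insert", "table": "invoices", "row": {"id": 1, "amount": 120.5}},
    {"op": "insert", "table": "audit", "row": {"id": 9, "note": "login"}},
    {"op": "insert", "table": "invoices", "row": {"id": 2, "amount": 75.0}},
]


def extract(log, wanted_tables):
    """Capture: read the log and write the wanted changes to a local trail."""
    return [
        json.dumps(rec).encode("utf-8")
        for rec in log
        if rec["table"] in wanted_tables
    ]


def data_pump(local_trail):
    """Data pump: ship trail records across the 'network' one at a time."""
    for record in local_trail:
        yield record


def collector(stream):
    """Collector: receive the stream and write a new trail on the destination."""
    return list(stream)


def replicat(remote_trail, conn):
    """Replicat: read the trail and replay each change as its own transaction."""
    applied = 0
    for raw in remote_trail:
        rec = json.loads(raw.decode("utf-8"))
        cols = list(rec["row"])
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            rec["table"], ", ".join(cols), ", ".join("?" for _ in cols)
        )
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, [rec["row"][c] for c in cols])
        applied += 1
    return applied


def run_pipeline():
    """Wire the four stages together against an in-memory target database."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    local_trail = extract(SOURCE_LOG, {"invoices"})
    remote_trail = collector(data_pump(local_trail))
    replicat(remote_trail, target)
    return target.execute("SELECT id, amount FROM invoices ORDER BY id").fetchall()
```

Calling `run_pipeline()` replays only the `invoices` changes into the target and skips the `audit` row, which is the "tell it what data you want replicated" part of the job in miniature.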

This sounds like an optimized database replication tool, which in itself would be kind of cool but not real earth-shattering. The reason this tool caught my attention (and garnered enough excitement to warrant a blog) is that the source RDBMS and the target RDBMS do not have to be from the same vendor. Yes indeed, you read that right. Think of it as heterogeneous near-real-time replication. Have a purchased application that runs on SQL Server but your core datacenter RDBMS is UDB? No problem: purchase Goldengate for SQL Server as the source and for UDB as the target, configure and tune, and then tell the DBAs where to find the replica of the data. So you create a separate tablespace and just dump into it. If nothing else, you only have to back up the big master database.
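One way to picture why the vendors need not match: if the change travels in a vendor-neutral form, only the very last step has to speak the target’s dialect. The Python sketch below is my own illustration of that idea, not how Goldengate is implemented. The record shape, the `render_insert` function, and the `ORDERS` table are hypothetical. The one real-world detail it leans on is that SQL Server accepts bracket-delimited identifiers while DB2 (UDB) uses standard double-quoted ones.

```python
# Vendor-neutral change record, as it might sit in a trail (hypothetical shape).
CHANGE = {"table": "ORDERS", "row": {"ORDER_ID": 42, "STATUS": "shipped"}}

# Delimited-identifier styles: SQL Server accepts [brackets]; DB2 uses
# standard SQL double quotes. Embedded delimiters are escaped by doubling.
QUOTE = {
    "sqlserver": lambda name: "[" + name.replace("]", "]]") + "]",
    "db2": lambda name: '"' + name.replace('"', '""') + '"',
}


def render_insert(change, dialect):
    """Turn one neutral change record into (sql, params) for the given dialect."""
    quote = QUOTE[dialect]
    cols = list(change["row"])
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        quote(change["table"]),
        ", ".join(quote(c) for c in cols),
        ", ".join("?" for _ in cols),
    )
    return sql, [change["row"][c] for c in cols]
```

The same `CHANGE` record yields a bracket-quoted statement for `"sqlserver"` and a double-quoted one for `"db2"`, with the values passed separately as parameters either way. Real heterogeneous replication also has to deal with data type differences, which this sketch ignores.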

In the case of serious integration issues with many systems on many RDBMSs needing to talk, this is a lot cleaner than what most of us are doing, and a lot faster to adapt to changing table/column configurations. If this had been available on that first project I referenced above, perhaps my team wouldn’t have grown so quickly from tiny to huge. We’d still have needed DBAs and Systems Admins and Engineers, but the developer count might have been smaller, since almost all of our developer hours were database integration time. We only developed a few applications; our policy was definitely “purchase if possible”. I know that in the mergers and acquisitions space this tool would also be a huge boon. “We need to move data from our new subsidiary into our systems” is perhaps the most dreaded M&A phrase an IT person can hear. Or the second most dreaded, if “and you’re in charge of the integration, be done by Monday?” is the first.

I haven’t used Goldengate, and I know there are a host of ETL solutions that could be hacked to perform this job, but Oracle lists all of the major database vendors on Goldengate’s supported RDBMS list, and Oracle is pretty good about providing solid support before issuing such a statement. And the relative simplicity is striking. Sure, it will take installation on two (or more) systems, and configuration of both the networking component and the trail file component (it has to know what data you want replicated, and where to send that data), but that’s much less work than writing or hacking tools to do the same job.

So it is worth checking out. I know I would if I were still in IT management. Life is complex enough; let me move all of my data to one DBMS and do all of my calculations, reporting, tabulation, etc. there. And since it is essentially a replication tool, I’d also replicate it off so things like reporting weren’t bogging down the primary database.

And yeah, we have tools to make it even better. If you’re thinking of running Goldengate over the WAN, watch for updates from our BIG-IP WOM team, but I’m sticking with my general rule not to plug products.

It certainly does appear that Goldengate is going to usher in the golden age of data mobility, which would be good, because data integration is one of the sticking points in highly adaptable IT.