DevCentral Groups

Group Details

Oracle / F5 Solutions

F5 DevCentral Topic Group dedicated to open discussion and collaboration related to the integration between and deployment of F5 and Oracle software solutions.

F5 Big IP for App Failover between Oracle Data Guarded Clusters
Last Post 05/11/2010 11:33 AM by Chris Akker. 12 Replies.

OracleGuru (New Member, Post Count: 5)
02/23/2009 01:36 PM

I've got a situation where the loss of a RAC cluster requires restarting all application servers to repoint them to the Oracle Data Guard failover cluster. There are hundreds of application servers, all with connection pools. Failing the Oracle Database 10gR2 RAC cluster over to the standby takes minutes before the standby is fully available, and could be quicker. The application servers, however, take at least 1.5 hours, and as long as three hours, to return to online status, because they must be completely shut down and manually restarted (I know).

My objective is an architecture where the connection pools are simply rerouted to the Data Guard failover cluster, which is available within minutes, and virtually immediately if Fast-Start Failover is used. F5 BIG-IP equipment is used in the data center, and I'm looking for architecture/configuration suggestions on how best to make application failover seamless between a failed primary cluster and the Data Guard standby cluster that assumes the primary role on failover. This is a new architecture we're moving towards, so basically anything is on the table.

TIA

Bill

hoolio (MVP - 9, Post Count: 11053)
02/25/2009 02:22 AM

Hi Bill,

I thought RAC should provide seamless failover between cluster nodes. Are you trying to handle the scenario where the complete cluster goes down?

It would be good if someone here with more Oracle experience commented as well, but in concept, I think you could configure a single pool with two groups of members. You could use priority group activation to ensure application requests go to the higher-priority group first; if that group isn't available, the lower-priority members would be used. You could configure an Oracle DB monitor to check whether the pool members are up.
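A minimal Python sketch of that selection rule, assuming the primary cluster's members sit in a higher priority group than the standby's and a minimum of one active member. The member names, priority numbers and the `Member`/`active_members` helpers are hypothetical; this models the behavior described above and is not BIG-IP code or configuration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str        # hypothetical label, e.g. "primary-rac1"
    priority: int    # higher value = preferred priority group
    available: bool  # outcome of the most recent health check


def active_members(members, min_active=1):
    """Members eligible for new connections under a simplified model of
    priority group activation: start with the highest-priority group and
    add lower-priority groups only until `min_active` available members
    have been collected."""
    selected = []
    for prio in sorted({m.priority for m in members}, reverse=True):
        selected += [m for m in members if m.priority == prio and m.available]
        if len(selected) >= min_active:
            break
    return selected


pool = [
    Member("primary-rac1", 10, True),
    Member("primary-rac2", 10, True),
    Member("standby-rac1", 5, True),
]
print([m.name for m in active_members(pool)])  # ['primary-rac1', 'primary-rac2']

# The whole primary cluster fails its health checks: the standby group is used.
for m in pool[:2]:
    m.available = False
print([m.name for m in active_members(pool)])  # ['standby-rac1']
```

Because clients only ever connect to the virtual server, moving traffic from one group to the other needs no change on the application servers.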

If you want LTM to send a reset to the client when the selected pool member goes down, you can configure this using the 'Action On Service Down' option in the pool properties.

Aaron

OracleGuru (New Member, Post Count: 5)
02/25/2009 07:44 AM

Hi Aaron,

You're right: failover between nodes in a RAC 10g and later cluster is seamless. Whether the application fails over seamlessly from node to node depends on how its connectivity is configured, but my question is purely about cluster-to-cluster failover.

Consider two RAC clusters local to each other, one the primary and the other a local standby (i.e., in the same data center), plus a third cluster in a remote data center. Oracle Data Guard will handle failover of the primary RAC cluster to either the local standby or the remote standby; the preferred failover target is the local standby, backed up by the remote standby. My challenge is how to seamlessly repoint the 100+ application servers from the primary cluster to the local standby and back again, without any restart of the app servers or bounce of their connection pools.

I've used F5 BIG-IP and Cisco Catalyst to load balance app server connections to my cluster nodes when a thin Java client was in place, but this is different: the application as it exists today can't manage persistence or caching of transactional data. As a result, after any complete cluster failure, they shut down every app server and manually repoint them once the standby database cluster is up and running. We could use FAN and FCF with an application API to manage the failover dynamically, but their app won't support it. Hence, I'm looking outside the box for alternatives that enable a "rapid application failover" executing in minutes, without dropping the connection pools, so the app servers "simply" fail over to the standby database cluster as the primary database cluster fails over to that same standby.

This isn't relevant in a remote DR scenario, since a complete duplicate set of app servers exists there and will have to be brought up anyway in the event of a primary "site" failure.

Hope this helps clarify what I'm trying to architect.

Again, thanks to anyone who can shed some insight on how to configure the F5s to support this.

Bill

hoolio (MVP - 9, Post Count: 11053)
02/26/2009 08:42 AM

Hi Bill,

Thanks for the explanation. I'm not an Oracle expert by any means, so that was useful.

If the app client were configured to open connections only to the LTM VIP, LTM could select a new cluster if the primary cluster went down. This would provide near-seamless resilience at the TCP layer. But do the other clusters have real-time mirroring of the first cluster's data? If so, it seems the failover between clusters could be seamless at both the TCP and application layers.

If the data isn't mirrored between clusters, what would the app client need to get in response from the server end (either LTM or the database) to tell it to restart its session because the existing cluster died? Would a TCP reset suffice? Or a SQL level message? Or something entirely different?

Aaron

OracleGuru (New Member, Post Count: 5)
02/26/2009 09:29 AM

Aaron,

Most of these apps (if not all) use an Oracle service client (SQL*Net, JDBC thin or JDBC fat) for connectivity to the Oracle database cluster, whether primary or standby after a failover. When a cluster fails, or cannot be reached via SQL*Net/Net Services, a conventional TCP/IP timeout occurs based on the TCP/IP settings. When this happens, an error (typically ORA-3113) is returned to the end user's HTTP web client; the user manually resubmits the transaction, and the Oracle client retries the connection.

The trick is for the connection pool that the app server established through the F5 to be redirected by the F5 to the Data Guard standby database that has assumed the primary role, without the original connection pool having to be re-instantiated. Re-instantiating it would require manual intervention, and potentially another restart of the app servers, which puts us back at the same problem. Generally, I would expect app servers to point to one F5 in groups of n, with n F5s providing load balancing and redundancy for the app server pool (again, 100+ app servers). Because of the nature of the application, it's OK for end users to get an error in their web interface and have to retry, but not to have to completely re-enter all of the data before resubmitting.

It's a really interesting architecture challenge for me, as I haven't tried leveraging the F5s this way, and I can see a real long-term benefit for other clients in managing app server farms that connect to an Oracle database, or database cluster, protected by a local standby database configured for Fast-Start Failover.
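The recovery path described above (the pooled connection gets an error, the work is resubmitted, and the client reconnects to the same address) can be sketched in Python as a retry wrapper. This is an illustration only: `run_with_retry` and `fake_transaction` are hypothetical, and a real Oracle driver would surface the failure as its own exception type (for example an ORA-3113 error) rather than Python's `ConnectionResetError`, which is assumed here to stand in for a TCP reset.

```python
import time


def run_with_retry(execute, attempts=3, delay_seconds=0.5):
    """Run `execute()` and retry it if the TCP connection is reset.

    `execute` is a hypothetical callable that borrows a connection to the
    BIG-IP virtual server address and runs one transaction. The address
    never changes: after a reset, BIG-IP chooses a pool member that is
    still up for the replacement connection.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except (ConnectionResetError, ConnectionAbortedError):
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            time.sleep(delay_seconds)


calls = {"count": 0}


def fake_transaction():
    # Stand-in for real database work: the first call is reset mid-flight.
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionResetError("RST sent by the virtual server")
    return "committed"


print(run_with_retry(fake_transaction, delay_seconds=0.0))  # committed
```

Resubmitting is only safe for work that was not committed before the connection died, which matches the requirement above that users may retry but should not have to re-enter data.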

The one thing I have to avoid is having to repropagate DNS, which my research indicates would not be required, but I don't know the workings of the F5 well enough to explain, diagram, or configure it. There's a joint BEA/Oracle whitepaper at http://e-docs.bea.com/wls/docs92/cluster/config_F5_in_MAN_WAN.html that seems to support a "rapid application failover" occurring in minutes without dropping the connection pool.

Also, all Oracle data is mirrored in real time between the clusters, so there's no issue there. The key is simply not dropping the incoming connection pool from the app servers, and transparently repointing it to the standby cluster's VIP service when the F5 detects loss of communication with the primary cluster.
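One way to let the F5 decide which cluster should receive the repointed connections is a role-aware health check: a member counts as up only if its database currently reports the PRIMARY role in `v$database.database_role`. The Python sketch below assumes that check; `member_is_up`, `query_role` and the cluster names are hypothetical, and how the check would actually be wired into a BIG-IP monitor is not shown.

```python
def member_is_up(query_role, timeout_seconds=5.0):
    """Health-check verdict for one database pool member.

    `query_role` is a hypothetical callable that connects to the member and
    returns the value of:  SELECT database_role FROM v$database
    Only a database currently in the PRIMARY role is reported up, so a
    standby receives no traffic until Data Guard has promoted it.
    """
    try:
        role = query_role(timeout_seconds)
    except Exception:
        # Any failure to connect or run the query (network error, refused
        # login, database only mounted) marks the member down.
        return False
    return role.strip().upper() == "PRIMARY"


def refused(timeout_seconds):
    raise ConnectionRefusedError("listener not reachable")


before = {"cluster-a": lambda t: "PRIMARY", "cluster-b": lambda t: "PHYSICAL STANDBY"}
after = {"cluster-a": refused, "cluster-b": lambda t: "PRIMARY"}
print({name: member_is_up(q) for name, q in before.items()})
# {'cluster-a': True, 'cluster-b': False}
print({name: member_is_up(q) for name, q in after.items()})
# {'cluster-a': False, 'cluster-b': True}
```

Checking the role rather than mere reachability keeps an open but not-yet-promoted standby out of rotation, so the VIP follows whichever cluster Data Guard has made primary without any DNS change.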

I've been around the Oracle technology stack for nearly 24 years, and am co-authoring an Oracle Press book, "Oracle Data Guard Handbook", to be released sometime before July, but the configuration capabilities of the F5s are something I just don't work with, so I have to holler for help.

Again, thanks for your feedback and help.

Bill

Mike Schrock (Active Member, Post Count: 14)
02/26/2009 10:09 AM

Hi Bill,

We would like to work directly with you on this through F5's Oracle Product Management Engineering and Solution Engineering teams. We need to fully understand your needs, as we have seen similar requests and do not have a public solution yet.

Will you please email Randy Cleveland r.cleveland@f5.com our Director of Solution Engineering and myself m.schrock@f5.com.

Thanks,
Mike Schrock
F5 Oracle Alliance and Solution Engineering Manager

hoolio (MVP - 9, Post Count: 11053)
02/26/2009 10:16 AM

Good, these are the BIG-IP experts for this.

If you don't mind, could you post updates/SOL links when you have some?

Thanks,
Aaron

Mike Schrock (Active Member, Post Count: 14)
02/26/2009 10:24 AM

Absolutely, that is the intent of reaching out to work more directly with you. We will share all solutions with others here in this forum.

OracleGuru (New Member, Post Count: 5)
02/26/2009 11:14 AM

Hi Mike, I will do that when I get back to Dallas later this evening or tomorrow (depending on how late American is this week) from my client site.

Bill

OracleGuru (New Member, Post Count: 5)
02/26/2009 11:15 AM

Aaron,

I'll try to put it into an architecture diagram and solution once we have it figured out, and make it available.

Bill

Chris Akker (Active Member, Post Count: 27)
10/16/2009 02:40 PM

Hi Bill, we did get several questions about this at OpenWorld last week, and we are in the process of getting a solution tested and documented. There is a high-level explanation of what we intend to do in a video shot at OpenWorld on the F5 Idea Board. The example covers a RAC node failure, but the same principle could be applied to a cluster failover as well.

Check it out at:

http://devcentral.f5.com/weblogs/dctv/archive/2009/10/13/oracle-openworld-a-rac-connection-management-solution-with-chris.aspx

Let us know your thoughts.

Yibin (New Member, Post Count: 1)
04/07/2010 01:47 PM

Our application uses ODP.NET and connection pooling. If Oracle FAN is not configured, will a system with F5 work as a failover tool?

Thanks,
Yibin

Chris Akker (Active Member, Post Count: 27)
05/11/2010 11:33 AM

Yes. Reading the ODP.NET specs, the connection pool would be made to a BIG-IP virtual server, and you can control the timeout values of those pooled connections using a TCP profile on the virtual server. For example, if you want the connections held open for an hour, you would create a TCP profile with a timeout value of 3600 seconds and apply it to the client side of the virtual server.

For the database servers, which are in a pool, there are several options for when a pool member goes down. This setting is called "Action on Service Down"; the default is to do nothing, so you would want to change it to "Reject", so that BIG-IP immediately resets all client-side connections to that member. In version 10, this is done in the UI under the pool's Advanced properties. The connections that were attached to that server are given instant notice that it is unavailable, so the clients retry their TCP connection, and BIG-IP automatically selects another database server, giving you the quick failover/reconnect you are looking for. You could also try the Reselect option; depending on your application, this may work. More info:

Action on Service Down
Specifies how the system should respond when the target pool member becomes unavailable. The default is None.

• None: Specifies that the system does not select a different node. Selecting None causes the system to send traffic to the node even if it is down, until the next health check is done.

• Reject: Specifies that the system sends an RST or ICMP message.

• Drop: Specifies that the system simply cleans up the connection.

• Reselect: Specifies that the system selects a different node. Selecting Reselect causes the system to send traffic to a different node after receiving the message that the original node is down.
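To compare the four settings side by side, here is a small Python model of what happens to existing connections when one pool member is marked down. It simplifies the descriptions above (for example, Reject is shown only as an RST even though the system may send an ICMP message instead), and the enum, function and node names are hypothetical.

```python
from enum import Enum


class ServiceDownAction(Enum):
    NONE = "none"
    REJECT = "reject"
    DROP = "drop"
    RESELECT = "reselect"


def handle_member_down(action, connections, down_member, healthy_members):
    """Describe what happens to each existing connection when `down_member`
    is marked down. `connections` maps a connection id to the pool member
    it is currently sent to. This is a simplified model, not BIG-IP code."""
    outcome = {}
    for conn_id, member in connections.items():
        if member != down_member:
            outcome[conn_id] = "unchanged"
        elif action is ServiceDownAction.NONE:
            outcome[conn_id] = f"still sent to {member}"
        elif action is ServiceDownAction.REJECT:
            outcome[conn_id] = "client receives RST"
        elif action is ServiceDownAction.DROP:
            outcome[conn_id] = "silently removed"
        elif healthy_members:  # RESELECT with another member available
            outcome[conn_id] = f"moved to {healthy_members[0]}"
        else:
            outcome[conn_id] = "no member to reselect"
    return outcome


pooled = {"app1-conn": "rac-node1", "app2-conn": "rac-node2"}
for action in ServiceDownAction:
    result = handle_member_down(action, pooled, "rac-node1", ["rac-node2"])
    print(action.value, "->", result["app1-conn"])
# none -> still sent to rac-node1
# reject -> client receives RST
# drop -> silently removed
# reselect -> moved to rac-node2
```

With Reject, the pooled connection sees the failure immediately and the client reconnects through the same virtual server, which is the behavior recommended above. Reselect moves the TCP flow to another member, but that server has no record of the database session opened on the failed one, which is presumably why it only works for some applications.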
