Forum Discussion

OracleGuru_6934
Feb 23, 2009

F5 Big IP for App Failover between Oracle Data Guarded Clusters

I've got a situation where the loss of a RAC cluster requires the restart of all application servers to repoint them to the Oracle Data Guard failover cluster. In this case there are hundreds of application servers, all with connection pools. Failing over the Oracle Database 10gR2 RAC cluster to the standby takes only minutes before it is fully available, and could be quicker. The application servers, however, don't start returning to online status for at least 1 1/2 hours and as long as three hours, because they must be completely shut down and manually restarted (I know). My objective is an architecture where the connection pools are simply rerouted to the Data Guard failover cluster, which is available within minutes, and if fast-start failover is used, virtually immediately. F5 BIG-IP equipment is used in the DC, and I'm looking for architecture/configuration suggestions on how best to approach this so that application failover is seamless between a failed primary cluster and the Data Guard standby cluster that assumes the primary role on failover. This is a new architecture we're moving towards, so anything is basically on the table.

 

 

TIA

 

 

Bill

9 Replies

  • Hi Bill,

     

     

    I thought RAC should provide seamless failover between cluster nodes. Are you trying to handle the scenario where the complete cluster goes down?

     

     

    It would be good if someone here with more experience with Oracle commented as well, but in concept, I think you could configure a single pool with two groups of members. You could use priority group activation to ensure application requests go to the higher priority group first; if that group isn't available, the lower priority members would be used. You could configure an Oracle DB monitor to check whether the pool members are up.
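    To make that concrete, something like the following tmsh sketch could be a starting point. The pool name, node addresses, and port are all hypothetical, and it assumes an Oracle database monitor (my_oracle_monitor) has already been defined with the appropriate credentials and test query, so treat it as an illustration of priority group activation rather than a tested config:

        # Pool with two priority groups: the primary RAC nodes (priority 10)
        # are preferred; the standby cluster nodes (priority 5) are only used
        # when fewer than one member of the higher priority group is available.
        tmsh create ltm pool oracle_dg_pool \
            min-active-members 1 \
            monitor my_oracle_monitor \
            members add { \
                10.10.1.11:1521 { priority-group 10 } \
                10.10.1.12:1521 { priority-group 10 } \
                10.20.1.11:1521 { priority-group 5 } \
                10.20.1.12:1521 { priority-group 5 } }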

     

     

    If you want LTM to send a reset to the client if the selected pool member goes down, you can configure this using the 'Action On Service Down' option in the pool properties.

     

     

    Aaron
  • Hi Aaron,

     

     

    You're right, the failover of nodes in a RAC 10g and later cluster is seamless. The ability of the app to seamlessly fail over from node to node, depending on how its connectivity is configured, may be another matter, but my question is purely focused on cluster-to-cluster failover. Consider two RAC clusters local to each other, one the primary, the other a local standby, i.e. in the same data center, plus a third cluster in a remote data center. Oracle's Data Guard will handle the failover of the Oracle RAC cluster to either the local standby or the remote standby. The preferred failover target is the local standby, backed up by the remote standby. My challenge is how to seamlessly repoint the 100+ application servers from the primary cluster to the local standby and back again without having to do any type of restart of the app servers, or bounce of the connection pools on the app servers. I've used the F5 BIG-IP and Cisco Catalyst for load balancing of app server connections to my cluster nodes when a thin Java client was in place, but this is different in that their current application, as it exists today, can't manage the persistence or caching of transactional data. As a result, in the event of any complete cluster failure, they are shutting down every app server and manually repointing them after the standby database cluster is up and running. We could use FAN and FCF and an application API to dynamically manage the failover, but their app won't support it. Hence, I'm looking outside the box at alternatives to enable a "rapid application failover" that can execute in minutes without dropping the connection pools, allowing the app servers to "simply" fail over to the standby database cluster as the primary database cluster fails over to that same standby cluster.

     

     

    It's not relevant in a remote DR scenario, as a complete duplicate set of app servers exists and will have to be brought up anyway in the event of a primary "site" failure.

     

     

    Hope this helps clarify what I'm trying to architect.

     

     

    Again, thanks to anyone who can shed some insight on how to configure the F5's to support this.

     

     

    Bill
  • Hi Bill,

     

     

    Thanks for the explanation. I'm not an Oracle expert by any means, so that was useful.

     

     

    If the app client was configured to open connections only to the LTM VIP, LTM could select a new cluster if the first (primary) cluster went down. This would provide near-seamless resilience at the TCP layer. But do the other clusters have real-time mirroring of the first cluster's data? If so, it seems like the failover between clusters could be seamless at both the TCP and application layers.
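    As a rough illustration of that idea (the virtual server name and VIP address below are made up, and the pool is the hypothetical priority-group pool sketched earlier in the thread), the app clients would point their connect descriptors at a single virtual server rather than at any individual database node:

        # Virtual server on the Oracle listener port; clients connect only here.
        # LTM picks a pool member behind it based on monitor status and priority.
        tmsh create ltm virtual oracle_dg_vs \
            destination 10.10.0.100:1521 \
            ip-protocol tcp \
            profiles add { tcp } \
            pool oracle_dg_pool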

     

     

    If the data isn't mirrored between clusters, what would the app client need to get in response from the server end (either LTM or the database) to tell it to restart its session because the existing cluster died? Would a TCP reset suffice? Or a SQL level message? Or something entirely different?

     

     

    Aaron
  • Mike_Schrock_61
    Historic F5 Account
    Hi Bill,

     

     

    We would like to work directly with you on this through F5's Oracle Product Management Engineering and Solution Engineering teams. We need to fully understand your needs, as we have seen similar requests and do not have a public solution yet.

     

     

    Will you please email Randy Cleveland (r.cleveland@f5.com), our Director of Solution Engineering, and me (m.schrock@f5.com)?

     

     

    Thanks,

     

    Mike Schrock

     

    F5 Oracle Alliance and Solution Engineering Manager

     

     

     

     

  • Mike_Schrock_61
    Historic F5 Account
    Absolutely, that is the intent of reaching out to work more directly with you. We will share all solutions with others, including here in this forum.
  • Hi Mike, I will do that when I get back to Dallas from my client site later this evening or tomorrow (depending on how late American is this week).

     

     

    Bill
  • Aaron,

     

     

    I'll try to put it into an architecture diagram and solution once we have it figured out, and make it available.

     

     

    Bill
  • Our application uses ODP.NET and connection pooling. If Oracle FAN is not configured, will a system with F5 work as a failover tool?

     

     

    Thanks,

     

    Yibin
  • Chris_Akker_129
    Historic F5 Account
    Yes. Reading the ODP.NET specs, the connection pool would be made to a BIG-IP virtual server, and you can control the timeout of those pooled connections with a TCP profile on the virtual server. So, for example, if you want the connections held open for an hour, you would create a TCP profile with a timeout value of 3600 seconds and apply it to the client side of the virtual server.

    Now for the database servers, which are in a pool, there are several options for what happens when a pool member goes down. The setting is called "Action on Service Down", and the default is to do nothing, so you would want to change it to "Reject" so the BIG-IP resets all client-side connections immediately. This is done in the UI under the pool's Advanced properties in version 10. The connections that were attached to the failed server then get instant notice that it is unavailable, so they retry their TCP connection, and the BIG-IP automatically selects another database server, giving you the quick failover/reconnect you are looking for. You could also try the Reselect option; depending on your application, that may work.

    More info on the options (with a rough tmsh sketch after the descriptions):

     

     

    Action on Service Down

     

    Specifies how the system should respond when the target pool member becomes unavailable. The default is None.

     

     

    • None: Specifies that the system does not select a different node. Selecting None causes the system to send traffic to the node even if it is down, until the next health check is done.

     

     

    • Reject: Specifies that the system sends an RST or ICMP message.

     

     

    • Drop: Specifies that the system simply cleans up the connection.

     

     

    • Reselect: Specifies that the system selects a different node. Selecting Reselect causes the system to send traffic to a different node after receiving the message that the original node is down.
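
    Putting these settings together, a rough tmsh sketch might look like the following. The profile, virtual server, and pool names are carried over from the hypothetical examples earlier in the thread, the one-hour timeout is just the example value mentioned above, and it assumes tmsh's "reset" value corresponds to the GUI's "Reject" setting:

        # TCP profile that holds idle client-side connections open for an hour
        tmsh create ltm profile tcp oracle_tcp_1hr defaults-from tcp idle-timeout 3600

        # Use the new profile on the client side and the default tcp profile on the server side
        tmsh modify ltm virtual oracle_dg_vs profiles replace-all-with { \
            oracle_tcp_1hr { context clientside } \
            tcp { context serverside } }

        # Reset client connections as soon as their pool member is marked down
        tmsh modify ltm pool oracle_dg_pool service-down-action reset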