Forum Discussion

David_Bradley_2
Mar 07, 2010

Fault Tolerant long-lived TCP connections

Here's my story. I have several hundred long-lived client TCP connections to two Tibco RVD servers. (Don't ask why we're not just using RVD in a multicast arrangement. Long story.) Clients can connect to either RVD server; it doesn't matter which. The connections are made at the beginning of the day and not terminated until the evening. If one of the servers goes down, I'd like to reconnect the client connections to the other server, preferably without the client ever knowing it was disconnected.

So, the thought was to insert an F5 between the clients and the servers. The F5 would act as the "server": a TCP connection would exist between the client and the F5, and the F5 would maintain a separate TCP connection between itself and the "real" RVD servers. OneConnect sounds similar (i.e. the F5 acts as a TCP proxy) but seems to be used for pairing new client connections with existing backend connections. That part is cool.

What I really need to do is detect backend server failure, through a health check, iRule, or whatever, and force the clients connected to the failed backend server to reconnect to the other server, seamlessly if possible. I could use some help figuring this out. Thanks.

Dave
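
To make the requirement concrete, here is a minimal sketch of the bookkeeping such a proxy would need. It is written in Python rather than as an iRule, and it is not F5 behaviour or API: the `BackendPool` class, its method names, and the `rvd-a`/`rvd-b` labels are all hypothetical. It only models the decision "this backend failed its health check, so move its clients to the survivor"; it does not touch sockets or the RVD protocol.

```python
from dataclasses import dataclass, field


@dataclass
class BackendPool:
    """Tracks which backend each client is pinned to (hypothetical helper)."""

    backends: list[str]
    healthy: set[str] = field(default_factory=set)
    assignment: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: every backend is up until a health check says otherwise.
        self.healthy = set(self.backends)

    def attach(self, client: str) -> str:
        """Pin a client to the healthy backend that currently has the fewest clients."""
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backend available")
        load = {b: 0 for b in candidates}
        for backend in self.assignment.values():
            if backend in load:
                load[backend] += 1
        chosen = min(candidates, key=lambda b: load[b])
        self.assignment[client] = chosen
        return chosen

    def mark_down(self, backend: str) -> dict[str, str]:
        """Record a failed health check and move that backend's clients.

        Returns {client: new_backend} for every client that was moved.
        """
        self.healthy.discard(backend)
        moved = {}
        for client, current in list(self.assignment.items()):
            if current == backend:
                moved[client] = self.attach(client)
        return moved
```

In a real proxy, each entry returned by `mark_down` would correspond to opening a new server-side connection while the client-side connection stays open, which is the part the question is really about.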

13 Replies

  • Thanks. That was my concern too. I've been looking at Ethereal captures of some RVD chatter and think I've got a basic understanding of the transactions involved. Of course, I don't know what I don't know. I think I could do what you're saying, i.e. queue a copy of each request until I see a response. If no response arrives within a certain timeframe, choose a new backend server, log in again, and resend the request. If I could pull this off, I'd be safe from client data loss.

    But the other issue is this. I noticed last weekend that as soon as the server is shot in the head, the client immediately disconnects. No iRule events fire between the server death and the client disconnect. So there appears to be no way to "catch" the server death and reattach to another running server before the client gets killed. This wreaks havoc on the applications using RVD. I need to get a network trace of everything involved in this scenario and see if I can figure out what the client is waiting for, but not getting, that causes it to time out and disconnect.

    Let me ask you this: on the TCP connection between the client and the LB (assuming a OneConnect setup), does the LB immediately ACK the client's TCP packets itself? Or does the backend (server-side) ACK get passed back to the client?

    Thanks again for your help.

    Dave
  • I'm experimenting with an iRule to see what's possible. I've taken a close look at tcpdump output from a simple connection to an RVD and a single message being sent from the client to the RVD server. I used that to build an iRule that "collects" the preamble packets that lead up to the "connection" response from the RVD. My thought was that I could replay these to the other RVD server and get an RVD session established with both. Then I could send RVD messages to either one.

    At this point I need to be able to LB::reselect, or something similar, to break the server-side TCP connection to the first RVD server and point it at the other RVD server, while leaving the client-side TCP session intact. LB::reselect seems to be callable only from LB_SELECTED or LB_FAILED. So I figured I could just call LB::detach to detach the server side and wind up in one of those events, where I could do an LB::reselect. But I don't see either event ever firing. What am I missing? Is there a better way to do this?

    Thanks.

    Dave
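
The two ideas in these replies (queue a copy of each request until a response is seen, and collect the preamble so it can be replayed to the other server) can be sketched together. The following is a rough Python model, not an iRule and not anything from the RVD protocol: the `ReplayBuffer` class, the integer request IDs, and the timeout value are assumptions made for illustration. Whether RVD messages can be matched to responses this way, or safely replayed at all, is exactly what the network traces would need to confirm.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Keeps what would be needed to move a session to another backend (hypothetical)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # Handshake/login bytes seen before the server's "connection" response.
        self.preamble: list[bytes] = []
        # request_id -> (time sent, payload), in send order.
        self.pending: "OrderedDict[int, tuple[float, bytes]]" = OrderedDict()

    def record_preamble(self, data: bytes) -> None:
        self.preamble.append(data)

    def sent(self, request_id: int, payload: bytes, now: float) -> None:
        """Remember a request until the backend answers it."""
        self.pending[request_id] = (now, payload)

    def acked(self, request_id: int) -> None:
        """Forget a request once its response has been seen."""
        self.pending.pop(request_id, None)

    def timed_out(self, now: float) -> bool:
        """True if any unanswered request has waited at least `timeout` seconds."""
        return any(now - sent_at >= self.timeout for sent_at, _ in self.pending.values())

    def replay(self, now: float) -> list[bytes]:
        """Bytes to send to a newly selected backend: preamble, then unanswered requests."""
        out = list(self.preamble)
        for request_id, (_, payload) in list(self.pending.items()):
            self.pending[request_id] = (now, payload)  # restart the timer
            out.append(payload)
        return out
```

A proxy loop would call `timed_out` periodically and, when it returns true, pick the other backend and write the result of `replay` to it before forwarding any new client data.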