Here's my story. I have several hundred long-lived client TCP connections to two Tibco RVD servers. (Don't ask why we're not just using RVD in a multicast arrangement. Long story.) Clients can connect to either RVD server; it doesn't matter which. The connections are made at the beginning of the day and not terminated until the evening. If one of the servers should go down, I'd like to reconnect the client connections to the other server, preferably without the client ever knowing it was disconnected.

So, the thought was to insert an F5 between the clients and servers. The F5 would act as the "server", so a TCP connection would exist between the client and the F5, and the F5 would maintain a TCP connection between itself and the "real" RVD servers. OneConnect sounds similar (i.e. the F5 acts as a TCP proxy) but seems to be used for pairing new client connections to existing backend connections. That part is cool. What I really need to do is detect backend server failure, through a health check, iRule, or whatever, and force the clients connected to that failed backend server to reconnect to the other server, seamlessly if possible.

I could use some help figuring this out. Thanks. Dave

Hi David, Sounds like you need to set up a typical VIP and pool scenario. Have you looked at the configuration guides on ask.f5.com? Bhattman

Thanks. I did set up a typical VIP with a typical pool, but this isn't a typical scenario, AFAIK. I need to make the whole thing fault-tolerant. I've confirmed that client connections round-robin between the two RVD servers, as I knew they would. However, if I kill one of the RVD servers, the clients attached to that RVD immediately terminate as well. What I want to accomplish is to detect, or intercept, the SERVER-side failure before the CLIENT-side TCP connection is closed and somehow get the client connected on another SERVER-side connection. I'm not sure if this is possible. If not, I'd like to know what options, if any, I have for high availability of long-standing TCP sessions. I'm hoping there are options.

The VIP is set up with SNAT, and OneConnect is set to use the standard oneconnect profile. I'm using the round-robin algorithm. I implemented an iRule with stubs and "log local0." entries in each stub just to see what gets called when. When I kill the server, I see the client terminate before I see any LTM messages. The first message I see is the health check message telling me my server died, but by then it's too late. The client has already terminated.

Hi David, It sounds like you want to reselect a new pool member if the serverside connection is closed before the clientside connection is. As you found, a health monitor and the 'action on service down' pool setting would happen too late to have any reliable effect on a single TCP connection.

Ideally, you'd be able to use an iRule which calls LB::reselect from the SERVER_CLOSED event. However, I'm not sure this is supported or would work. The wiki page for the SERVER_CLOSED event doesn't offer much hope, as the LB::reselect command isn't listed there. A quick test shows the syntax parser doesn't allow it. And when you bypass the parser, it doesn't seem to work.

Here is an apparently non-working example:

when CLIENT_ACCEPTED {
    # Try to reselect a server
    set reselect 1
    log local0. "[IP::client_addr]:[TCP::client_port]: New connection to [IP::local_addr]:[TCP::local_port]"
}
when LB_SELECTED {
    log local0. "[IP::client_addr]:[TCP::client_port]: Selected server info: [LB::server]"
}
when SERVER_CONNECTED {
    log local0. "[IP::client_addr]:[TCP::client_port]: Connected server info: [IP::server_addr]:[TCP::server_port]"
}
when SERVER_CLOSED {
    log local0. "[IP::client_addr]:[TCP::client_port]: Server connection closed"
    if {$reselect} {
        log local0. "[IP::client_addr]:[TCP::client_port]: Trying reselect"
        # Build the command as a string and eval it to bypass the syntax parser
        set lb_cmd "LB::reselect"
        eval $lb_cmd
    }
}
when CLIENT_CLOSED {
    # Do not try to reselect a server if the client closed the connection first
    set reselect 0
    log local0. "[IP::client_addr]:[TCP::client_port]: Client connection closed"
}

Assuming this won't work, the best I can think of are ways to reduce the chance that LTM will close an idle connection. These configuration options are related to idle timeouts on the TCP and SNAT profiles:

SOL7606: Overview of BIG-IP LTM idle session timeouts
https://support.f5.com/kb/en-us/solutions/public/7000/600/sol7606.html

Aaron

Thanks Aaron, I'm not feeling warm and fuzzy. Can we load balance actual TCP traffic? Or just new connections? If the answer is "just new connections", then we're going to have to return our two brand-new 3900 series load balancers because they're not going to do anything for us. The sales rep. claimed this was easy stuff. I can write a C/C++ program to do what we want, which is to open a TCP listener on one side and sockets to two RVD servers on the other side, and round-robin data between the two servers. We chose F5, instead of doing that, because we've got experience with them (i.e. we have a trust relationship from other projects we've used F5 on), and because of F5's HA model (i.e. the F5 isn't a single point of failure). What can I do? I'd like to review all possible solutions before giving up on F5. Thanks. Dave
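For reference, the backend-selection half of the standalone program described above might look roughly like the sketch below. It is written in Python rather than C/C++ purely for brevity, and it covers only picking a live server in round-robin order. The function name `connect_round_robin`, the `start` index, and the injectable `connect` parameter are all hypothetical, and nothing here handles the RVD session handshake or the relaying of data.

```python
import socket


def connect_round_robin(backends, start=0, timeout=2.0,
                        connect=socket.create_connection):
    """Return (index, sock) for the first backend that accepts a connection.

    Backends are tried in rotation beginning at `start`, so a dead server
    is skipped and the next one in the list is used instead.
    `backends` is a list of (host, port) tuples (hypothetical addresses).
    """
    last_error = None
    for offset in range(len(backends)):
        index = (start + offset) % len(backends)
        try:
            sock = connect(backends[index], timeout=timeout)
            return index, sock
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"no backend reachable: {last_error}")
```

A proxy built on this would call the function once per new client, advancing `start` each time, and call it again with `start` set past the failed server whenever a server-side socket closes while the client side is still open.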

You might get warm and fuzzy from an F5 salesperson. I'm just providing the best technical suggestions I can think of based on the requirements you've described. If you're considering returning hardware because a salesperson said the F5 kit could do something and it doesn't seem to be working, I'd suggest you get in touch with someone at F5 to get an official F5 response. I'm not an F5 employee and DevCentral isn't a place to get official F5 responses. I don't have any vested interest in you keeping the gear and won't come up with a better solution than what I've already suggested based on you possibly returning the hardware. If you do figure out a better solution, could you reply back with what you figure out so we can reference it in the future? Thanks, Aaron

Fault Tolerant long-lived TCP connections

13 Replies

David_Bradley_2
Nimbostratus
Mar 10, 2010
Typo. I meant "I can't just simply let the load balancer pick a new target".
David_Bradley_2
Nimbostratus
Mar 12, 2010
Thanks. That was my concern too. I've been looking at Ethereal captures of some RVD chat and think I've got a basic understanding of the transactions involved. Of course, I don't know what I don't know. I think I could do what you're saying, i.e. queue a copy of each request until I see a response. If there is no response within a certain timeframe, then choose a new backend server, log in again, and resend the request. If I could pull this off, then I'm safe from client data loss.

But there is another issue. I noticed last weekend that as soon as the server is shot in the head, the client immediately disconnects. No iRule events happen in between the server death and the client disconnect. So there appears to be no way to "catch" the server death and reattach to another running server before the client gets killed. This wreaks havoc on the applications using RVD. I need to get a network trace of everything involved in this scenario and see if I can figure out what the client is waiting for, but not getting, that is causing it to time out and disconnect.

Let me ask you this: on the TCP connection between the client and the LB (assuming a OneConnect setup), does the LB immediately ACK the TCP packets to the client? Or does the backend (server-side) ACK get sent back?

Thanks again for your help.

Dave
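The queue-and-replay idea in this post could be sketched roughly as follows. This is only an illustration of the bookkeeping, in Python. It assumes each request carries an identifier that can be matched to its response, which the thread does not confirm for the RVD wire protocol, and the class name `ReplayQueue` and its methods are hypothetical.

```python
import time
from collections import OrderedDict


class ReplayQueue:
    """Keeps a copy of each request until its response arrives."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout        # seconds to wait before a request is overdue
        self.clock = clock            # injectable so the logic can be tested
        self.pending = OrderedDict()  # request_id -> (sent_at, payload)

    def sent(self, request_id, payload):
        """Record a request that was just forwarded to the backend."""
        self.pending[request_id] = (self.clock(), payload)

    def answered(self, request_id):
        """Drop the saved copy once the matching response is seen."""
        self.pending.pop(request_id, None)

    def overdue(self):
        """Return ids of requests that have waited at least `timeout` seconds."""
        now = self.clock()
        return [rid for rid, (sent_at, _) in self.pending.items()
                if now - sent_at >= self.timeout]

    def replay(self, send):
        """Resend every pending request, oldest first, via send(payload)."""
        for rid, (_, payload) in list(self.pending.items()):
            send(payload)
            self.pending[rid] = (self.clock(), payload)  # restart its timer
```

When `overdue()` returns anything, the proxy would pick the other backend, repeat whatever login exchange RVD requires, and then call `replay()` with a function that writes to the new server-side connection.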
David_Bradley_2
Nimbostratus
Mar 17, 2010
I'm experimenting with an iRule to see what's possible. I've taken a close look at tcpdump output from a simple connection to an RVD and a single message being sent from the client to the RVD server. I used that to build an iRule that "collects" the preamble packets that lead up to the "connection" response by the RVD. My thought was that I could replay these to the other RVD server and get an RVD session established with both. Then I could drop RVD messages to either one. At this point I need to be able to LB::reselect, or something, to break the server-side TCP connection to the first RVD server and get it pointed to the other RVD server, while leaving the client-side TCP session intact. LB::reselect seems to be callable only from LB_SELECTED or LB_FAILED. So then I figured I could just LB::detach in order to detach the server side and wind up in one of those events, where I could do an LB::reselect. But I don't see either event ever firing. What am I missing? Is there a better way to do this?

Thanks.

Dave
