I have an F5 in front of a couple of Redis servers, configured as Master/Slave.
I have the READ pool, with round robin on both servers.
I have the WRITE pool with one server (the master) at priority 100 and the slave at a lower priority. When the master is down the slave kicks in (it is configured to be writable). I do not expect keys written to the slave to be replicated back to the master when the master comes back; this is intended.
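For reference, a priority-group setup like the one described can be sketched in tmsh; the pool name, member IPs and priority values below are placeholders, not taken from the original post:

```shell
# Hypothetical names/addresses -- adjust to your environment.
# With min-active-members 1, traffic goes to the highest-priority
# member group as long as at least one member in it is up.
tmsh create ltm pool redis_write_pool \
    min-active-members 1 \
    members add { 10.0.0.1:6379 { priority-group 100 } \
                  10.0.0.2:6379 { priority-group 50 } }
```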
I have a health check based on the Redis application-level PING command. It works as expected in identifying dead hosts.
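A PING-based check like this can be expressed as a TCP monitor with a send/recv string pair; the monitor name here is a placeholder and the intervals are just illustrative:

```shell
# Hypothetical monitor: send the inline PING command, expect +PONG back.
tmsh create ltm monitor tcp redis_ping_monitor \
    send "PING\r\n" recv "+PONG" interval 5 timeout 16
```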
The problem is that my app servers do not behave as expected.
When everything is up the app writes to the WRITE pool, and the keys go to the master. Keys are replicated to the slave, so GET commands sent to the READ pool are split between the master and the slave.
When the master goes down the F5 detects it, and if I open a redis-cli on the balanced address I connect to the writable slave and it all works well. The application, instead, goes into "retry" mode while issuing commands to the F5, and no keys are written to the slave host. Recycling the application registers on the slave host, but again commands are not forwarded. Even if I bring back the master, no SET commands are sent to it. They simply disappear and the application gets "retry". Sniffing the servers, they DO NOT receive any command.
If I bring down the slave it all works well. Then I bring down the master also and the app fails (expected). When I bring up again the master the app restarts sending commands to the master.
To sum it up, during normal work: F5 detects the master going down, but the connections are NOT automatically redirected to the slave. New connections from the redis-cli are, but the app does not work even after recycling. After the recycle, if the master comes back up, connections are not forwarded to it. Closing and reopening the redis-cli works as expected going to the master.
It seems connections are stuck to the dead server and do not switch.
It is a FastL4 virtual server.
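Since FastL4 keeps per-flow entries in the connection table, one thing worth checking (an assumption on my part, not something from the original post) is the FastL4 profile's idle timeout and reset behaviour, so that stale flows towards a dead member get torn down instead of lingering; profile and virtual server names below are placeholders:

```shell
# Hypothetical profile: shorter idle timeout, and send a RST when a
# connection-table entry expires, so clients notice the dead flow.
tmsh create ltm profile fastl4 fastl4_redis \
    idle-timeout 60 reset-on-timeout enabled
tmsh modify ltm virtual redis_write_vs profiles replace-all-with { fastl4_redis }
```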
Any suggestion on the way to cleanly switch between the backend servers?
I think "Action On Service Down" inside the pool configuration could help you move active connections to the backup node:
Specifies how the system should respond when the target pool member becomes unavailable. The default is None, meaning that the system takes no action to manage existing connections when a pool member becomes unavailable.
None: Specifies that the system maintains existing connections, but does not send new traffic to the member.
Reject: Specifies that, if there are no pool members available, the system resets and clears the active connections from the connection table and sends a reset (RST) or Internet Control Message Protocol (ICMP) message. If there are pool members available, the system resets and clears the active connections, but sends newly arriving connections to the available pool member and does not send RST or ICMP messages.
Drop: Specifies that the system simply cleans up the connection.
Reselect: Specifies that the system manages established client connections by moving them to an alternative pool member when monitors mark the original pool member down.
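In tmsh this setting maps to the pool's `service-down-action` attribute; the pool name below is a placeholder:

```shell
# Valid values: none (default), reset (= Reject), drop, reselect.
tmsh modify ltm pool redis_write_pool service-down-action reselect
```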
Thank you Amintej. I will set up another debug session with the devs and let you know. I suspect Reselect can help.
Here I am again. I just ran another set of tests with the devs. None of the 4 settings changes the behaviour. When the master host goes down while the app connections are live, the next requests simply don't get routed to the other node. If I open a new redis-cli session, it is connected to the right host. Basically the "Reselect" action acts like the default "None". For the sake of testing, I also used "Drop" and "Reject", with the same results.
Even the "Drop" option would be acceptable: an exception should fire in the code and cause a reconnect, forcing the switch to the other active server, but this is not happening at all.
Basically, once I'm connected to one server I stay with it even when it goes down. New sessions are routed as expected.
Hello, you mean the app is not reconnecting properly, right?
The app does not know it has to reconnect, so it simply does not even try. If I understand correctly, the "Reselect" option should silently move already-open connections to the other host, and "Reject" should reset the connection, telling the app servers to create a new one.
What happens is that with any of those settings the connections are NOT reset, NOR routed to the secondary server when the first fails.
This way the app server has no clue and continues to send commands on the dead connections. If I sniff the network on the primary server where I stopped Redis, I see packets from the F5 coming in even though it is red/unavailable in the F5 console.
NEW connections work as expected, they are routed to the green/up server.
To clarify, these are the steps:
1. Both Redis Up
2. App server starts and creates 4 connections
3. F5 routes those to Redis1
4. App server writes/reads keys to the Redis server.
5. Redis1 goes down
6. F5 marks the node as down (this is correct)
7. NEW connections are routed to Redis2 (this is correct)
8. Existing connections from step 2 are NOT routed to Redis2; they continue to hit the dead Redis1 server (Reselect, Reject and None actions, all the same). This is NOT correct.
9. Redis1 comes up
10. F5 confirms it is up (green bullet)
11. Connections do NOT flow back to Redis1
This way neither failover nor failback works. Losing one Redis breaks the application. Having the app connected directly to Redis1 would be better than this setup, because when it came back up everything would start working again.
To keep traffic on one member, configure destination address persistence with an indefinite timeout.
It will force all connections to be forwarded to the same server, with no automatic fallback to the higher-priority member.
Combine it with the pool's Action On Service Down set to Reject, to force a TCP close when the server becomes unavailable and send a reset to the client.
Action On Service Down Reselect won't work with TCP connections; you must reset the TCP connection to force the client to open a new one.
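The suggestion above can be sketched in tmsh; the persistence profile, virtual server and pool names are placeholders:

```shell
# Destination-address persistence with no expiry, so flows stay pinned
# to one member instead of failing back automatically.
tmsh create ltm persistence dest-addr redis_dst_persist timeout indefinite
tmsh modify ltm virtual redis_write_vs persist replace-all-with { redis_dst_persist }
# Reject (tmsh value "reset") tears down existing flows when the member
# dies, forcing clients to reconnect and be routed to the healthy member.
tmsh modify ltm pool redis_write_pool service-down-action reset
```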
I have a question and would be very grateful for some advice.
My client requires Redis server load balancing with persistence.
There are over 30 Redis servers.
I have an LTM and looked for Redis profiles, but couldn't find any.
Applying universal persistence might solve this,
but which value is appropriate to persist Redis sessions on?
The client IPs and the environment are highly variable.