Pool Member Maintenance Process
I got a call today from a developer letting me know he would like to perform some maintenance on a pool of about 5 servers. So I spent some time reading the documentation on pool member and node statuses, set up a packet capture on the unit active for the traffic group, and began trying combinations of statuses to confirm their behavior. My goal is to be able to "flip a flag" on a pool member or node and have TMOS re-associate existing connections with a different pool member without exception.
Environment
- 3900 BIG-IP 11.6 HF5 Active/Standby
- One virtual server named http_vs, one pool named http_pool, and five nodes named by IP address, all members of http_pool.
- The http_vs virtual has a default persistence profile of cookie with no fallback.
From here on out I'll talk about just one of the http_pool members, 192.168.0.10.
I had the following tcpdump running on the active unit: tcpdump -s 0 -nni 0.0 host 192.168.0.10 and tcp port 80. I repeatedly reviewed the persistence table with tmsh show ltm persistence persist-records node-addr 192.168.0.10 node-port 80. As expected with cookie persistence, I never saw any records.
I executed each of the following commands and monitored the behavior:
- tmsh modify ltm pool http_pool members modify { 192.168.0.10:80 { state user-down } }
- tmsh modify ltm pool http_pool members modify { 192.168.0.10:80 { session user-disabled } }
- tmsh modify ltm pool http_pool members modify { 192.168.0.10:80 { session user-disabled state user-down } }
- tmsh modify ltm pool http_pool members delete { 192.168.0.10:80 }
- tmsh modify ltm node 192.168.0.10 { state user-down }
- tmsh modify ltm node 192.168.0.10 { session user-disabled }
- tmsh modify ltm node 192.168.0.10 { session user-disabled state user-down }
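After each of the commands above I also checked the resulting availability and state flags on the member and node. A quick sketch of the checks I used (field names per the 11.x tmsh output, if I'm remembering them right):

```
# Member-level view: availability, state, and current/active connection counts
tmsh show ltm pool http_pool members

# Node-level view for the same address
tmsh show ltm node 192.168.0.10
```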
The pool member had about 400 active connections when each command was executed, and in each case it took about 15 minutes for all connections to expire. I verified by running the following about every 30 seconds: tmsh show sys connection ss-server-addr 192.168.0.10 ss-server-port 80. Most of this behavior is expected based on the documentation, which is good. I was, however, surprised that removing the member from the pool didn't force TMOS to make a new load balancing decision for existing connections.
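The every-30-seconds check can be wrapped in a simple loop rather than run by hand (a sketch, assuming a bash shell on the BIG-IP; the grep is just counting lines that mention my member's address):

```
# Poll the connection table every 30 s with a timestamped count
while true; do
  echo "$(date +%T) $(tmsh show sys connection ss-server-addr 192.168.0.10 ss-server-port 80 | grep -c 192.168.0.10)"
  sleep 30
done
```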
I thought about deleting the existing connections after removing the member from the pool with tmsh delete sys connection ss-server-addr 192.168.0.10 ss-server-port 80, but I don't think this is going to be transparent to the client.
I looked around for an already vetted process, but the topic is pretty sparse. Ultimately what I'm after is a way to administratively reproduce what happens when a monitor marks a pool member down: I would like existing connections to be re-associated with a different pool member.
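If I'm reading the docs correctly, what happens to existing connections when a monitor marks a member down is governed by the pool's service-down-action setting, so part of what I want might be something like the following (a sketch; I understand reselect has caveats, e.g. it's mainly intended for pools without address translation, so I'd want to test this):

```
# Ask TMOS to re-select a new member for existing connections
# when their current member is marked down
tmsh modify ltm pool http_pool service-down-action reselect
```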
I'll keep looking, but I was curious what others are doing in cases like this. I can also see this being beneficial if a pool member is still online but experiencing some layer 7 problem not detected by a monitor, i.e. some way to stop all traffic to a pool member without the client being aware of the problem. Or am I asking for the holy grail?
Thanks in advance for giving my situation some thought.