Forum Discussion

Vince_Beltz_959
Apr 29, 2010

VIP Persistence

I have another interesting request from our devs. Is it possible to do "VIP persistence"? Their objective is that a user who has successfully connected to a server in a VIP's pool is transparently reconnected to another server in the same pool if the server they originally connected to goes down. The client-to-VIP side of the connection would be unaware that anything at all had happened.

I'm having trouble getting past the first page of forum search results for some reason: trying to reach any other page (either directly or with the Next button) times out.

 

9 Replies

  • Hamish
    In theory, I think it is, at the TCP layer (because at the end of the day the F5 is a proxy, not a router).

    HOWEVER! If you expect things like authentication and state to be replayed, you'd have to do a bit of work with iRules. For some protocols it would be relatively easy, for some very difficult. YMMV.

    H
  • OK, where would I start (for, say, client-side HTTPS)? I'm not even sure what the proper terminology to search the forums for is, assuming that search results past the first page are working for me today.
  • The functionality that you are looking for is an available option on the Pool.

    Action on Service Down: Specifies how the system should respond when the target pool member becomes unavailable. Options are:

    - None (Default): Specifies that the system does not select a different node. Selecting None causes the system to send traffic to the node even if it is down, until the next health check is done.

    - Reject: Specifies that the system sends an RST or ICMP message.

    - Drop: Specifies that the system simply cleans up the connection.

    - Reselect: Specifies that the system selects a different node. Selecting Reselect causes the system to send traffic to a different node after receiving the message that the original node is down.

    This setting will work for anything that accesses or uses the Pool that you assign it to.
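To make the difference between the four options easier to see, here is a toy Python model of what happens to one already-established connection when its pool member is marked down. This is not F5 code: the names `ServiceDownAction`, `Connection`, and `on_member_down` are invented for the sketch, and the behaviour is simplified from the option descriptions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ServiceDownAction(Enum):
    NONE = "none"
    REJECT = "reject"
    DROP = "drop"
    RESELECT = "reselect"


@dataclass
class Connection:
    member: Optional[str]          # pool member currently carrying the flow
    is_open: bool = True           # False once the proxy has torn the flow down
    client_notified: bool = False  # True if an RST/ICMP message was sent


def on_member_down(conn: Connection, action: ServiceDownAction,
                   healthy: List[str]) -> Connection:
    """Apply `action` to an existing connection whose member was marked down."""
    if action is ServiceDownAction.REJECT:
        # Connection is torn down and the peer is told about it.
        conn.is_open, conn.client_notified, conn.member = False, True, None
    elif action is ServiceDownAction.DROP:
        # Connection is cleaned up silently.
        conn.is_open, conn.member = False, None
    elif action is ServiceDownAction.RESELECT and healthy:
        # Client side stays up; traffic is pointed at another member.
        conn.member = healthy[0]
    # NONE (or RESELECT with nothing healthy, an assumption of this sketch):
    # the connection is left pointing at the same member.
    return conn
```

In this model only Reselect keeps the connection open while moving it to a different member, which is the behaviour the original question asks for.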
  • Hamish
    Be aware, however, that this may still not be seamless. Your client may still lose the last response. If the reselect happens after the client's request but before the response (or halfway through the response), the results may not be what you expect. (A stateless app may not notice. YMMV.)

    If you're interested, there's a DC TV discussion between Deb & Colin at http://devcentral.f5.com/weblogs/dctv/archive/2008/05/15/3267.aspx that runs through restarting a request. They talk about various scenarios, whether iRules may be appropriate, and what each of the actions will result in.

    H
  • Thanks for the "Action on Service Down" tip (real D'oh! moment there). However, we're having trouble getting the LTM to actually Reselect. We've tried the Disabled/Forced Down node settings, as well as writing an ECV that allows us to flip the nodes in/out of service. But in all of these cases, we're still seeing HTTP connections to the out-of-service pool member (as viewed in the Pool statistics) for quite a while (hours, in some cases). How can we *force* the Reselection to happen on demand, moving those connections over to a different Pool member?

     

  • So, after another talk w/the devs this morning...

    Expanding on my previous message, none of the "Action on Service Down" settings have done exactly what we need. "Reject" did force the connections to another pool member, but sending the RST to both the client and server didn't provide the user-transparent experience we were looking for. With "Reselect", connections to the Disabled server didn't close for hours.

    We're still looking for what I tried to describe at the start of this thread - a way to maintain the VIP/client-side connection (HTTP keep-alive is currently in use) while allowing the Pool/server-side connections to be rebuilt and moved as necessary. We want to rotate hosts out of the pool for maintenance or other reasons, without having to wait as long as a few hours for the connections on them to clear.

    Is it possible to issue the RST server-side only, or otherwise trigger a rebalance without touching the client side?

    If not on demand when a host fails ECV, could there be a way to schedule a rebalance across the pool at regular intervals?

  • Hamish
    Have a look at the LDAP Proxy iRule in the codeshare. It demonstrates an LDAP proxy (an HTTP version is, I think, what you're really after in your case). It shows how to reconnect to another pool member and continue the client-to-F5 connection without the client knowing about it.

    You'll have to rip out the LDAP binding of course, and intercept the pool member failure as well. (Because the LDAP Proxy iRule swaps back and forth between read-only and read-write backends, it doesn't account for server failure.)

    I think you'll want to look at using the SERVER_CLOSED event to detect the server-side connection going down, and then reselect a new pool member in there.

    H
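The idea in the reply above (keep the client side up, notice that the server side has closed, pick another member, carry on) can be sketched outside of iRules. The Python below is only a model of that control flow, not the LDAP Proxy iRule and not anything that runs on an LTM; `ClientSession`, `ServerClosed`, and the `up`/`down` backend stubs are hypothetical names made up for the example.

```python
from typing import Callable, Dict, List

Backend = Callable[[str], str]  # takes a request, returns a response


class ServerClosed(Exception):
    """Raised by a backend stub when its server-side connection is gone."""


class ClientSession:
    """One long-lived client-side session whose server side can be swapped."""

    def __init__(self, pool: Dict[str, Backend]):
        self.pool = pool
        self.current = next(iter(pool))  # initial pool member
        self.reconnects = 0

    def handle(self, request: str) -> str:
        tried: List[str] = []
        while True:
            try:
                return self.pool[self.current](request)
            except ServerClosed:
                # Server side went away: pick a member we have not tried yet
                # and replay the request. The client never sees the failure.
                tried.append(self.current)
                remaining = [m for m in self.pool if m not in tried]
                if not remaining:
                    raise  # nothing left to try; the failure becomes visible
                self.current = remaining[0]
                self.reconnects += 1


def down(_request: str) -> str:
    raise ServerClosed()


def up(request: str) -> str:
    return f"ok:{request}"
```

With `ClientSession({"web1": down, "web2": up})`, a call to `handle("GET /")` fails over from `web1` to `web2` and returns the response from `web2`. Note that the sketch replays the whole request, which is only safe when the request can be repeated without side effects; that is the same caveat raised earlier in the thread about a client losing the last response.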
  • We're still looking for what I tried to describe at the start of this thread - a way to maintain the VIP/client-side connection (HTTP keep-alive is currently in use) while allowing the Pool/server-side connections to be rebuilt and moved as necessary. We want to rotate hosts out of the pool for maintenance or other reasons, without having to wait as long as a few hours for the connections on them to clear.

    You are asking for several different things that are mutually exclusive:

    Node failures can be handled with Action on Service Down. (Action on Service Down mainly affects existing connections to the server that is going down.)

    Rotating nodes out of a pool while allowing active or persistent connections to remain established and slowly die away as active sessions disappear is the only non-disruptive way of freeing a server. All other methods are intrusive or disruptive.

    If you look at the available Pool Member states, none of them sever existing connections:

     

    - Enabled (All traffic allowed)

     

    - Disabled (Only persistent or active connections allowed)

     

    - Forced Offline (Only active connections allowed)
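The three states listed above can be written down as a small admission rule, which makes it clear why connections linger on a member after it is disabled. This is a Python paraphrase of the list, not an F5 API; `MemberState` and `accepts` are names invented for the sketch.

```python
from enum import Enum


class MemberState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    FORCED_OFFLINE = "forced offline"


def accepts(state: MemberState, *, active_connection: bool,
            has_persistence_record: bool) -> bool:
    """Return True if traffic of this kind may still reach the member."""
    if state is MemberState.ENABLED:
        return True  # all traffic allowed
    if state is MemberState.DISABLED:
        # only persistent or active connections allowed
        return active_connection or has_persistence_record
    # FORCED_OFFLINE: only active connections allowed
    return active_connection
```

In every state `active_connection=True` is still accepted, so an established keep-alive connection stays on the member until it closes on its own or something else tears it down.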

     

     

    If you don't want to wait, it will require manual intervention on your part: either stopping the website, or fooling the F5 into believing that the server has "failed" and executing a recovery (most likely with Action on Service Down).

    My suggestion would be either stopping the website or applying a Health Check that you know will fail (causing the F5 to change the Member Status to Offline).

     

  • Thanks for the LDAP proxy suggestion, Hamish - I never could have written that code, but I'm reasonably sure I can modify it. :-)

    Michael, we'd tried creating a Health Check that intentionally failed, triggering an Action on Service Down. Our problem was that "Reselect" wasn't doing anything about existing connections, which took hours to clear. "Reject" did, but in a way that caused the user's browser session to display errors that we want to avoid. I know we're looking at intrusive/disruptive actions here; I'm just trying to avoid letting the user see them.