Forum Discussion

LmenD
Nimbostratus
Jan 30, 2019

Best way to quickly drain web server connections

I have application teams complaining that client connections do not drain fast enough from their web servers to allow them to push out code changes quickly and efficiently. They state it can take up to 10 minutes before they see client connections start to fall off once they force a health monitor failure. In this specific instance, there are 4 servers, each with 800 to 1,000 concurrent connections.

 

From other people's experience, is there a good way to get those connections to drain faster?

Update

 

There is no persistence profile configured.

 

When the application teams force the health monitor failure, they do so by renaming or deleting the web page used for the health monitor. No configuration changes are actually being made to the F5.

 

5 Replies

  • wlopez
    Cirrocumulus

    Are you doing 'Disable' or 'Force Offline' on the pool members?

     

    'Disable' won't allow new connections but will keep current ones until they time out or are closed by the user.

     

    'Force Offline' will not allow any new connections, including ones that match an existing persistence record. Existing connections are still allowed to complete rather than being killed immediately.

     

    Another, more drastic method (not usually necessary) is to delete the pool member and re-add it once it's ready to receive traffic. You'll lose the previous statistics for the pool member with this method.
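
For anyone scripting this instead of clicking through the GUI, here is a minimal sketch of the iControl REST request that switches a pool member between Enabled, Disabled, and Forced Offline. The host, pool, and member names are hypothetical, the member is assumed to live in the Common partition, and the sketch only builds the request; authentication and actually sending it are left out.

```python
import json

# Hypothetical values for illustration only; substitute your own.
BIGIP_HOST = "bigip.example.com"
POOL = "web_pool"
MEMBER = "10.0.0.11:80"

# Request bodies for the three pool member states. "session" and "state"
# are the iControl REST counterparts of the tmsh pool member options.
MEMBER_STATES = {
    "enabled": {"session": "user-enabled", "state": "user-up"},
    "disabled": {"session": "user-disabled", "state": "user-up"},
    "forced_offline": {"session": "user-disabled", "state": "user-down"},
}


def member_state_request(host, pool, member, target, partition="Common"):
    """Return (method, url, body) for changing a pool member's state."""
    if target not in MEMBER_STATES:
        raise ValueError(f"unknown target state: {target}")
    url = (
        f"https://{host}/mgmt/tm/ltm/pool/~{partition}~{pool}"
        f"/members/~{partition}~{member}"
    )
    return "PATCH", url, json.dumps(MEMBER_STATES[target])


if __name__ == "__main__":
    method, url, body = member_state_request(
        BIGIP_HOST, POOL, MEMBER, "forced_offline"
    )
    print(method, url)
    print(body)
```

Keeping the three states in one table makes it easy to go Disabled first, wait for connections to bleed off, and only then move to Forced Offline.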

     

  • When the monitor's disable string is matched, the pool member is set to disabled, which doesn't terminate current connections; it only prevents new ones. This seems to be what is causing the slow user drop-off. The only ways to speed it up are to terminate user connections early or to delete the persistence record.

    It is possible to delete the persistence record when the monitor is manually disabled which would cause the F5 to reselect a pool member almost immediately, such as in this iRule:

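    when LB_SELECTED {
        # Tcl needs the opening brace on the same line as "when" and "if".
        # If the selected pool member is disabled, delete this client's
        # source-address persistence record so its next connection is
        # load balanced to an available member.
        if {[LB::status] eq "session_disabled"} {
            persist delete source_addr [IP::client_addr]
        }
    }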
    

    The best solution for user experience is still to let the monitors function as intended, but deleting the persistence record should force users to reconnect to a new available server.

    Hope this helps.

  • Jnon
    Nimbostratus

    Your persistence settings are a factor here. As wlopez indicated, Disable stops new connections but lets active connections finish gracefully. The big difference is the customer experience: usually you would be more concerned about that than about how long customers take to bleed off. If your connections are stateless, you should be able to force the members offline with minimal impact to users. Most likely they are stateful connections and you want users to finish their business, but one factor that might help is your persistence timeout value.

     

  • I agree with the point made by @wlopez. As an F5 engineer, I would go with the Disable option first, then Force Offline. We have to minimize the business impact and let customers finish their financial or business transactions. Application teams always try to put the ball in someone else's court, so we should plan accordingly.

     

    • Adriano_Bezerra
      Altostratus

      There is a pool option called Action on Service Down. I've used it for a long time and it works very well.

       

      "Action on Service Down" is a Pool setting, and can be found in the GUI (Local Traffic -> Pools). Choose "Advanced" in the Configuration dropdown to reveal the "Action on Service Down" setting.

       

      The possible options are None, Reject, Drop, Reselect

       

      Use "Reject" when you want LTM to explicitly close both sides of the connection when the server goes DOWN. "Reject" is the most commonly used option for the service down setting. This option often results in the quickest recovery of a failed connection since it forces the client side of the connection to close, in many cases triggering an automatic re-connect & re-send of the request in process.

       

      Use "Drop" when sending a RST to the client is not desirable. This method does not immediately reflect the server's state change to the client, and depends on the client to close or otherwise manage the connection.

       

      Use "Reselect" when the client can continue with a new server seamlessly. The request in play at the time of state transition may be lost, so the client will need to be able to recover gracefully to use this option successfully.

       

      Use "None" if you don't want LTM to intervene in managing either side of the connection. Useful if your servers may not be accepting new connections, but should be allowed to continue servicing existing connections when marked DOWN. Also supports custom monitoring designed to support connection bleeding and other non-standard state management schemes.

       

      In the environments I set up I usually use the Reject option; however, everyone should analyze what fits best in their own environment.
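
If you would rather set this through the API than the GUI, the following is a minimal sketch of the iControl REST request that changes a pool's Action on Service Down. It assumes the pool is in the Common partition and that the property is exposed as serviceDownAction; note that the GUI label "Reject" corresponds to the value "reset" in tmsh and REST. The host and pool names are hypothetical, and the sketch only builds the request without sending it.

```python
import json

# GUI label -> value used by tmsh and iControl REST.
# The GUI's "Reject" is called "reset" outside the GUI.
SERVICE_DOWN_ACTIONS = {
    "None": "none",
    "Reject": "reset",
    "Drop": "drop",
    "Reselect": "reselect",
}


def service_down_request(host, pool, gui_label, partition="Common"):
    """Return (method, url, body) for setting a pool's Action on Service Down."""
    if gui_label not in SERVICE_DOWN_ACTIONS:
        raise ValueError(f"unknown Action on Service Down option: {gui_label}")
    url = f"https://{host}/mgmt/tm/ltm/pool/~{partition}~{pool}"
    body = json.dumps({"serviceDownAction": SERVICE_DOWN_ACTIONS[gui_label]})
    return "PATCH", url, body


if __name__ == "__main__":
    # Hypothetical host and pool names, for illustration only.
    method, url, body = service_down_request(
        "bigip.example.com", "web_pool", "Reject"
    )
    print(method, url)
    print(body)
```

With the pool set this way, the application teams can keep failing the health monitor by renaming the page, and the setting decides what happens to the connections that are still open when the member is marked down.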