Forum Discussion

Zdenda
Nov 25, 2016

Time_wait - why so short?

Hi, we have a case where pool members have a lot of connections (~40k) in the time_wait state since we enabled SSL offload on the LB. During the investigation I noticed that Windows Server sets time_wait to 4 minutes by default, which should be 2x MSL (probably the normal value), but the F5 default for time_wait is 2 s.

 

Why such a big difference? Why does F5 use only 2 s? Does anyone know? I am running 11.5.4 HF2 on vCMP.

 

Thanks, Zdenek

 

4 Replies

  • Hi - It's important to understand the TCP timers in F5. The TIME_WAIT state occurs as part of the active close on the initiator side of the connection when the final FIN is received and acknowledged, or in the case of a simultaneous close, when the acknowledgment to its initial FIN is received. So it is not advisable to increase the TIME_WAIT value.

     

    I am reposting an explanation that I found helpful.

     

    TCP Timers

    TCP sets several timers (not all documented here) for each connection, and decrements them either by the fast timer function every 200ms or by the slow timer function every 500ms. Several of the timers are dynamically calculated, but a few are static as well. We’ve already discussed the idle timeout setting, so today we’ll tackle the FIN_WAIT, CLOSE_WAIT, & TIME_WAIT settings. Reference these diagrams as you read through the timer settings below. The diagram on the left represents a standard TCP close, and the one on the right represents a simultaneous close.

     

    FIN_WAIT

     

    There are actually two FIN_WAIT states, FIN_WAIT_1 and FIN_WAIT_2. In a standard close, the FIN_WAIT_1 state occurs when the initiator sends the initial FIN packet requesting to close the connection. The FIN_WAIT_2 state occurs when the initiator receives the acknowledgement to its FIN and prior to receiving the FIN from the responder. In a simultaneous close, both sides are initiators and send the FIN, creating the FIN_WAIT_1 state on both ends. An endpoint that receives a FIN before receiving the ACK for its own FIN immediately transitions to the CLOSING state. In the LTM TCP profile, the FIN_WAIT setting (in seconds) applies to both the FIN_WAIT and the CLOSING states; if it is exceeded, the connection enters the closed state. The default setting is five seconds.

     

    CLOSE_WAIT

     

    Whereas the FIN_WAIT states belong to the end of the connection initiating a close (called an active close), the CLOSE_WAIT state belongs to the end responding to a close request (called a passive close). The CLOSE_WAIT state occurs after a responder receives the initial FIN and returns an acknowledgement. If the responder does not receive an acknowledgement of its own FIN from the initiator before the timer is exceeded, the connection will enter the closed state. Like the FIN_WAIT state, the default setting is five seconds.

     

    TIME_WAIT

     

    The TIME_WAIT state occurs as part of the active close on the initiator side of the connection when the final FIN is received and acknowledged, or in the case of a simultaneous close, when the acknowledgment to its initial FIN is received. The default setting is 2000 milliseconds, so connections entering the TIME_WAIT state will enter the closed state after 2 seconds.
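
    The close sequences described above can be sketched as a small state table. This is only an illustration: the event names and the run helper are hypothetical, the LAST_ACK state is taken from RFC 793 rather than from the text above, and the LTM timer expirations are not modeled.

```python
# Simplified TCP close state machine (a sketch; real stacks track far more).
# Keys are (current_state, event); values are the next state.
# Event names are hypothetical labels chosen for this example.
TRANSITIONS = {
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",  # active close begins
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",   # our FIN was acknowledged
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",      # simultaneous close
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",    # final FIN received and ACKed
    ("CLOSING", "recv_ack"): "TIME_WAIT",       # ACK of our initial FIN arrives
    ("TIME_WAIT", "timeout"): "CLOSED",         # TIME_WAIT timer expires
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # passive close begins
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",     # responder sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",         # responder's FIN acknowledged
}


def run(events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


ACTIVE_CLOSE = ["send_fin", "recv_ack", "recv_fin", "timeout"]
SIMULTANEOUS_CLOSE = ["send_fin", "recv_fin", "recv_ack", "timeout"]
PASSIVE_CLOSE = ["recv_fin", "send_fin", "recv_ack"]

if __name__ == "__main__":
    for name, events in [("active", ACTIVE_CLOSE),
                         ("simultaneous", SIMULTANEOUS_CLOSE),
                         ("passive", PASSIVE_CLOSE)]:
        print(name, " -> ".join(run(events)))
```

    Only the active and simultaneous paths pass through TIME_WAIT; the passive closer never does, which is why the side that closes first is the one that accumulates TIME_WAIT entries.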

     

    TIME_WAIT Recycle

     

    This setting when enabled will signal the LTM to reuse the connection when a SYN packet is received in the TIME_WAIT state. If disabled, a new connection will be established.

     

    • Zdenda

      Thanks, but this still does not answer my question: why does F5 use just 2 s by default for time_wait?

       

  • I am not sure why F5 picked 2s, but a lower time_wait value lets sockets shut down sooner and so allows more TCP connections to be created within a given time window. Basically, it increases the ability to handle higher scale (more connections within a time window).
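
    To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from the question (~40k sockets, 4 minute timer). It assumes a steady close rate, so the number of sockets sitting in time_wait is simply close rate times timer length; the function names are made up for this example.

```python
def time_wait_sockets(closes_per_second, time_wait_seconds):
    """Steady-state number of sockets in TIME_WAIT (rate x holding time)."""
    return closes_per_second * time_wait_seconds


def implied_close_rate(observed_sockets, time_wait_seconds):
    """Close rate that would produce the observed TIME_WAIT population."""
    return observed_sockets / time_wait_seconds


if __name__ == "__main__":
    # Assumption: the ~40k sockets were observed with the 240 s Windows default.
    rate = implied_close_rate(40_000, 240)
    print(f"implied close rate: {rate:.1f} per second")
    for timer in (240, 30, 2):
        print(f"time_wait {timer:>3} s -> about {time_wait_sockets(rate, timer):.0f} sockets")
```

    Under those assumptions, the same traffic that leaves ~40k sockets waiting at 240 s would leave about 5,000 at 30 s and a few hundred at 2 s.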

     

    A time_wait of 2 minutes worked when the internet was in its infancy, slow and small in scale. It doesn't work well for the scale and speed of the modern internet.

     

    time_wait usually helps in making sure that there are no delayed packets/duplicate packets for a TCP connection that is closed and could be interpreted as part of a newer TCP connection.

     

    For the question on why 2s and not say, 4s - I am not sure.

     

  • Firstly you need to understand that any default setting, for anything, will never fit all scenarios. I don't know why F5 Product Development chose 2 seconds, but I can give you a technical explanation of why I think that is correct.

     

    First you need to understand that the TCP RFC 793 is from 1981. At that time network speeds and quality were very different from today.

     

    https://tools.ietf.org/html/rfc793

     

    “For this specification the MSL is taken to be 2 minutes. This is an engineering choice, and may be changed if experience indicates it is desirable to do so.”

     

    If you read this link, you will get a very good explanation about why TCP needs time_wait.

     

    http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

     

    The two reasons given in the above link are mainly related to delays in receiving packets. If you compare the speeds in 1981 with the speeds now, I am sure you agree they are different. So, we can’t use a default setting defined in 1981 for the networks we have in 2016.

     

    If you search for Linux TCP time wait, you will see that Linux has some kernel settings to deal with this, and most Linux admin guides will tell you to tune them. Linux does not use the RFC's 4 minutes either: its time_wait is fixed in the kernel at 60 sec.

     

    Microsoft also has guidance on this, and it says that there are benefits in reducing time_wait.

     

    https://technet.microsoft.com/en-us/library/cc938217.aspx

     

    F5 is still a network device that is used by different customers on different networks. You can’t have a 4 min time_wait in, for example, an ISP network, as you will run out of either ports or memory in the F5 unit. In the same way, other TCP default settings will not work well for an ISP network.
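
    The port side of that argument can be shown with a rough calculation. This is a sketch under stated assumptions: one source IP talking to one destination IP and port, every closed connection holding its source port for the full time_wait, and a hypothetical ephemeral range of 1024 to 65535. Real devices (SNAT pools, multiple destinations) will differ.

```python
# Assumed ephemeral port range for illustration only; actual ranges vary by platform.
EPHEMERAL_PORTS = 65535 - 1024 + 1  # 64512 ports


def max_connection_rate(ephemeral_ports, time_wait_seconds):
    """Highest sustained new-connection rate before source ports run out.

    Each closed connection keeps its port busy for time_wait_seconds, so at
    most ephemeral_ports connections can be opened per time_wait window.
    """
    return ephemeral_ports / time_wait_seconds


if __name__ == "__main__":
    for timer in (240, 30, 2):
        rate = max_connection_rate(EPHEMERAL_PORTS, timer)
        print(f"time_wait {timer:>3} s -> at most {rate:.1f} new connections per second")
```

    With these assumptions a 4 min timer caps one source/destination pair at under 300 new connections per second, while a 2 s timer allows tens of thousands.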

     

    Also, my expectation is that a connection created on the server uses a lot less memory than a connection created on the F5, because the F5 probably stores a lot more information about that connection than just the TCP state. For any very busy device, if you increase time_wait to a very large number, you will end up without either available ports or memory for new connections.

     

    Anyway, this is to give you an idea of why it is 2 secs, but you can simply create a new TCP profile on the F5 unit with a different time wait (from 0 to 600 seconds) and apply it to the virtual servers that connect to those Windows servers. You will probably see memory usage increase over time.
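
    As a rough illustration of that step, the helper below builds a tmsh command line for a custom TCP profile. Treat it as a sketch: the helper and the profile name tcp_tw30 are hypothetical, and the tmsh attribute names (defaults-from, time-wait-timeout in milliseconds) are my assumption and should be checked against the tmsh reference for your version before use.

```python
def tmsh_create_tcp_profile(name, time_wait_seconds, parent="tcp"):
    """Build a tmsh command string for a TCP profile with a custom time wait.

    Assumptions (verify on your version): the attribute is called
    time-wait-timeout and takes milliseconds; the allowed range is the
    0 to 600 seconds mentioned above.
    """
    if not 0 <= time_wait_seconds <= 600:
        raise ValueError("time wait must be between 0 and 600 seconds")
    milliseconds = int(time_wait_seconds * 1000)
    return (f"create ltm profile tcp {name} "
            f"defaults-from {parent} time-wait-timeout {milliseconds}")


if __name__ == "__main__":
    # Hypothetical profile name; prints the command rather than running it.
    print(tmsh_create_tcp_profile("tcp_tw30", 30))
```

    The helper only prints the command so it can be reviewed first; the new profile would still need to be assigned to the relevant virtual servers.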

     

    I prefer the option of reducing time_wait on the Windows server to the minimum (30sec) and creating a new TCP profile with 30sec on the F5. This is the option I have used in the past for this type of issue with Windows servers.