Forum Discussion

moog67_108621
Jun 11, 2014

Question on TCP timeouts

Hi everyone,

 

Maybe a moot question, but these are the kind of things that, as a support guy, I can never test myself in the lab. We have a simple vserver setup with two pool members load-balancing HTTP requests and the default TCP profile applied, nothing special there.

 

What happens if both members are unreachable (i.e., the server-side link is broken while the client side is still alive)? Is the virtual IP still reachable? Does the F5 send any indication to the client about the unavailability of the pool members? How long does it take for the client-side connection to time out?

 

We have seen in the field that our client keeps sending traffic to the virtual IP for a long time (minutes) before the connection is eventually reset.

 

Thanks and regards, moog67

 

7 Replies

  • shaggy

    Since the LTM is a full-proxy architecture, the VS/virtual IP (depending on the VS type) is generally reachable from a TCP/IP perspective regardless of pool member availability. A great resource is SOL8082: Overview of TCP connection setup for BIG-IP LTM virtual server types. This behavior makes sense if you think about F5 features: for example, you can select a pool based on a URI value, so before the F5 knows which pool to use, it must parse client data, and for that it needs the client-side connection to be established.

     

    If a pool member goes down while there are live connections to that pool member, LTM will act based on the pool's "action on service down" configuration (see SOL15095: Overview of the Action On Service Down feature). By default, this is "none", so the LTM will let the connection be handled by the protocol in play. If you want the LTM to send a TCP RST to users when their pool member fails, this can be set to "reject".

     

    There are TCP idle timeouts spread across different configuration items, but if this is a standard VS, then those connections will probably be timed out by the assigned TCP profile (see SOL7606: Overview of BIG-IP idle session time-outs). In your case, if "action on service down" is "none", it will let TCP time out on its own, and the idle timeout on many stateful network devices is 300 seconds. You can tweak the timeout value on the LTM by creating and assigning a custom client-side TCP profile to the virtual server.
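To see the full-proxy behaviour from the client's point of view, a small probe can separate "the TCP handshake completed" from "data actually came back". The following is only a minimal sketch in Python using the standard `socket` module; the function name `probe`, the HTTP/1.0 request, and the timeout value are illustrative assumptions, and nothing in it is F5-specific.

```python
import socket


def probe(host, port, timeout=3.0):
    """Return (tcp_connected, got_response) for a simple HTTP-style probe.

    Hypothetical helper for illustration only.
    (False, False): the TCP handshake itself failed (refused or timed out).
    (True, False):  the handshake completed, but no data came back before
                    the timeout, or the connection was reset or closed.
    (True, True):   at least one byte of response data was received.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return (False, False)
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = sock.recv(1)
            return (True, bool(data))
        except OSError:
            # Covers receive timeouts as well as resets from the peer.
            return (True, False)
```

Against a virtual server whose pool members are all unreachable, a result of `(True, False)` would match the description above: the client side is established, but nothing is delivered from the server side. Note that the probe gives up after its own application-level timeout rather than waiting for TCP to time out.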

     

    • moog67_108621
      Thanks, Shaggy, for the comprehensive reply! Lots of information there. However, our client runs a quite particular protocol (MS Smooth Streaming), which just keeps sending data on the client side, so the connection never goes idle and the idle timeout never comes into play. If "action on service down" is set to "none", does the timeout depend on the underlying protocol or operating system (either at our client or on the F5)? Regards, moog67
    • shaggy
      Yes, the timeout depends on the underlying protocol and how it is handled by the client/server application, the OS, and possibly other stateful inline devices (firewalls, proxies, etc.). TCP employs exponential back-off in retransmission scenarios. The F5 TCP profile has a "Maximum Segment Retransmissions" setting, which is 8 by default, so it will forward 8 retransmissions (~128 sec total) before killing the connection. I think this would be tweaked on the server-side TCP profile, since you're interested in retransmissions towards the pool member. Of course, all of this network-layer activity only takes place if the client application or OS hasn't already given up due to another configured control. Also keep in mind that the nature of the pool member failure affects how a client fails. If you are using an HTTP monitor that validates a healthcheck page, and the web service is misbehaving and causing the F5 monitor to fail, the server may still be listening and responding to TCP connections. In that case, none of the network-layer retransmissions or timeouts come into play.
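To make the back-off arithmetic concrete, here is a rough sketch of how the retransmission times add up when the retransmission timeout (RTO) doubles after every attempt. The function name, the initial RTO values, and the optional cap are assumptions made for illustration; the RTO that a real TCP stack or the BIG-IP starts from depends on measured round-trip time and configuration, so treat the numbers as a model rather than a statement of actual behaviour.

```python
def retransmit_schedule(initial_rto=1.0, max_retransmits=8, rto_cap=None):
    """Return the elapsed time (seconds) at which each retransmission is sent.

    Simplified model (assumption): the RTO doubles after every retransmission
    and is optionally limited to rto_cap seconds.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retransmits):
        elapsed += rto          # wait one RTO, then retransmit
        times.append(elapsed)
        rto *= 2                # exponential back-off
        if rto_cap is not None:
            rto = min(rto, rto_cap)
    return times


if __name__ == "__main__":
    for rto in (0.5, 1.0):
        schedule = retransmit_schedule(initial_rto=rto)
        print(f"initial RTO {rto}s: 8th retransmission at {schedule[-1]}s")
```

In this model, an initial RTO of 0.5 s puts the eighth retransmission at 127.5 s, which is close to the ~128 s figure quoted above, while an initial RTO of 1 s puts it at 255 s. Either way the connection lingers for minutes, which is in line with what was observed in the field.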