Forum Discussion

Sly_85819
Nimbostratus
Dec 28, 2009

inet port exhaustion - urgent help needed

We recently had two outages in which a single system sending a large number of DNS queries to the LTM caused it to slow down, ultimately resulting in performance degradation for all the apps configured on the LTM. F5 support suggested that the ephemeral ports were exhausted and that we should configure an additional self IP to mitigate the situation. A single host on the network being able to slow down the LTM is a serious cause for concern. I would like to know if there are any ways to proactively take care of this situation. We have configured SNMP traps, which helped us get notified and reduce the outage time when it happened the second time.

 

Here is the message that we received - 01010201:2: Inet port exhaustion on 10.1.10.61 to 172.24.8.103:53 (proto 17)

 

10.1.10.61 is the host sending DNS requests. 172.24.8.103 is a pool member of the DNS VS. The DNS VS is 172.24.4.252. The name server VS is a "Standard" VS, which I believe I need to reconfigure as "Performance (Layer 4)" so it forwards traffic directly instead of doing a full proxy. The message is confusing, however, as it suggests the client is hitting the server directly. We have one more VS that allows direct access to the servers behind the LTM - a Forwarding (IP) virtual. I believe a Forwarding (IP) VS forwards traffic directly using the route table, so I am wondering how the ephemeral ports get utilized. Is the message actually about that VS?

 

 

Thanks in advance.

7 Replies

  • It sounds like you're using SNAT automap on this virtual server. If you are, that's almost certainly your problem. I've run into this exact scenario before, with aggressive DNS traffic causing ephemeral port exhaustion. Fortunately, the fix is relatively easy: use a SNAT pool with multiple addresses in it. This will do a few things:

     

     

    1) You'll get a ton more ephemeral ports for the virtual server.

     

    2) If you need more, simply add another SNAT address to the pool.

     

     

    While this is a painful problem when you run into it, a SNAT pool should immediately fix your issue. Have a look at https://support.f5.com/kb/en-us/solutions/public/2000/500/sol2561.html for a bit more info on this.
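    Roughly, in bigpipe-style config, it would look something like the sketch below (the pool name and member addresses are placeholders - use free addresses that are routable back from your servers, and double-check the exact syntax against your version):

    snatpool dns_snat_pool {
       member 192.0.2.10
       member 192.0.2.11
       member 192.0.2.12
    }

    You would then select that SNAT pool on the virtual server (or reference it from an iRule with the snatpool command) instead of automap. Each translation address brings its own range of ephemeral ports per destination, so every member you add buys you more headroom.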

     

     

    -Matt
  • We are not using SNAT for the VIPs in question. The logged message shows a connection directly to the pool member. I am still trying to understand whether the host was sending traffic to the VIP or to the pool member. Below is the config. The iRule basically allows the servers behind the LTM to talk to other VIPs on the same LTM. The Inbound_11_Route VS allows connections directly to the pool members.

     

     

    virtual ns-phx.bmc.com {
       pool ns-phx.bmc.com
       destination 172.24.4.252:domain
       ip protocol udp
       vlans PRDSRV200 PRDVIP100 enable
       rules ns-phx-snat-iRule1
       persist source_addr
    }

    virtual Inbound_11_Route {
       ip forward
       destination 172.24.8.0:any
       mask 255.255.252.0
       vlans PRDVIP100 enable
       profiles fastl4_90mins_timeout
    }

    rule ns-phx-snat-iRule1 {
       when CLIENT_ACCEPTED {
          if { [matchclass [IP::client_addr] equals $::all_server_nodes] } {
             snat automap
          }
       }
    }
  • It looks like you actually may be using SNAT automap, according to your config. The following virtual server references the iRule above, which issues a SNAT automap translation based on a class match against all_server_nodes:

     

     

    virtual ns-phx.bmc.com {
       pool ns-phx.bmc.com
       destination 172.24.4.252:domain
       ip protocol udp
       vlans PRDSRV200 PRDVIP100 enable
       rules ns-phx-snat-iRule1
       persist source_addr
    }

     

     

    If you create a snat pool and point this rule to that pool, it may help.
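    As a sketch (assuming a SNAT pool named dns_snat_pool has already been created, as outlined in my earlier reply), the iRule change is just swapping the snat automap line for a snatpool reference:

    rule ns-phx-snat-iRule1 {
       when CLIENT_ACCEPTED {
          if { [matchclass [IP::client_addr] equals $::all_server_nodes] } {
             # draw translation addresses from the pool instead of the single automap self IP
             snatpool dns_snat_pool
          }
       }
    }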

     

    -Matt
  • You may have already checked your timeouts, but if not you may want to consider the connection timeout in the profile assigned to that virtual. The default timeout for TCP is 300 seconds, for UDP it's 60 seconds, both of which are an eternity for DNS. In the past I've used timeouts of 5-10 seconds for DNS traffic.
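    As an illustration (the profile name is made up, and the exact attribute keywords may differ on your version - the same change can be made through the GUI), a custom UDP profile with a short idle timeout would look roughly like:

    profile udp udp_dns_short {
       defaults from udp
       idle timeout 10
    }

    You would then assign udp_dns_short to the DNS virtual in place of the default udp profile.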

     

     

    You touched on switching the virtual to Performance L4 rather than Standard. If you do not need the advanced functionality that a "Standard" virtual offers, I would definitely make the switch. Should you go this route, keep in mind that the change to the timeout will need to be made in a new fastL4 profile rather than the TCP or UDP profile. (I recommend against changing the default profiles; creating a custom profile for every VIP that needs some customization is the way to go.) Another benefit of using the fastL4 profile is the "Loose Initiation" option, which allows "new" connections to be created even if the received packet is not a SYN.

     

     

    I cannot recall whether the port exhaustion message is explicit about which connections are causing the exhaustion. Assuming it is, and that this traffic is using the IP forwarding virtual, I would check the timeouts in the profile being used by that virtual (from your config snippet it looks like a custom fastL4 profile, fastl4_90mins_timeout). By default the fastL4 profile uses a 300 second idle timeout, which I believe is applied to both UDP and TCP.
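    Something along these lines is what I have in mind (again, the profile name is made up and the keywords are from memory, so verify against your version):

    profile fastL4 fastL4_dns_10s {
       defaults from fastL4
       idle timeout 10
       loose initiation enable
    }

    That profile could replace fastl4_90mins_timeout on the forwarding virtual, and could also be used on the DNS virtual once you convert it to Performance (Layer 4).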

     

     

     

    --jesse
  • Matt,

     

     

    We are using SNAT, however it uses a data class so that it only source-NATs the traffic originating from the servers behind the LTM.

     

     

    Jesse,

     

     

    I am changing the VS to Performance L4 and changing the timeout settings. Can you explain the timeout setting in the fastL4 profile vs. the UDP profile? How do the settings get applied?

     

    We checked the offending machine and found that the DNS VS was configured as its DNS server, so I will be targeting the VS settings. Do you think that Performance L4 will not cause port exhaustion, since it forwards the packets instead of doing a full proxy?
  •  

    The functionality of the timeout in fastL4 is the same as in the UDP profile; the fastL4 profile is just much, much more efficient than the UDP profile because it assumes that almost no advanced operations will be required on the traffic through the virtual. For example, you can't assign an iRule that inspects packet payload data to a virtual using a fastL4 profile; deep inspection requires the more advanced features offered by a "Standard" virtual. The "Standard" virtual generates a lot more overhead because it's capable of doing so much more than the fastL4 profile.

     

     

    Regarding port exhaustion, ports are still being used, so you will still need to set the timeout low enough to avoid all of them being utilized at the same time. I would probably use a timeout of 10 seconds, and enable "Loose Initiation" so that if a packet is received for which there is no entry in the connection table (i.e. a TCP connection got closed before the client was actually done with it), a new connection will be created based on any packet received, not just a SYN.

     

     

    Note that Loose Initiation is a potential security concern, since any packet to that virtual server will now create a connection, not just SYNs. However, if this is a more-or-less trusted environment, then this solution will make ports available much, much faster than the default timeouts and still be forgiving of clients that simply go idle for more than 10 seconds. If this is a UDP-only DNS server you wouldn't need to change the "Loose Initiation" setting at all, because any UDP packet will generate a new connection table entry.
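    For reference, converting the DNS virtual is essentially just swapping its protocol profile for a fastL4 profile; as a sketch (reusing the made-up fastL4_dns_10s profile from my earlier post), the converted virtual might end up looking roughly like:

    virtual ns-phx.bmc.com {
       pool ns-phx.bmc.com
       destination 172.24.4.252:domain
       ip protocol udp
       profiles fastL4_dns_10s
       vlans PRDSRV200 PRDVIP100 enable
       rules ns-phx-snat-iRule1
       persist source_addr
    }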

     

     

     

    --jesse
  • Got it. Is there an order by which profiles with similar settings get applied? I read something about the timeout on the protocol profile versus the source address persistence timeout.

     

     

    F5 support suggested we create an additional self IP to mitigate the port exhaustion problem. I had configured SNMP traps to send email notifications for port exhaustion, which helped me the second time a similar condition occurred (~800+ emails in 30 mins.). I will work on tweaking the VS settings.