Forum Discussion

Nfordhk_66801
Dec 11, 2014

Receiving Log of Servers down for 9-10 Seconds

Hi,

 

We regularly receive log messages saying that our pool members are temporarily unavailable, about 16 times per week at random intervals. The back end that we monitor with ICMP never goes offline, but the pool member that we monitor with HTTP does.

 

The downtime is always the same, consistently 9-10 seconds, which really doesn't make sense. We're on BIG-IP 5000s running 11.5.1 with HF4.

 

I created another test server in the same virtual environment, thinking it could be related to the application, but the issue was reproduced. It appears to be either a virtual environment issue or an F5 issue.

 

Any ideas on how to troubleshoot?

 

6 Replies

  • shaggy

    If you have an HA pair, are both BIG-IPs showing the monitor failures simultaneously? If so, that helps eliminate a BIG-IP system issue. I would start by using tcpdump to capture the failed monitors and see exactly why they're failing (bad response, no response, etc.). Another option is to have a non-F5 device in the same network as the F5s mimic the health monitor using curl, to see whether it shows failures or timeouts at the same time as the F5.

     

    • Nfordhk_66801
      It's never both units in the pair at the same time, but the issue affects both units at random times.
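The idea above of mimicking the health monitor from a non-F5 host could be sketched roughly as follows. This is only an illustration in Python rather than curl; the function names `probe` and `watch`, the 5-second interval, and the 5-second timeout are assumptions, not settings taken from this thread. It logs a UTC timestamp for every failed check so the times can be compared against the BIG-IP logs.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=5.0):
    """Send one HTTP GET and return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
            detail = f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        ok, detail = False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, reset, DNS failure, and similar.
        ok, detail = False, f"error: {exc}"
    return ok, detail, time.monotonic() - start


def watch(url, interval=5.0, timeout=5.0, count=None):
    """Probe `url` every `interval` seconds and print each failure.

    Runs forever when `count` is None; returns the list of failures otherwise.
    """
    failures = []
    done = 0
    while count is None or done < count:
        ok, detail, elapsed = probe(url, timeout)
        if not ok:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} FAIL {detail} after {elapsed:.2f}s")
            failures.append((stamp, detail))
        done += 1
        if count is None or done < count:
            time.sleep(interval)
    return failures
```

Calling something like `watch("http://192.0.2.10/")` (a placeholder address, to be replaced with the real pool member and monitor URI) from a host in the same network would then show whether an independent client sees failures at the times the BIG-IP marks the member down.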
  • nathe
    My advice would be to review the health monitor setup, in particular the Interval and Timeout. Do these need to be increased? Also, I'd run a packet capture to see whether there is a delay in the response from the server, with something like: tcpdump -ni 0.0 host server_ip and port 80
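When reviewing Interval and Timeout, it may help to recall the relationship F5 generally recommends: a timeout of three times the interval plus one second, so that several checks can fail before the member is marked down. A small sketch of that arithmetic follows; the helper name `monitor_timeout` is hypothetical, and the values actually configured on the monitor in this thread are not known.

```python
def monitor_timeout(interval, missed_probes=3):
    """Return a timeout in seconds that spans `missed_probes` checks plus 1 second.

    With the default of 3 this is the commonly recommended 3 * interval + 1.
    """
    if interval <= 0 or missed_probes < 1:
        raise ValueError("interval must be positive and missed_probes at least 1")
    return missed_probes * interval + 1


for interval in (5, 10):
    # Prints the interval followed by the matching timeout.
    print(interval, monitor_timeout(interval))
```

If the configured timeout is much shorter than this relative to the interval, a single slow or lost response is enough to mark the member down.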
  • Found the issue to be a bug, with the help of F5.

    Impact
    Pool members monitored by the affected health monitor are erroneously marked down.

    Symptoms
    As a result of this issue, you may encounter the following symptoms:
    - Pool members are marked down when they are actually up.
    - A packet capture on the affected monitor traffic shows that the BIG-IP system receives a SYN/ACK from a pool member and responds with an ICMP destination unreachable message.

    Here is the link to a solution article that details the issue.

    SOL15907: The BIG-IP system may incorrectly send an ICMP destination unreachable message to a server responding to health monitor traffic on TCP source port 54321
    https://support.f5.com/kb/en-us/solutions/public/15000/900/sol15907.html

    SOL13123: Managing BIG-IP product hotfixes (11.x)
    https://support.f5.com/kb/en-us/solutions/public/13000/100/sol13123/
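For anyone checking their own capture for the symptom described above, a rough sketch of a filter over saved `tcpdump -n` text output is shown below. It assumes the usual tcpdump one-line format (`Flags [S.]` for a SYN/ACK, and `ICMP ... tcp port N unreachable`); the function name `classify` is hypothetical, and real BIG-IP captures may carry extra fields on each line that this simple pattern ignores.

```python
import re

MONITOR_PORT = 54321  # TCP source port named in SOL15907


def classify(line, port=MONITOR_PORT):
    """Label one line of `tcpdump -n` text output.

    Returns "synack" for a SYN/ACK sent to `port`, "unreachable" for an
    ICMP port-unreachable message about `port`, and None for anything else.
    """
    if re.search(rf"\.{port}: Flags \[S\.\]", line):
        return "synack"
    if re.search(rf"ICMP .*\bport {port} unreachable", line):
        return "unreachable"
    return None


def find_symptom(lines, port=MONITOR_PORT):
    """Return (kind, line) pairs for every line that matches the symptom."""
    hits = []
    for line in lines:
        kind = classify(line, port)
        if kind is not None:
            hits.append((kind, line.rstrip("\n")))
    return hits
```

A SYN/ACK hit followed closely by an unreachable hit for the same pool member would be consistent with the behavior the solution article describes, though F5 support remains the authority on whether a given system is affected.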