Forum Discussion

Rob_Wotton_8024
Nimbostratus
Jun 24, 2015

Node health check failing due to slow ping responses

Hi,

 

Over the past couple of weeks we have encountered a very odd issue. We have two F5 LTMs (version 10.2.4) running in an HA Active/Standby configuration. Both are connected via a Cisco switch to our VMware environment, where our web servers are located.

 

On one of the LTMs, the node health check (ping) marks all nodes as up and responding, whereas the other LTM marks the same nodes down because of very slow ping responses (~15 seconds). We can replicate this by logging on via an SSH session and pinging the same virtual web server from both LTMs. One other key piece of information is that the slow ping response times affect servers connected to either interface 1.1 or 1.9.
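
To put numbers on the difference between the two units, the ping output captured in each SSH session could be summarised with a small script. This is only a minimal sketch, assuming standard Linux-style `ping` output with `time=... ms` fields; the function name `summarise_ping`, the 1000 ms "slow" threshold, and the sample output (including the 10.0.0.78 address) are made up for illustration and are not taken from the actual devices.

```python
import re

# Matches the round-trip time in a standard Linux ping reply line,
# e.g. "64 bytes from 10.0.0.78: icmp_seq=1 ttl=64 time=0.412 ms".
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def summarise_ping(output, slow_ms=1000.0):
    """Return reply count, min/max/avg RTT (ms), and the number of slow replies."""
    rtts = [float(m.group(1)) for m in RTT_PATTERN.finditer(output)]
    if not rtts:
        return {"replies": 0, "min": None, "max": None, "avg": None, "slow": 0}
    return {
        "replies": len(rtts),
        "min": min(rtts),
        "max": max(rtts),
        "avg": sum(rtts) / len(rtts),
        # slow_ms is an arbitrary illustrative threshold, not an F5 setting.
        "slow": sum(1 for r in rtts if r >= slow_ms),
    }


# Hypothetical sample output, for illustration only.
SAMPLE = """\
64 bytes from 10.0.0.78: icmp_seq=1 ttl=64 time=15012 ms
64 bytes from 10.0.0.78: icmp_seq=2 ttl=64 time=0.412 ms
"""

if __name__ == "__main__":
    print(summarise_ping(SAMPLE))
```

Running the same summary over the output saved from each LTM would make it easy to see whether every reply is slow or only some of them.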

 

Once a server has been marked down, if we change its IP from x.x.x.78 to x.x.x.79 the server is instantly marked as up. However, if we change it back to x.x.x.78, it is marked down again.

 

Up until a couple of weeks ago everything was working fine. We have spoken to our managed data centre provider and they can't find any issues on the switches, and our F5 reseller is unable to identify any issue on the LTMs.

 

Obviously different physical ports are being used on the switches, but apart from that the configuration is identical from both LTMs to the web servers.

 

If anyone is able to offer any advice or suggestions, I will be very grateful.

 

Many thanks, Rob

 

2 Replies

  • Is the route table the same on both units? Is the ping traffic on your standby taking a tmm path or a mgmt path? Take a tcpdump on the mgmt and tmm interfaces and see if you can isolate the egress/ingress paths for the pings.

     

  • This could be an inconsistent configuration or a network issue. Can you replicate the issue by testing from the console (SSH)?
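
One way to act on the route table question in the first reply is to save the routing table output from each unit to text and diff the two. The following is a rough sketch only: it assumes each table has already been copied off the LTMs as plain text (for example from `netstat -rn`), and the function name `diff_route_tables` and the sample routes are hypothetical.

```python
import difflib


def diff_route_tables(active_text, standby_text):
    """Return unified-diff lines for two saved routing-table dumps."""
    # Drop blank lines, strip trailing whitespace, and sort, so that
    # ordering differences alone do not show up as changes.
    active = sorted(l.rstrip() for l in active_text.splitlines() if l.strip())
    standby = sorted(l.rstrip() for l in standby_text.splitlines() if l.strip())
    return list(
        difflib.unified_diff(
            active, standby, fromfile="active", tofile="standby", lineterm=""
        )
    )


# Hypothetical routes, for illustration only.
ACTIVE = """\
0.0.0.0 10.0.0.1 0.0.0.0 UG vlan_int
10.0.0.0 0.0.0.0 255.255.255.0 U vlan_int
"""
STANDBY = """\
10.0.0.0 0.0.0.0 255.255.255.0 U vlan_int
0.0.0.0 10.0.0.254 0.0.0.0 UG vlan_int
"""

if __name__ == "__main__":
    for line in diff_route_tables(ACTIVE, STANDBY):
        print(line)
```

An empty result means the two dumps contain the same routes. Any `-` or `+` line is a route present on only one unit and is worth checking against the path the pings actually take.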