Forum Discussion

Gea-Suan_Lin_34
Jan 05, 2010

Packet loss under load

Hello,

 

 

We have two F5 BIG-IP LTM 6400 units running 9.4.7 HF2 in Active/Standby mode. I set up SNMP traps and copy those logs to IRC so that our management team can easily see what is happening.

 

 

We have seen lots of monitor DOWN and UP messages for about a year. Because the monitors come back UP quickly, and we usually have 3+ servers in each pool, this has not been an issue.

 

 

Recently, as the site has grown, the DOWN/UP messages have become quite annoying, so I want to find the problem and fix it. I've tried several ways to diagnose it.

 

 

I ran tcpdump on both the web server and the F5 itself, and found packet loss, which is what causes the monitor to go DOWN:

 

 

https://gist.github.com/6f573f746c2eed533e65

 

 

As you can see, after the F5 sends the first SYN packet, webapi-1's first reply (SYN+ACK) is never received by the F5. Both sides then try to resend their packets, which causes the issue.

 

 

I also tried pinging the active unit (with an interval of 0.01 sec) and got:

 

 

10000 packets transmitted, 9995 packets received, 0.1% packet loss

 

round-trip min/avg/max/stddev = 0.082/0.235/8.674/0.182 ms

 

 

At the same time, I also pinged the standby unit, which showed no packet loss:

 

 

10000 packets transmitted, 10000 packets received, 0.0% packet loss

 

round-trip min/avg/max/stddev = 0.079/0.155/0.666/0.040 ms
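
For anyone comparing runs like these, here is a minimal Python sketch that recomputes the loss rate from the ping summary lines. The helper name `loss_percent` and the `active`/`standby` variables are just illustrative; the only input is the summary text quoted above. It assumes the BSD-style "packets transmitted, ... packets received" wording shown in this post.

```python
import re

# Matches the BSD-style ping summary line quoted in this post.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def loss_percent(summary_line):
    """Return the packet loss in percent computed from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Summary lines copied from the two ping runs above.
active = "10000 packets transmitted, 9995 packets received, 0.1% packet loss"
standby = "10000 packets transmitted, 10000 packets received, 0.0% packet loss"

if __name__ == "__main__":
    print("active unit loss:  %.2f%%" % loss_percent(active))
    print("standby unit loss: %.2f%%" % loss_percent(standby))
```

The active unit lost 5 of 10000 packets, so the exact figure is 0.05%; ping only prints one decimal place. The standby unit lost none in the same window.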

 

 

Any possible cause? I've seen some discussion at http://devcentral.f5.com/Default.aspx?tabid=53&view=topic&postid=34302 but that does not seem to be the same issue.

4 Replies

  • Hi Gea-Suan,

     

    I have had this problem on and off. I haven't been able to resolve every case, but about 95% of them were related to how the web server responds to health probes. In most cases I dealt with, adjusting the probe interval from, say, 5 seconds to 10 seconds with a larger timeout either resolved it or at least cut down the false alarms. In other cases the application needed to be tuned. Ultimately your best bet is to work with F5 support so they can look into your configuration and determine the best course of action.

     

     

     

    Bhattman
  • I upgraded to 9.4.8 HF2 yesterday, and the problem still exists.

     

     

    If it only affected the monitor, I could accept raising the timeout from 5 secs (the current setting) to 10 secs. But I've seen 2.7% packet loss at peak time, which slows the traffic between the F5 and the backend servers.

     

     

    Anyway, I'll contact the support team to investigate the issue and see what is happening. Thanks for the reply.
  • So the SYN-ACK sent from the web server to the LTM was never received, which definitely indicates packet loss. This sounds very much like a duplex mismatch somewhere between the two endpoints. I would start by comparing the speed and duplex settings everywhere along the device chain. In particular, if you have 100Mbit links somewhere, ensure one side isn't set to 100/Full while the other is set to Auto/Auto. In that case the auto-negotiating side falls back to half duplex, and the resulting mismatch causes collisions and packet loss. I've seen it many times.
  • It's definitely possible. Also check the software version of the switch you are using. We had a couple of switches that incorrectly went half-duplex even when all the wiring and settings were set to Auto, and it turned out to be a bug.

     

     

     

    Bhattman
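
As a starting point for the speed/duplex comparison suggested in the replies, here is a minimal Python sketch for the web server end of the link. It assumes the backend is a Linux host, where the kernel exposes the negotiated values under `/sys/class/net/<interface>/`. The function names and the `eth0` interface name are illustrative only. The LTM and switch ports have to be checked with their own tools, which this sketch does not cover.

```python
from pathlib import Path


def read_link_settings(interface, sysfs_root="/sys/class/net"):
    """Read the negotiated speed and duplex of a Linux network interface."""
    base = Path(sysfs_root) / interface

    def read(name):
        # Reading can fail if the interface is missing or the link is down.
        try:
            return (base / name).read_text().strip()
        except OSError:
            return "unknown"

    return {"speed": read("speed"), "duplex": read("duplex")}


def looks_suspicious(settings):
    """Flag anything other than full duplex as worth a closer look."""
    return settings["duplex"] != "full"


if __name__ == "__main__":
    # "eth0" is an assumed interface name; substitute the real one.
    settings = read_link_settings("eth0")
    print(settings, "suspicious" if looks_suspicious(settings) else "ok")
```

A half-duplex or unknown result on a server that should be running full duplex would be a reason to look at the matching switch port next.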