Forum Discussion

MW1
Jul 01, 2009

HTTP monitor fails under heavy load but client connections still work

Hi all,

 

just wondering if anyone has any pointers on where to troubleshoot an issue I've run into on an LTM. I've got a basic HTTP monitor that hits a specific URL and matches a string. The setup is pretty standard: a load-balanced IP listening on HTTPS, which terminates the SSL and talks HTTP to a pool of 3 Tomcat web servers behind it.

 

All works fine normally; however, we've been doing some load testing and the HTTP monitor ends up failing under heavy load, to the point where there are no members left in the pool. If I replace the HTTP monitor with a simple ICMP one, all works fine and the load testing runs well (i.e. the client PCs are able to keep hitting the web servers via the load-balanced IP without issue). I have another F5 LTM which I configured to monitor the web servers while the load test was running, and it never had any issues with its HTTP monitor.

 

I don't see any errors on the F5's interfaces or other indications of a problem. I have tried using curl from the CLI during the load tests, and it occasionally appeared to hang while connecting.

 

thanks in advance for any advice offered

 

Matt

5 Replies

  • Hi Matt,

     

     

    This might get laborious to troubleshoot if the issue only happens under load. Normally, I'd suggest enabling debug on bigd, the monitoring daemon, ('b db bigd.debug enable', output to /var/log/bigdlog) and possibly capturing tcpdumps on the LTM and the pool members. However, enabling bigd debug would potentially affect the testing. It would probably be fastest for you to open a case with F5 Support to get help in troubleshooting the issue.

     

     

    Aaron
  • Sorry for the delay in getting back to you; I appear to have just (40 mins ago) resolved the issue and am now trying to figure out why!

     

    I was working with F5 support who advised:

     

     

     

    Another thing that would be helpful to see is if the LTM is actually getting the monitor traffic back to itself. What would be helpful is to capture a tcpdump showing when the monitors are failing during your load test.

     

    tcpdump -s0 -ni internal host and host -w /var/tmp/C5XXXX-monitorcapture.dmp

     

    Please use the selfIP of the LTM and not the floating IP since monitors are sent from the SelfIP of the LTM and not the floater.

     

     

    As this is a test device it was set up as a single device, not a pair, so to be able to capture just the monitor traffic per the above I set it up as if it were in a pair and added a floating IP on the inside. Since then I've run two load tests and the F5 has not seen the monitor fail even once, even when I dropped it back from a 15 sec interval / 46 sec timeout (which had failed) to the standard 5 sec interval / 16 sec timeout.

     

     

    The only thing in my head that could cause this would be exhausting the IP stack when on a single IP, however I don't believe this to be the case.

     

     

    If anyone has any ideas I'm all ears....just glad it appears to be fixed!
  • spark_86682
    Historic F5 Account

    The only thing in my head that could cause this would be exhausting the IP stack when on a single IP, however I don't believe this to be the case.

     

     

     

    This is what popped into my head too. Why don't you think it's the case? Port exhaustion would cause exactly those symptoms if the same IP is used for monitoring as it is for client connections. If you're not using SNAT at all, though, then it's still a mystery...
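    For what it's worth, the port-exhaustion theory is easy to put rough numbers on. A back-of-the-envelope sketch; the ephemeral-port range and TIME_WAIT duration below are assumed common defaults, not values from this box:

```shell
#!/bin/sh
# Back-of-the-envelope: how many connections per second a single source
# IP can sustain to one destination IP:port before ephemeral ports run
# out. Both figures are assumed defaults and vary by platform/tuning.
EPHEMERAL_PORTS=28232   # e.g. a 32768-60999 ephemeral range
TIME_WAIT_SECS=60       # how long a closed connection ties up its port
echo $((EPHEMERAL_PORTS / TIME_WAIT_SECS))   # roughly 470 conn/s
```

    If the load test plus the monitor probes together push one shared source IP past that rate, new connections (including monitor probes) stall exactly as described, and a second source IP (such as the floating self IP) roughly doubles the budget, which would fit the fix above.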
  • It could well be port exhaustion, and after sleeping on it, it does fit; I'm just surprised, as I didn't think the load testing was producing enough hits to have done this. I presume the only way to prove this would be running netstat -an from the CLI during the test (unless anyone knows a better way to monitor this?)

     

     

    thanks
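    A netstat-based check along those lines could be scripted; a minimal sketch (10.10.10.1 is a made-up example self IP, not an address from this thread):

```shell
#!/bin/sh
# Summarize TCP connection states for a given IP from `netstat -an`
# output. A TIME_WAIT count climbing toward the size of the ephemeral
# port range suggests port exhaustion.
count_states() {
  grep "$1" | awk '{print $6}' | sort | uniq -c | sort -rn
}

# During a load test you would pipe live output:
#   netstat -an | count_states 10.10.10.1
# Demo with canned netstat-style lines:
count_states 10.10.10.1 <<'EOF'
tcp 0 0 10.10.10.1:43120 10.20.0.5:80 TIME_WAIT
tcp 0 0 10.10.10.1:43121 10.20.0.5:80 TIME_WAIT
tcp 0 0 10.10.10.1:43122 10.20.0.6:80 ESTABLISHED
tcp 0 0 192.168.1.9:5555 10.30.0.1:22 ESTABLISHED
EOF
```

    Run in a loop (e.g. under `watch`) while the load test is going, this would show whether TIME_WAIT entries on the monitoring self IP pile up just before the monitor starts failing.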