Forum Discussion

bdo_isd_28658
Nimbostratus
Jul 27, 2009

LTM log query

Hi,

Can you tell me what it means when I see this in the LTM log?

 

 

Jul 27 17:11:20 bip01 bigd[1917]: 01060001:5: Service detected DOWN for ::ffff:10.1.20.180:80 monitor test_mon_http1.1.
Jul 27 17:11:20 bip01 mcpd[2007]: 01070638:5: Pool member 10.1.20.180:80 monitor status down.
Jul 27 17:11:25 bip01 bigd[1917]: 01060001:5: Service detected UP for ::ffff:10.1.20.180:80 monitor test_mon_http1.1.
Jul 27 17:11:26 bip01 mcpd[2007]: 01070727:5: Pool member 10.1.20.180:80 monitor status up.

 

 

 

Does that mean the monitor failed three consecutive checks (16 seconds) and the pool member was then marked down? Or does it mean the health check failed once and then succeeded on the following check?

 

 

Thanks,

4 Replies

  • The DOWN message indicates that the pool member (or node) was marked down. That log snippet shows the pool member being marked down after the monitor timeout expired without a successful response. The next monitor request was answered successfully five seconds later, so the pool member was marked up again. To see why the checks failed, you can enable debug logging on the monitoring daemon, bigd, by running 'b db bigd.debug enable'. Output is logged to /var/log/bigdlog. The logging is fairly verbose, so make sure to disable it when you're done.

     

     

    Another option would be to run tcpdump, filtering for the LTM static self IP address and the pool member IP and port:

    tcpdump -i 0.0 -s0 -w /var/tmp/monitor.dmp host STATIC_SELF_IP and host POOL_MEMBER_IP and port POOL_MEMBER_PORT

     

     

    Aaron
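To put a number on the "five seconds later" observation, here is a minimal Python sketch that pairs the bigd DOWN and UP messages from the log lines quoted in the question and reports how long the member stayed down. The function names and the regular expression are illustrative assumptions, not part of any F5 tooling, and the year 2009 is taken from the post date because syslog timestamps omit it.

```python
import re
from datetime import datetime

# The four log lines quoted in the original question.
LOG = """\
Jul 27 17:11:20 bip01 bigd[1917]: 01060001:5: Service detected DOWN for ::ffff:10.1.20.180:80 monitor test_mon_http1.1.
Jul 27 17:11:20 bip01 mcpd[2007]: 01070638:5: Pool member 10.1.20.180:80 monitor status down.
Jul 27 17:11:25 bip01 bigd[1917]: 01060001:5: Service detected UP for ::ffff:10.1.20.180:80 monitor test_mon_http1.1.
Jul 27 17:11:26 bip01 mcpd[2007]: 01070727:5: Pool member 10.1.20.180:80 monitor status up.
"""

# Hypothetical pattern: matches only the bigd "Service detected UP/DOWN" lines.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ bigd\[\d+\]: \S+ "
    r"Service detected (?P<state>UP|DOWN) for (?P<member>\S+) monitor"
)


def bigd_transitions(log_text, year=2009):
    """Return (timestamp, state, member) tuples for bigd state changes."""
    events = []
    for line in log_text.splitlines():
        m = PATTERN.match(line)
        if m:
            # Syslog lines carry no year, so one is supplied explicitly.
            when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((when, m["state"], m["member"]))
    return events


def outage_seconds(events):
    """Pair each DOWN with the next UP for the same member."""
    outages = []
    down_since = {}
    for when, state, member in events:
        if state == "DOWN":
            down_since[member] = when
        elif member in down_since:
            elapsed = (when - down_since.pop(member)).total_seconds()
            outages.append((member, elapsed))
    return outages


OUTAGES = outage_seconds(bigd_transitions(LOG))
print(OUTAGES)
```

Run against a full /var/log/ltm file instead of the four-line sample, the same pairing would show whether the short outages cluster around busy periods.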
  • Hi Aaron,

     

     

    Thanks for the reply.

     

     

    I've run the dumps and the debugs, and they pretty much prove what I've been telling people here: the server does not reply to the monitor for four checks and is then marked down.

     

     

    We have absolutely no idea why though.

     

     

    It is a Domino web server, and we have created a monitoring page, monitoring.html, that loads in about 0.4 ms. But randomly throughout the day the monitor will fail and mark the node as down.

     

     

    Our health monitor is

     

     

     

    GET /monitoring.html HTTP/1.1\r\nHOST: myserver/monitoring.html \r\nCONNECTION: close\r\n

     

     

     

    Monitoring

     

     

    This works most of the time, but when the server gets busy it seems to drop the monitoring checks from the BIG-IP.

     

     

    I'm going to have to push out the timeout timer to 26 seconds to see if I can get some sort of health check working.
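The relationship between the 16-second and 26-second timeouts can be sketched with a little arithmetic. This assumes a 5-second monitor interval, which the thread never states (it is inferred from the UP message arriving five seconds after DOWN), and it assumes the commonly used sizing rule of timeout = 3 x interval + 1. The function names are hypothetical.

```python
def timeout_for(interval_s, missed_intervals=3):
    """Timeout that allows `missed_intervals` full intervals plus one second."""
    return missed_intervals * interval_s + 1


def full_intervals_covered(timeout_s, interval_s):
    """How many full monitor intervals fit inside a given timeout."""
    return (timeout_s - 1) // interval_s


INTERVAL = 5  # assumption: not stated anywhere in the thread

print(timeout_for(INTERVAL))                 # timeout for three intervals
print(full_intervals_covered(16, INTERVAL))  # intervals covered by the current timeout
print(full_intervals_covered(26, INTERVAL))  # intervals covered by the proposed timeout
```

Under those assumptions, moving from 16 to 26 seconds gives the server five intervals instead of three to answer before it is marked down. That hides short stalls but also delays detection of a genuine failure by ten seconds.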

     

     

  • I'm not sure whether it matters, but header names are typically written in title case: Connection and Host. HTTP header names are case-insensitive, so a compliant server should parse them regardless of case, but some servers may not. The Host header value should contain only the host, not the URI. So the send string should look like this:

     

     

    GET /monitoring.html HTTP/1.1\r\nHost: myserver\r\nConnection: close\r\n

     

     

    You might try running tcpdump on the LTM and server to compare the results when the monitor requests fail. Filtering by source IP and destination IP and port will help limit the tcpdump file sizes.

     

     

    Aaron
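As a way to avoid the Host header mistake when writing monitor send strings by hand, here is a minimal Python sketch that assembles the string and rejects a host value containing a path or whitespace. The helper name build_send_string is hypothetical. It emits literal \r\n escape sequences, matching how the send string is written in the monitor configuration in this thread, rather than real carriage returns.

```python
def build_send_string(path, host):
    """Build an HTTP/1.1 monitor send string with literal \\r\\n escapes."""
    # The Host header carries only the host name, never the request path.
    if "/" in host or any(ch.isspace() for ch in host):
        raise ValueError(f"Host value must be a bare host name, got {host!r}")
    if not path.startswith("/"):
        raise ValueError(f"Request path must start with '/', got {path!r}")
    # Doubled backslashes keep the escapes literal in the output.
    return f"GET {path} HTTP/1.1\\r\\nHost: {host}\\r\\nConnection: close\\r\\n"


SEND_STRING = build_send_string("/monitoring.html", "myserver")
print(SEND_STRING)
```

Passing the original value, "myserver/monitoring.html ", raises ValueError, which is the mistake the corrected send string removes.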
  • Hi Aaron,

     

     

    I copied the monitor text from a Domino advice page, but I should have checked it. I've changed it as you suggest, and I'll leave the timeout at 16 for now.

     

     

    Let's see what happens tomorrow.