Forum Discussion

abhy201
Nimbostratus
Sep 25, 2018

Pool member down due to no Ping response

Hello,

 

Below is the monitor log for a pool member that shows as down. The pool member is another virtual server, which is active and responds to ping when I run it directly from the command line.

 

The log suggests that the ping is failing. I am trying to understand why the monitor log shows the ping failing when a direct ping from the command line succeeds.

 

[0][23691] 2018-09-24 17:20:49.032242: ID 2278 :(_do_ping): time to ping, now=[1537827649.032066], status=DOWN [ addr=::ffff:ip:port mon=/Common/abcd-https-monitor fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827649.028548][2018-09-24 17:20:49] last_ping=[1537827644.062996][2018-09-24 17:20:44] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=6 rcv_cnt=0 ]

 

4 Replies

  • rluhrman_127985
    Historic F5 Account

    The log is misleading. "Ping" here really means "request": even for an HTTP monitor, the log doesn't say "Request Sent"; it says (_do_ping), and so on.
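To read a monitor log line like the one in the question, it can help to pull the key=value fields out of it. The Python below is only a minimal sketch: the function name `parse_monitor_fields` is hypothetical, the sample line is abbreviated from the posted log, and reading `snd_cnt` and `rcv_cnt` as "requests sent" and "responses received" is an assumption based on the field names rather than on F5 documentation.

```python
import re

# Abbreviated from the log line posted in the question (address left anonymized).
LOG_LINE = (
    "[0][23691] 2018-09-24 17:20:49.032242: ID 2278 :(_do_ping): time to ping, "
    "now=[1537827649.032066], status=DOWN [ addr=::ffff:ip:port "
    "mon=/Common/abcd-https-monitor fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 "
    "timeout=16 time_until_up=0 immed=0 snd_cnt=6 rcv_cnt=0 ]"
)


def parse_monitor_fields(line: str) -> dict:
    """Return every key=value pair in the line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


fields = parse_monitor_fields(LOG_LINE)
sent = int(fields["snd_cnt"])
received = int(fields["rcv_cnt"])
print(f"monitor={fields['mon']} status={fields['status']} sent={sent} received={received}")

# Assumption: a non-zero send count with a zero receive count means requests
# are leaving but no reply is being counted by the monitor.
if sent > 0 and received == 0:
    print("Requests are going out, but no response has been counted.")
```

If that reading of the counters is right, the line describes an HTTPS request that gets no usable answer, which is a different test from an ICMP ping.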

     

    Can you run "tmsh list ltm monitor (monitor name)" and post the output for the monitor that is failing?

     

    Also try using either telnet or socat to connect to the IP:PORT combination instead of using ICMP. ICMP only tests as far as layer 3, while your monitor should test all the way to layer 7.
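The difference between a layer 4 and a layer 7 check can be sketched in Python. This is an illustration only, not how bigd works internally: the function names are hypothetical, the address in the usage comment is a documentation placeholder, certificate validation is switched off as an assumption, and no client certificate is presented (if the server requires one, `SSLContext.load_cert_chain` would be needed).

```python
import socket
import ssl


def tcp_connects(host: str, port: int, timeout: float = 5.0) -> bool:
    """Layer 4 check: True if the TCP handshake with host:port completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def https_status_line(host: str, port: int, timeout: float = 5.0) -> str:
    """Layer 7 check: TLS handshake, then HEAD /, returning the first reply line."""
    context = ssl.create_default_context()
    # Assumption: skip certificate validation, as a simple health check might.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            data = tls.recv(4096)
    return data.split(b"\r\n", 1)[0].decode("latin-1")


# Usage (placeholder address, replace with the pool member):
# print(tcp_connects("192.0.2.10", 443))
# print(https_status_line("192.0.2.10", 443))
```

A member can pass the first check (which is roughly what a successful telnet shows) and still fail the second if the TLS handshake or the HTTP exchange does not complete.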

     

    • abhy201
      Nimbostratus

      Thank you for checking. Below is the output for the list ltm monitor.

       

      ltm monitor https monitor_name-https-monitor {
          adaptive disabled
          cert /Common/client_cert-co.crt
          cipherlist DEFAULT:+SHA:+3DES:+kEDH
          compatibility enabled
          defaults-from https
          destination :
          interval 5
          ip-dscp 0
          key /Common/client_cert-co.key
          recv "HTTP/1.(0|1) (200|301|302|404)"
          recv-disable none
          send "HEAD / HTTP/1.0\r\n\r\n"
          time-until-up 0
          timeout 16
      }

       

      Also adding a detailed stack of the monitor log.

       

      [1][23692] 2018-09-24 17:20:47.467516: ID 2278 :(inst_to_service) Logging enabled. [ addr=::ffff:IP_abcd:Port srcaddr=none ]
      [0][23691] 2018-09-24 17:20:47.467517: ID 2278 :(inst_to_service) Logging enabled. [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.032073: ID 2278 :(_ssl_shutdown_service): shutting down, return ssl true [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=14 ]
      [0][23691] 2018-09-24 17:20:49.032242: ID 2278 :(_do_ping): time to ping, now=[1537827649.032066][2018-09-24 17:20:49], status=DOWN [ addr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827649.028548][2018-09-24 17:20:49] last_ping=[1537827644.062996][2018-09-24 17:20:44] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=6 rcv_cnt=0 ]
      [0][23691] 2018-09-24 17:20:49.032267: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=none ]
      [0][23691] 2018-09-24 17:20:49.032273: ID 2278 :(_connect_to_service): creating new socket (rd0) [ addr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.032298: ID 2278 :(_connect_to_service): connect: Operation now in progress [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.032326: ID 2278 :(_do_ping): post ping, status=DOWN [ addr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=15 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827654.028548][2018-09-24 17:20:54] last_ping=[1537827649.032066][2018-09-24 17:20:49] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=7 rcv_cnt=0 ]
      [0][23691] 2018-09-24 17:20:49.036759: ID 2278 :(_main_loop): Activity on pending service, now=[1537827649.036433][2018-09-24 17:20:49] [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port fd=15 pend=1 conn=1 ]
      [0][23691] 2018-09-24 17:20:49.036772: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.036787: ID 2278 :(_send_active_service_ping): writing [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ] send=HEAD / HTTP/1.0\x0d\x0a\x0d\x0a
      [0][23691] 2018-09-24 17:20:49.036796: ID 2278 :(do_ssl_write): incoming state: 0 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.036817: ID 2278 :(do_ssl_write) state: INIT [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.036825: ID 2278 :(initialize_ssl) legacy: false, cipher: 'DEFAULT:+SHA:+3DES:+kEDH', compat: true [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.037040: ID 2278 :(do_ssl_write) state: CONNECTING, legacy mode: false [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.037076: ID 2278 :(do_ssl_write): state: 4 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.040714: ID 2278 :(_main_loop): Service ready for read, now=[1537827649.040704][2018-09-24 17:20:49] [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port fd=15 pend=0 conn=0 ]
      [0][23691] 2018-09-24 17:20:49.040727: ID 2278 :(_recv_active_service_ping): reading [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.040734: ID 2278 :(do_ssl_read) legacy mode: false [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.040741: ID 2278 :(do_ssl_read): state: 4 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.040748: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
      [0][23691] 2018-09-24 17:20:49.040756: ID 2278 :(_send_active_service_ping): writing [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ] send=HEAD / HTTP/1.0\x0d\x0a\x0d\x0a

       

    • abhy201
      Nimbostratus

      Also, telnet to the pool member succeeds.

       

    • rluhrman_127985
      Historic F5 Account

      A tcpdump of the interactions between bigd (the daemon that does the health check communication) and the node would help as the communication can technically fail at multiple levels of the OSI model.

       

      I noticed that your receive string accepts a 404 as a valid response. A 404 usually means the requested resource is not available, so the monitor would mark the member up even when the resource was not found.
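To see which status lines that receive string lets through, here is a quick Python check. It is a sketch under an assumption: Python's `re.search` is used as a stand-in for however the monitor evaluates the receive string, and the helper name `monitor_would_match` is made up for illustration.

```python
import re

# Receive string copied from the monitor configuration posted above.
RECV = r"HTTP/1.(0|1) (200|301|302|404)"


def monitor_would_match(response_head: str) -> bool:
    """True if the receive string is found anywhere in the response text."""
    return re.search(RECV, response_head) is not None


for status_line in (
    "HTTP/1.1 200 OK",
    "HTTP/1.1 404 Not Found",
    "HTTP/1.1 503 Service Unavailable",
):
    print(status_line, "->", monitor_would_match(status_line))
```

With this pattern a 404 reply counts as a match, while a 503 does not. Dropping 404 from the alternation would make the check stricter.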

       

      For the tcpdumps, looking at both the traffic between bigd and TMM and the traffic between TMM and the node may help determine what is happening.

       

      For the traffic between tmm and bigd, use "tcpdump -ni :nnnh -s0 -w /var/tmp/$(hostname)_<vlan_name>.pcap host <ip address of node> and port <port>".

       

      For the traffic between tmm and the node, use "tcpdump -ni 0.0:nnn -s0 -w /var/tmp/$(hostname)_tmm.pcap host <ip address of node> and port <port>".

       

      Run those concurrently.
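One way to start both captures together and stop them after a fixed time is a small Python wrapper around tcpdump. This is a rough sketch with assumptions: the helper names are hypothetical, the interface names, node address, port, and output paths in the usage comment are placeholders to replace with your own, and only the tcpdump options already shown above (-n, -i, -s0, -w) are used.

```python
import subprocess
import time


def capture_cmd(interface: str, out_path: str, node_ip: str, port: int) -> list:
    """Build a tcpdump argument list filtered on the node's address and port."""
    return ["tcpdump", "-ni", interface, "-s0", "-w", out_path,
            "host", node_ip, "and", "port", str(port)]


def run_concurrently(commands, seconds: float) -> None:
    """Start every capture, let them run for `seconds`, then stop them all."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        time.sleep(seconds)
    finally:
        for proc in procs:
            proc.terminate()  # ask tcpdump to stop and flush its capture file
        for proc in procs:
            proc.wait()


# Usage (placeholders throughout):
# run_concurrently(
#     [
#         capture_cmd("<vlan_name>:nnn", "/var/tmp/vlan.pcap", "192.0.2.10", 443),
#         capture_cmd("0.0:nnn", "/var/tmp/tmm.pcap", "192.0.2.10", 443),
#     ],
#     seconds=60,
# )
```

Let the captures run for at least a few monitor intervals (5 seconds each in this configuration) so that several health check attempts end up in both files.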

       

      You should open a support ticket so that a Network Support Engineer can analyze the tcpdumps. Be sure to generate a qkview after running the tcpdumps and provide that as well.