Forum Discussion

Roman_80473
Nov 17, 2011

Detect if node is half-dead with an iRule?

Hi folks,

I was tasked with monitoring the app servers in a pool with an iRule (LTM 10.2). I wrote a simple rule that does the following:

If the request hits LB_FAILED, I take the node out and resend the request.

If the request reaches HTTP_RESPONSE and the HTTP status is >= 500, I take the node out and resend the request.
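
For reference, a rule of the kind described above might look roughly like the sketch below. It is a reconstruction for illustration, not the rule that was actually deployed: the `retries` counter, its cap of 2, and the variable name `request` are assumptions. Note that `HTTP::request` returns only the request headers, so a retried POST would lose its body.

```tcl
when CLIENT_ACCEPTED {
    # Per-connection retry counter (hypothetical) so a bad pool cannot cause an endless retry loop.
    set retries 0
}

when HTTP_REQUEST {
    # Save the raw request headers here; they are replayed later by HTTP::retry.
    set request [HTTP::request]
}

when LB_FAILED {
    # No connection could be made to the selected member: mark it down and pick another.
    if { $retries < 2 } {
        incr retries
        LB::down
        LB::reselect
    }
}

when HTTP_RESPONSE {
    if { [HTTP::status] >= 500 && $retries < 2 } {
        # The member answered with a server error: mark it down and replay the saved request.
        incr retries
        LB::down
        HTTP::retry $request
    } else {
        # Either a usable response or retries are exhausted: reset for the next request.
        set retries 0
    }
}
```

Both branches depend on an event actually firing. A server that accepts the TCP connection but never answers triggers neither LB_FAILED nor HTTP_RESPONSE.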

 

 

It only seems to work when the nodes are either fine or completely dead. Otherwise (for example, when a server has run out of memory), the request gets as far as LB_SELECTED and sits there forever. After a minute or two the browser shows "The connection to the server was reset while the page was loading", but my iRule never kicks in.

Is there a way to detect that a node is "half-dead" with an iRule? Or is there some external configuration on the VIP, pool, etc. that can do it?

Any help is greatly appreciated.

Thanks, Roman

14 Replies

  • 1. Using LB::detach + LB::reselect

    [root@ve1023:Active] config # b rule myrule list
    rule myrule {
       when RULE_INIT {
          set static::response_timeout  5
       }
    
       when HTTP_REQUEST {
          log local0. "Received request, beginning response monitor interval. [clock seconds]"
          set monitor_id [\
             after $static::response_timeout {
                LB::detach
                LB::reselect pool foo2
                log local0. "Timeout $static::response_timeout milliseconds elapsed without server response. [clock seconds]"
             }\
          ]
       }
    
       when HTTP_RESPONSE {
          log local0. "Received server response."
          if {[info exists monitor_id]} {
             log local0. "Canceling after script with id $monitor_id"
             after cancel $monitor_id
          }
       }
    }
    
     curl -i http://172.28.19.79
    ...no response...
    
    [root@ve1023:Active] config # tail -f /var/log/ltm
    Nov 18 09:52:49 local/tmm notice tmm[24220]: 013e0001:5: Tcpdump starting bcast on :::0 from 127.1.1.1:42905
    Nov 18 09:52:59 local/tmm info tmm[24220]: Rule myrule : Received request, beginning response monitor interval. 1321638779
    Nov 18 09:52:59 local/tmm info tmm[24220]: Rule myrule : Timeout 5 milliseconds elapsed without server response. 1321638779
    
    [root@ve1023:Active] config # tcpdump -nni 0.0 port 80 or port 88
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on 0.0, link-type EN10MB (Ethernet), capture size 108 bytes
    09:52:59.853876 IP 172.28.19.253.49633 > 172.28.19.79.80: S 1454416003:1454416003(0) win 5840 
    09:52:59.853911 IP 172.28.19.79.80 > 172.28.19.253.49633: S 1517764163:1517764163(0) ack 1454416004 win 4380 
    09:52:59.855952 IP 172.28.19.253.49633 > 172.28.19.79.80: . ack 1 win 46 
    09:52:59.855977 IP 172.28.19.253.49633 > 172.28.19.79.80: P 1:155(154) ack 1 win 46 
    09:52:59.856051 IP 200.200.200.10.49633 > 200.200.200.101.88: S 3113407707:3113407707(0) win 4380 
    09:52:59.857029 IP 200.200.200.101.88 > 200.200.200.10.49633: S 1833697333:1833697333(0) ack 3113407708 win 5792 
    09:52:59.857038 IP 200.200.200.10.49633 > 200.200.200.101.88: . ack 1 win 4380 
    09:52:59.857047 IP 200.200.200.10.49633 > 200.200.200.101.88: P 1:155(154) ack 1 win 4380 
    09:52:59.857983 IP 200.200.200.101.88 > 200.200.200.10.49633: . ack 155 win 54 
    09:52:59.860978 IP 200.200.200.10.49633 > 200.200.200.101.88: F 155:155(0) ack 1 win 4380 
    09:52:59.861895 IP 200.200.200.101.88 > 200.200.200.10.49633: F 1:1(0) ack 156 win 54 
    09:52:59.861909 IP 200.200.200.10.49633 > 200.200.200.101.88: . ack 2 win 4380 
    09:52:59.955864 IP 172.28.19.79.80 > 172.28.19.253.49633: . ack 155 win 4534 
    
    
  • 2. Using HTTP::retry

    [root@ve1023:Active] config # b rule myrule list
    rule myrule {
       when RULE_INIT {
          set static::response_timeout  5
       }
    
       when HTTP_REQUEST {
          log local0. "Received request, beginning response monitor interval. [clock seconds]"
          set monitor_id [\
             after $static::response_timeout {
                HTTP::retry [HTTP::request]
                log local0. "Timeout $static::response_timeout milliseconds elapsed without server response. [clock seconds]"
             }\
          ]
       }
    
       when HTTP_RESPONSE {
          log local0. "Received server response."
          if {[info exists monitor_id]} {
             log local0. "Canceling after script with id $monitor_id"
             after cancel $monitor_id
          }
       }
    }
    
     curl -i http://172.28.19.79
    curl: (52) Empty reply from server
    
    [root@ve1023:Active] config # tail -f /var/log/ltm
    Nov 18 09:58:15 local/tmm info tmm[24220]: Rule myrule : Received request, beginning response monitor interval. 1321639095
    Nov 18 09:58:15 local/tmm err tmm[24220]: 01220001:3: TCL error: myrule  - Illegal argument. Can't execute in the current context. (line 1)     invoked from within "HTTP::request"
    
    [root@ve1023:Active] config # tcpdump -nni 0.0 port 80 or port 88
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on 0.0, link-type EN10MB (Ethernet), capture size 108 bytes
    09:58:15.745873 IP 172.28.19.253.57665 > 172.28.19.79.80: S 2312572844:2312572844(0) win 5840 
    09:58:15.745910 IP 172.28.19.79.80 > 172.28.19.253.57665: S 1192813388:1192813388(0) ack 2312572845 win 4380 
    09:58:15.748842 IP 172.28.19.253.57665 > 172.28.19.79.80: . ack 1 win 46 
    09:58:15.748861 IP 172.28.19.253.57665 > 172.28.19.79.80: P 1:155(154) ack 1 win 46 
    09:58:15.748956 IP 200.200.200.10.57665 > 200.200.200.101.88: S 498213601:498213601(0) win 4380 
    09:58:15.749863 IP 200.200.200.101.88 > 200.200.200.10.57665: S 2493742844:2493742844(0) ack 498213602 win 5792 
    09:58:15.749871 IP 200.200.200.10.57665 > 200.200.200.101.88: . ack 1 win 4380 
    09:58:15.749885 IP 200.200.200.10.57665 > 200.200.200.101.88: P 1:155(154) ack 1 win 4380 
    09:58:15.751050 IP 200.200.200.101.88 > 200.200.200.10.57665: . ack 155 win 54 
    09:58:15.754080 IP 200.200.200.10.57665 > 200.200.200.101.88: R 155:155(0) ack 1 win 4380
    09:58:15.754088 IP 172.28.19.79.80 > 172.28.19.253.57665: R 1:1(0) ack 155 win 4534
    
    
  • Otherwise, I can try testing this later today. Thanks, Aaron!
  • Hi guys,

    I'm getting an error inside HTTP_REQUEST when using HTTP::redirect. I set a timer in HTTP_REQUEST and, when it expires, I redirect to another URL, but I get this error instead:
     - Illegal argument. Can't execute in the current context. (line 1)     invoked from within "HTTP::redirect $theUrl"
    

    Am I doing something wrong?

    Thanks, Roman