
Wil_Schultz_101
Aug 09, 2007

LB_FAILED behavior, expected or not?

I have the following iRule, which I use to send connections to a different page when my nodes are down.


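when LB_FAILED {
    # Fires when no pool member can be selected, or when the selected
    # member does not respond to the connection request.
    switch [LB::server pool] {
        default {
            # Record who was asking for what before redirecting.
            set remoteip [IP::remote_addr]
            set uri [HTTP::uri]
            set hostname [HTTP::host]
            log local0. "$remoteip is looking up Hostname $hostname and URI $uri"
            # Send the client to the maintenance page.
            HTTP::redirect "http://maint.my.com"
        }
    }
}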

I found something today that behaves differently than I expected. I have 3 servers in my pool, and when one of them fails for whatever reason, the rule above sends 1/3 of my traffic to the maintenance page until the BigIP marks the server as down. My health check runs at 5-second intervals and fails the member at 16 seconds, so all the traffic sent to the down server before it is marked down hits the redirect.

Is this expected behavior? It sounds to me like LB_FAILED should actually be LB_SERVER_FAILED.

3 Replies

  • Hi,

    That sounds right. LB_FAILED is documented as:

    "Triggered when the system fails to select a pool member or when a selected pool member fails to respond to a connection request."

    If you wanted, you could reselect a new node in the pool instead of redirecting, using LB::reselect. There are a few posts on this in the forums. I think the maximum number of retries is hardcoded to two, though. I haven't tested it, but I would assume you could achieve something similar by setting the pool's 'Action on Service Down' to reselect.

    If you wanted to mark the non-responding node down from the LB_FAILED event, you could do so using LB::status, but then in effect you're setting your monitor timeout to "one request".
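
    A rough, untested sketch of the reselect idea, for illustration only. It assumes the virtual server has an HTTP profile (as the original rule does), and the retries variable and its limit of 2 are hypothetical choices made here, not values taken from F5 documentation. The redirect target is the one from the original post.

    ```tcl
    when HTTP_REQUEST {
        # Hypothetical per-request counter of reselect attempts.
        set retries 0
    }

    when LB_FAILED {
        if { $retries < 2 } {
            # Try another member of the same pool before giving up.
            # The limit of 2 is an assumption; tune it to your pool size.
            incr retries
            LB::reselect
        } else {
            # Every attempt failed: fall back to the maintenance page.
            HTTP::redirect "http://maint.my.com"
        }
    }
    ```

    With something like this, a request that lands on a member which has failed but has not yet been marked down by the monitor would get another chance at a healthy member, and only requests that exhaust their retries would see the maintenance page.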

     

     

    Aaron

     

  • Yes, this is expected behaviour: the BigIP will keep sending traffic to the down server until it is marked down, which happens only after your health checks have failed for the configured number of retries.

    I was just wondering, though, why you have the switch in your code when all you test for is the default condition...
  • FYI, if you tune your TCP profile to limit the SYN retransmissions to 2, you can get an LB_FAILED event at around 9 seconds, which would occur before your monitor timeout of 16 seconds. Please see this thread for a more detailed discussion of this from deb:

    http://devcentral.f5.com/Default.aspx?tabid=53&forumid=5&tpage=1&view=topic&postid=1523815269