Forum Discussion

Krzysztof_Kozlo
May 02, 2007

TCP redirect on LB_FAILED for in-band health check.

We have several situations in the enterprise where it is desirable to have a large number of farmed services run on a single pool of servers. New instances come online all the time, and only TCP health checks are required, but we don't want to configure an explicit pool, complete with monitor, each time someone starts up a listening process on a port.

We want to use a Layer 3 virtual server like this:

virtual moo {
   destination 1.1.1.1:any
   ip protocol tcp
   pool moo
   rule moo
}

pool moo {
   member server1:any
}

pool foo {
   member server2:any
}

What I'd like to be able to do is create a rule like this:

rule moo {
   when LB_FAILED {
      log "connection to [IP::server_addr] failed"
      use pool foo
   }
}

This would enable an on-the-fly TCP health check, essentially: if the host is not responding on that port, try the other server. I don't see any reason this shouldn't be possible, but it doesn't work; the client is simply disconnected when LB_FAILED fires. LB_FAILED itself is firing, based on the LTM output:

May 2 16:20:05 tmm tmm[1049]: 01220002:6: Rule moo : connection failed: 144.203.239.34

Also, it is not the case that LB_FAILED is processed after the client flow is closed. This rule works:

rule moo {
   when LB_FAILED {
      log "connection failed: [IP::server_addr]"
      TCP::respond "sorry, dude, your server's down."
   }
}

Observe:

zuul /u/ineteng/Data/f5 239$ telnet 10.165.29.17 23
Trying 10.165.29.17...
Connected to 10.165.29.17.
Escape character is '^]'.
sorry, dude, your server's down.Connection closed by foreign host.
zuul /u/ineteng/Data/f5 240$

Anyone have any ideas? This sure would be useful!

10 Replies

  • If no one has any experience or tips to offer on getting this working, can I ask if anyone at least sees this functionality as useful? Folks I've talked to here are pretty excited about the possibilities.

    What we want to do in effect is set up a Layer 3 rule with no monitoring, but make sure that any connections on any port are directed to a server that's listening on that port. If nothing is listening, the connection would be dropped.

    Combined with, say, source IP persistence, this would allow us to load balance services that talk on arbitrary port ranges, or our present use case, in which we want to be able to start up servers arbitrarily on the pool members and have them load balanced (or at least highly available) without having to touch the LTM.

    If we can't do this today, it sounds like a ripe, low-hanging feature request for the dev team at the least! I don't know of any other vendor who can claim in-band TCP health checking...
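
    For illustration, here is a rough, untested sketch of the kind of config I have in mind (the virtual and pool names are placeholders, and I'm assuming the default source_addr persistence profile):

    virtual farm_any {
       destination 1.1.1.1:any
       ip protocol tcp
       persist source_addr
       pool farm
    }

    pool farm {
       member server1:any
       member server2:any
    }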
  • Actually, Cisco LocalDirector (yes, that dinosaur) did this passive monitoring. It removed members from the pool after X number of failed TCP handshake attempts, then would occasionally throw bones back at it in an attempt to bring it back "online".

    I was hoping that the passive monitoring hyped for 9.4 was in line with this, but it is not the same.
  • This is great! The documentation for 9.2.3 does not list "LB::reselect" as a method. (F5, send your doc writers back to the salt mines.) Initial results seem positive. I'll doc my full iRule when and if I get it working.
  • According to the iRules Wiki (which I just discovered, thank you very much):

    This command is used to advance to the next available node in a pool, either using the load balancing settings of that pool, or by specifying a member explicitly. **Note that the reselection is currently limited to two tries.** (emphasis added)

    If this is correct, it means that a loop is not possible, and the logic

    when LB_FAILED {
       if { [LB::server addr] == "" } {
          log "connection failed: no servers available"
       } else {
          log "connection failed: [LB::server addr]"
          LB::reselect
       }
    }

    is all we need. It also means that this technique is limited to pools with three or fewer members (two retries) unless that documentation is obsolete.
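
    Given the "specifying a member explicitly" wording above, it also looks like the reselect could be pointed at the backup pool from my original post; a rough, untested variation:

    when LB_FAILED {
       log "connection to [LB::server addr] failed, retrying against pool foo"
       LB::reselect pool foo
    }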
  • bl0ndie_127134
    Historic F5 Account
    Ok, I would like to kill the urban legend that passive monitoring is limited to HTTP right now. 'LB::status' can be used from most reasonable events, such as LB_FAILED, HTTP_RESPONSE, etc.
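
    For example, something along these lines (a minimal, untested sketch; the pool name "moo" is just a placeholder):

    when LB_FAILED {
       # Log LTM's current view of the member we just failed to reach.
       set state [LB::status pool moo member [LB::server addr] [LB::server port]]
       log "member [LB::server addr]:[LB::server port] status is $state"
    }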
  • I've found that LB::status is great if you want to know LTM's current understanding of a member's status. However, I don't think the status is instant. I remember having to handle a situation where LB::status would report "up" even though a node had just failed. If you handle the LB_FAILED event, you know instantly that a member has just failed; LB::status would report "up" until a health check or an iRule marked the member down.

    So basically, LB::status tells you LTM's current knowledge of a member, which can be delayed by the health check interval. I found that handling LB_FAILED is more "instant".
  • bl0ndie_127134
    Historic F5 Account
    You are right, the status value is determined by the monitors, so there is a bit of a lag depending on how you have the monitor set up.

    However, you can set the status (down the member) from the rule (because you got a SOAP exception, etc.), and the effect of this is immediate.

    This server will be marked down and will only be marked back up the next time the monitors have a successful health check (or if for some reason you want to mark it up in rules, which is actually possible but not recommended).
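
    As a minimal, untested sketch (assuming an HTTP virtual, with a 5xx standing in for the SOAP-fault check):

    when HTTP_RESPONSE {
       # Down the current pool member from the rule on an
       # application-level failure; the effect is immediate.
       if { [HTTP::status] >= 500 } {
          LB::down
       }
    }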
  • BTW it would be useful to have the number of retries for LB::reselect be configurable.
  • ...as well as removing from consideration any failed pool member previously selected during the current iteration of the reselection process.
  • The rule above seemed to work when I tested it back in May, but now that I am trying it again, it gets into an infinite loop of SYN/RESETs with the downed back-end every other time, with LB::reselect reselecting the same (broken) server.

    Has anyone seen this? What could cause it? I'm running v9.2.3 255.0...
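
    One idea I'm considering (an untested sketch, not a confirmed fix) is to explicitly mark the failed member down before reselecting, so the same member can't be handed straight back:

    when LB_FAILED {
       if { [LB::server addr] == "" } {
          log "connection failed: no servers available"
       } else {
          log "connection failed: [LB::server addr]"
          # Mark the member we just failed to reach as down, then reselect.
          LB::down
          LB::reselect
       }
    }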