Forum Discussion

Krzysztof_Kozlo
Nimbostratus
Aug 03, 2007

lb::reselect fails to select another node

I stripped out everything fancy and this still doesn't work. The behavior is peculiar:

rule reselect_test {
   when LB_FAILED {
      LB::reselect
   }
}

pool test {
   member 1.1.1.1:any
   member 1.1.1.2:any
}

virtual test {
   destination 1.1.2.1:any
   protocol tcp
   rule reselect_test
   pool test
   snat automap
}

When I connect, every other time the connection hangs while the LTM goes nuts trying to reconnect to the same back-end server.

Curiously, if I open another connection, it breaks the first connection out of this loop and connects to the second server.

I tested this on two 9.2.3 (build 255.0) systems and one 9.3.0 system. Same thing.

I thought LB::reselect was (a) supposed to select a _different_ node, and (b) supposed to be limited in the number of retries?

8 Replies

  • Try calling LB::detach before LB::reselect.

    LB::detach disconnects the server-side connection.
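
    A minimal sketch of that suggestion (untested; it just combines the two commands named above inside the original post's LB_FAILED event):

    when LB_FAILED {
      # drop the failed server-side connection first
      LB::detach
      # then re-run load balancing per the pool's LB method
      LB::reselect
    }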
  • Joseph_Chan_463
    Historic F5 Account
    BTW, is there a monitor checking the health of those two nodes?

    This topic also tries to do something similar:

    http://devcentral.f5.com/Default.aspx?tabid=53&forumid=5&postid=14059&view=topic

    You may wish to try LB::down, but a monitor is the proper way to do this. A monitor will watch the node and notice when it comes back up; a rule can only mark it down and forget about it.

    http://devcentral.f5.com/wiki/default.aspx/iRules/LB__down.html
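
    A rough sketch of that LB::down idea (untested; note that without a monitor, nothing will ever mark the member up again):

    when LB_FAILED {
      # mark the member that just failed as down so the next pick skips it
      LB::down
      # then re-run load balancing to choose another member
      LB::reselect
    }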
  • Deb_Allen_18
    Historic F5 Account
    LB::reselect chooses a node based on the LB algorithm for the pool, which may or may not be a "different" server. It reselects only once, but if the reselected server also fails to respond, you will loop on the LB_FAILED event endlessly unless you include some count/stop logic in your iRule.

    When you say "every other time the connection hangs", that would seem to indicate that one of your pool members is not responding. I don't see that you have any monitoring in place.

    I'm not sure why the other node isn't selected on failure, though, since you have the default LB method, Round Robin, configured.

    I'd start by applying a monitor to the pool. You should see better behaviour then. If you continue to have difficulty, post back & we can try to help further.

    /deb

  • I don't want monitoring on the pool. The whole idea is that this is supposed to be a layer-3 rule that will dynamically send users only to servers that are listening on a given port.

    One of the servers is not responding; that's correct, and it's by design. The connection hangs because the LTM loops infinitely, reselecting the same node it selected to begin with (i.e. the one that doesn't respond).

    What I want it to do is select the other node when the first one fails. That's what LB::reselect is supposed to do, but it doesn't.
  • Deb_Allen_18
    Historic F5 Account
    I'd say you need to open a Support case, then, especially if you've been struggling with this for several months without resolution.

    An iRules workaround might be to manually re-select the other server, then bail out if both are non-responsive. I've had other customers implement similar logic successfully for other reasons, but it obviously won't scale well above 2 servers:
    when CLIENT_ACCEPTED {
      set failed 0
    }
    when LB_FAILED {
      incr failed
      if { $failed > 1 } {
        # specify action if both servers failed
        reject
      } else {
        # default case would match if no pool or server selected
        switch [LB::server addr] {
          1.1.1.1 { node 1.1.1.2 0 }
          1.1.1.2 { node 1.1.1.1 0 }
          default { reject }
        }
      }
    }
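
    For pools larger than two members, a hedged variant (a sketch, untested) combines the same retry counter with the LB::down suggestion from earlier in the thread, so the failed member cannot be re-picked:

    when CLIENT_ACCEPTED {
      set failed 0
    }
    when LB_FAILED {
      incr failed
      # cap retries at the pool size (2 here) to avoid looping on LB_FAILED forever
      if { $failed > 2 } {
        reject
      } else {
        # mark the failed member down so the next pick skips it, then retry
        LB::down
        LB::reselect
      }
    }
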
    The advantage of monitoring is that a monitor looks for an expected response, rather than just a SYN/ACK, to determine if the server is healthy enough to receive traffic.

    HTH, and please let us know what you discover with Support.

    /deb

  • I'm well aware of the advantage of out-of-band monitoring. In this case, it doesn't scale: the servers in question are dynamically allocated on various ports, and we don't want to have to update the load-balancer configuration every time a server is brought up or down.

    Also, this scheme will catch servers that fail to respond within the interval window of out-of-band monitoring.

    I've had a support case open since last Friday, but only yesterday received a response, which was a suggestion to check this thread!