Forum Discussion

DB
Mar 02, 2010

What happens when LTM marks a node "down"

When my LTM (9.3.1) sees a node in a server pool go down because one of its two health monitors fails, I would expect the server to no longer have traffic hitting it. However, my server guys tell me that this occurrence (see the log entries below, which happen all the time) is nothing to worry about because during the "down" period of 5-16 seconds, the server is still getting requests from clients through the LTM. Will LTM continue to send traffic to a node marked "down"?

Mar 1 12:39:11 mvhatp01 bigd[1196]: 01060001:4: Service detected DOWN for ::ffff:1.1.1.1:80 monitor DB_Down_sit-www.foo.com.
Mar 1 12:39:11 mvhatp01 mcpd[1200]: 01070638:3: Pool member 1.1.1.1:80 monitor status down.
Mar 1 12:39:19 mvhatp01 bigd[1196]: 01060001:4: Service detected UP for ::ffff:1.1.1.1:80 monitor DB_Down_sit-www.foo.com.
Mar 1 12:39:19 mvhatp01 mcpd[1200]: 01070638:3: Pool member 1.1.1.1:80 monitor status up.
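
One way to check how long each "down" window really lasts is to pair the bigd DOWN and UP lines from the log. The following is a minimal Python sketch, assuming the log lines look like the entries above. The function name `down_windows` and the `year` parameter are hypothetical helpers, not anything LTM provides (the syslog timestamps carry no year, so one has to be supplied).

```python
import re
from datetime import datetime

# Matches only the bigd "Service detected UP/DOWN" lines shown in the post.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+bigd\[\d+\]:\s+\S+\s+"
    r"Service detected (?P<state>UP|DOWN) for (?P<member>\S+) "
    r"monitor (?P<monitor>\S+?)\.?\s*$"
)


def down_windows(lines, year=2010):
    """Return (member, monitor, seconds_down) for each DOWN -> UP pair.

    `year` is an assumption: syslog timestamps omit it.
    """
    down_since = {}  # (member, monitor) -> time the DOWN line was logged
    windows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # mcpd lines and unrelated entries are skipped
        when = datetime.strptime(
            f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
        )
        key = (match["member"], match["monitor"])
        if match["state"] == "DOWN":
            # Keep the earliest DOWN if several are logged in a row.
            down_since.setdefault(key, when)
        elif key in down_since:
            start = down_since.pop(key)
            windows.append((key[0], key[1], (when - start).total_seconds()))
    return windows
```

Feeding it the four log lines above should report a single window for the DB_Down_sit-www.foo.com monitor, which makes it easy to see whether the outages stay within the 5-16 second range mentioned.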

 

 

My pool is set up like this (note that the third member is just a static Maintenance Page):

pool pool_sit-www.foo.com {
   lb method member predictive
   min active members 1
   monitor all sit-www.foo.com-monitor and DB_Down_sit-www.foo.com
   member 1.1.1.1:http priority 2
   member 1.1.1.2:http priority 2
   member 1.1.1.107.107:http monitor none
}

10 Replies

  • LTM should not send new connections to a pool member that's marked down. What LTM does with existing connections after a monitor has marked a pool member down depends on the pool's "action on service down" setting:

    From the online help for "action on service down":

    * None: Specifies that the system does not select a different node. Selecting None causes the system to send traffic to the node even if it is down, until the next health check is done.
    * Reject: Specifies that the system sends an RST or ICMP message.
    * Drop: Specifies that the system simply cleans up the connection (removes the connection from the connection table).
    * Reselect: Specifies that the system selects a different node. Selecting Reselect causes the system to send traffic to a different node after receiving the message that the original node is down.

    Aaron
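
The four options quoted above can be summarized as a small decision table. This is only a toy Python model of the help text, not LTM's implementation. The names `ServiceDownAction` and `handle_existing_connection` are hypothetical, and the model says nothing about how a real 9.x system behaves in practice.

```python
from enum import Enum


class ServiceDownAction(Enum):
    NONE = "none"
    REJECT = "reject"
    DROP = "drop"
    RESELECT = "reselect"


def handle_existing_connection(action, current_member, up_members):
    """Model what the help text says happens to an existing connection
    when its pool member is marked down.

    Returns (member_that_gets_traffic_or_None, description).
    """
    if action is ServiceDownAction.NONE:
        # No different node is selected; traffic keeps going to the down member.
        return current_member, "keep sending to the down member"
    if action is ServiceDownAction.REJECT:
        return None, "send an RST or ICMP message"
    if action is ServiceDownAction.DROP:
        return None, "remove the connection from the connection table"
    if action is ServiceDownAction.RESELECT:
        others = [m for m in up_members if m != current_member]
        # Picking the first remaining member is an arbitrary choice for this model.
        return (others[0] if others else None), "select a different member"
    raise ValueError(f"unknown action: {action!r}")
```

For example, `handle_existing_connection(ServiceDownAction.NONE, "1.1.1.1:80", ["1.1.1.2:80"])` keeps the connection on 1.1.1.1:80, which matches the behavior the original post describes.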
  • DB
    It makes perfect sense now (we're set to "none"). Thanks, Aaron.
  • I have some really good info on the "action on service down" feature in the 9.x train of code.

    I spent quite a bit of time performing packet captures to see what's really going on.

    Conclusion? There's really only one option to use: "reject". It seems reselect doesn't actually reselect current connections to other pool members.

    Support said they were going to change their documentation in the manual.

  • DB
    Thanks for the info. Perhaps the key qualifier is "until the next health check is done": does the behavior change after that point, so that LTM behaves differently during a single short, temporary outage than during a longer outage that spans multiple health check intervals?
  • If I remember rightly, the 'reselect' option only makes sense for state-aware objects, like firewalls, or a pair of sync'd LTMs. I don't believe it will work for a standard server pool member.
  • If I remember rightly, the 'reselect' option only makes sense for state-aware objects, like firewalls, or a pair of sync'd LTMs. I don't believe it will work for a standard server pool member.

    Where are you getting that info from?

    The standard definition from DevCentral can be found here:

    http://devcentral.f5.com/Default.aspx?tabid=63&articleType=ArticleView&articleId=179

    I have not seen reselect act like that in practice, at least on the 9.3.x branch of code.

    I haven't tried it with a stateful device or stateful traffic, only with actual servers running common protocols, non-persisted.

    It seems current connections are never reselected.

    The only true way to move current connections off the server immediately is to use "reject" and send a TCP reset to the host, forcing it to initiate a new connection. Maybe this is fixed in newer code? I will retest soon when we move to 9.4.8/10.x.

    Thanks!
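
With "reject", the client sees its connection reset and has to open a new one, which LTM can then load balance to a member that is up. A minimal sketch of the client side in Python follows, assuming the request is safe to repeat. The helper `request_with_reconnect` and its `connect` and `send_request` callables are hypothetical placeholders for whatever the application actually uses.

```python
def request_with_reconnect(connect, send_request, max_attempts=3):
    """Retry a request on a fresh connection after a connection reset.

    connect:      callable returning a new connection object (hypothetical).
    send_request: callable taking that connection and returning a response.
    Only suitable for requests that are safe to send more than once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        conn = connect()  # a new connection is load balanced again
        try:
            return send_request(conn)
        except ConnectionResetError as exc:
            last_error = exc  # reset received; try again on a new connection
        finally:
            close = getattr(conn, "close", None)
            if callable(close):
                close()
    raise last_error
```

A non-idempotent request (for example, one that places an order) should not be retried blindly this way, which is part of why moving in-flight connections between ordinary servers is hard.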
  • Hello IRuleYou,

    Actually, I may be thinking of something slightly off topic. Should a pool member be forced offline, the 'reselect' option "sends the stream to an alternate pool member. Only appropriate when load balancing routers or state-sharing firewalls."

    This is taken from the 9.4.x LTM manual.

    Apologies for any confusion!
  • Sounds like we're talking about the same thing here. Is that from a class? I have the essentials and advanced manuals from the class; they just say V9 on the front, and it definitely doesn't read like that.

    I can't find your text in AskF5 either. Do you have a link you can share, or is your manual from a class?

    Nonetheless, I will test in newer code and let everyone know how it behaves.

    If anyone has tested or has more info on the reselect "action on service down" option, I'd love to hear about it. I've found it only reselects new connections, not current ones, and I'd love to be able to move connections that don't require persistence to an available pool member on a down event.

    Thanks!
  • Hi,

    The manual is the LTM Essentials Guide 9.x, Tenth Edition (printed June 2008), and the text is in a small section, 5.4.6, at the end of Module 5 - Persistence, page 5-17. I believe it applies to existing connections whether persistent or not (it's in this section of the manual because the options don't make any sense until after persistence has been discussed).

    Sadly, my experience really only covers 9.4 and beyond, so I couldn't tell you for sure whether this option or behaviour is the same in 9.3.x.

    I hope this helps!


  • Gotcha. I have the 7th edition and it doesn't mention it.

    Yep, most likely if you need persistence your app wouldn't be able to handle "reselect". But like I said, I haven't seen it working in 9.3.x as advertised. I'll retest soon in newer code.

    Thanks!