Forum Discussion

tarac_37545 (Nimbostratus)
Jan 14, 2013

Monitoring Node State (Enabled/Disabled)

First posting, so if I am missing standard info, please advise.

Environment & Requirements

 

Single LTM

 

Two "stacks" of servers - each stack has two physical Windows hosts

 

Server 1 - One "application" with two processes listening on different ports. The port 80 process hosts two different sites (host header), so this host has a total of three virtual servers associated with it.

 

Server 2 - Two "applications", each with two processes listening on different ports, so this host has a total of four virtual servers associated with it.

 

The processes do not communicate with each other directly, not even the ones on the same host - they communicate with the virtual server on the LTM.

 

If a single process becomes unavailable, the entire stack has to fail over to the second stack (a second pair of hosts with the same configuration).

 

The nodes in stack1 cannot talk to the nodes in stack2 - customer requirement

 

One of the seven services cannot be "monitored to death" - in fact the vendor states the service can only receive one monitor event every 30 seconds.

 

Two of the services cannot be in the same pool - vendor requirement

 

 

Solution (to date)

 

We created:

 

Seven virtual servers (the host header based sites each got their own IPs for non-technical reasons)

 

Seven pools - one per actual "process" and a seventh one to manage the stack failover

 

Six IP-specific monitors to check the availability of the processes in stack1. All six monitors are assigned to a single pool, which has its Availability Requirement set to All.

 

Seven iRules (example below) to force pool selection to stack2 if the "global pool" is down

when CLIENT_ACCEPTED {
    # qa-pool-stack1 is the pool that has all six monitors assigned to it
    if { [active_members qa-pool-stack1] < 1 } {
        # The pool name is different in each iRule - it corresponds to the
        # pool servicing the second stack for that service
        pool qa-s2-ae2-p80
    }
}

 

This works exactly as designed: if any of the processes is shut down, the "stack pool" goes offline and all traffic is rerouted to stack2.

Problem

 

The only catch we have run into so far is that if one of the nodes is disabled in the LTM but the services are not actually shut down, the LTM doesn't fail over to stack2. This makes sense, because none of the monitors have failed: they are not tied to the node's availability state in the LTM; they check the actual availability of the service on the host.

 

We've tried using LB::status, but it is not available in the CLIENT_ACCEPTED scope. We attempted to implement it at the HTTP_REQUEST scope and used it to mark a pool member down when a node gets disabled (which works), but we haven't sorted out how to get that member re-enabled when the node is re-enabled. We also didn't really want to evaluate both nodes twice for every connection across seven virtual servers.

when HTTP_REQUEST {
    # If the stack1 node is not "up" (e.g. it has been disabled in the LTM),
    # mark its pool member down as well
    if { [LB::status node 10.0.1.2] ne "up" } {
        LB::down pool qa-s1-ae1-p80 member 10.0.1.2 80
    }
}
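
For reference, one untested variation on the iRule above (same node IP and stack2 pool name as in the earlier examples) would be to skip marking the member down and simply select the stack2 pool whenever the node isn't "up", so nothing has to be re-enabled afterwards:

when HTTP_REQUEST {
    # Untested sketch: send traffic to the stack2 pool while the stack1 node
    # is disabled or down, instead of marking the pool member down
    if { [LB::status node 10.0.1.2] ne "up" } {
        pool qa-s2-ae2-p80
    }
}

This still evaluates node status on every request across the virtual servers, though, which is part of what we were hoping to avoid.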

 

What we were really hoping to find is a "monitor" that we can add to the stack-pool that will go down when a node is set to disabled and up when it is enabled.

 

Anyone know if there is something like that out there - or another way to achieve this goal of monitoring node status?

 

Thanks

Adam

4 Replies

  • Arie (Altostratus)
    A maximum of one HTTP request every 30 seconds for monitoring, per the vendor? That's a new one...

    There are some problems with this. Assuming that the application really will fail if requests are made more frequently, what happens when people actually start using it? Secondly, a node is generally marked down only after three consecutive requests fail, but in your case that would mean it won't be marked down for at least 90 seconds once it has failed.

     

     

    Given this limitation, it would seem that the only way to monitor this app without bringing it down (per the specs from the vendor) is to use passive monitoring - i.e. use an iRule to verify that the server response (based on a client request) is valid and mark nodes down via the iRule.
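
    Roughly along these lines - just an untested sketch, where the 5xx check stands in for whatever "valid response" means for this app:

    when HTTP_RESPONSE {
        # Treat a 5xx answer to a real client request as a failed passive probe
        # and mark the current pool member down
        if { [HTTP::status] >= 500 } {
            LB::down
        }
    }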

     

     

  • Is persistence involved here? If so, I'd assume failover and failback aren't going to be too effective, although OneConnect might help.
  • Shouldn't this: ' if {[active_members qa-pool-stack1] < 1}' be ' if {[active_members qa-pool-stack1] < 7}'?

     

  • Thanks for the replies! I thought my browser had crashed and the post hadn't actually worked - haven't figured out how to find "my posts" yet.

    I'll look into passive monitoring. Yeah - that monitoring requirement seemed like integrator nonsense to me, but I don't get to make the decision to call them on it.

     

     

    Persistence is not involved on most of the virtual servers but there is a cookie profile configured on two of them.

     

     

    I thought active_members checked the number of nodes in the pool, not the number of monitors - is that inaccurate?
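
    For what it's worth, logging the value should show what it's actually counting - an untested sketch:

    when CLIENT_ACCEPTED {
        # Log how many members of the stack1 pool are currently considered active
        log local0. "qa-pool-stack1 active members: [active_members qa-pool-stack1]"
    }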