Multiple strategies for maintenance page
Hello.
There are multiple ways to handle a maintenance page, i.e. replying with a friendly "Sorry, we're currently not available" page when a service is down.
The one we currently use is to add to every pool a last-resort member, in priority group 0, hosting a web server configured to reply to any request with such a page and an HTTP 503 status. This strategy brings the following advantages:
- delegating content hosting to a standard web server, instead of using BigIP built-in features, is both easier for handling anything other than plain text, and allows content management to be delegated to our colleagues, who don't have access to the BigIP itself
- forwarding requests, instead of using an HTTP redirect, keeps this web server private (there is no direct access from outside) and preserves the original Host header, allowing simple per-application content customization
- this mechanism is quite generic, and since it is pool-based, it applies equally to virtual servers serving a single application and to virtual servers serving multiple applications with a Host-header-based dispatch strategy
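As a sketch, the pool setup described above might look something like this in a bigip.conf / tmsh fragment (pool name, addresses and ports are hypothetical, and the exact member syntax may differ between BigIP versions):

```
# Hypothetical pool: two real members at priority group 10, plus the
# maintenance web server as a last-resort member at priority group 0.
# With min-active-members 1, the lower priority group is only used
# when fewer than one higher-priority member remains available.
ltm pool app_https_pool {
    min-active-members 1
    members {
        10.0.0.11:443 { priority-group 10 }
        10.0.0.12:443 { priority-group 10 }
        10.0.0.99:80  { priority-group 0 }
    }
}
```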
This works perfectly. The disadvantage, however, is that we're cheating with the BigIP's high-availability feature: when a service is down because all of its actual pool members are offline, the BigIP still sees the last-resort member as active in the pool and doesn't consider the pool down. This is not really critical, but it blurs status reporting.
I tried an alternative strategy, using the following generic iRule to forward requests to the same web server:
when LB_FAILED {
    if { [active_members [LB::server pool]] < 1 } {
        log local0. "no target pool member available, resorting to fallback pool"
        pool fallback_https_pool
    }
}
However, although the logs show the iRule is actually executed, the BigIP keeps sending a TCP reset to the client, with "no pool member available" as the reason... What am I missing here?
Of course, alternative proposals to achieve the same result are welcome.
Your iRule is just missing the LB::reselect command. pool on its own is not enough, because by the time the LB_FAILED event fires, the pool has already been selected and tried.
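Applied to the iRule from the question, the fix would look something like this (same fallback pool name as above; an untested sketch):

```tcl
when LB_FAILED {
    if { [active_members [LB::server pool]] < 1 } {
        log local0. "no target pool member available, reselecting against fallback pool"
        # LB::reselect retries load balancing, here against the fallback pool,
        # instead of just changing the pool after selection already failed
        LB::reselect pool fallback_https_pool
    }
}
```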
Some of my configurations require the same: if the primary pool is down, traffic must be sent to a standby server in a different pool. I do it without a lot of checks in the iRule, using standard configuration instead.
The pool is configured to reselect a different member if a given number of the other members are down.
Then the iRule only needs the LB_FAILED event:
when LB_FAILED { LB::reselect pool web-noname_standby_pool persist none }