Forum Discussion

Guillaume_Rouss
Jul 31, 2019
Solved

Multiple strategies for maintenance page

Hello.

There are multiple ways to handle a maintenance page, i.e. replying with a nice "Sorry, we're currently not available" alternative web page when a service is down.

The one we currently use is to add a last-resort member to every pool, in priority group 0, hosting a web server configured to reply to any request with such a page and an HTTP 503 status. This strategy brings the following advantages:

  • delegating content hosting to a standard web server, instead of using BigIP built-in features, is both easier for handling anything other than pure text, and allows content management to be delegated to our colleagues, who don't have access to the BigIP itself
  • forwarding requests, instead of using an HTTP redirect, keeps this web server private (there is no direct access from outside) and preserves the original Host header, allowing simple per-application content customization
  • this mechanism is quite generic, and as it is pool-based, it can be applied both to virtual servers corresponding to a single application and to virtual servers corresponding to multiple applications, with a dispatch strategy based on the Host header

This works perfectly. However, the disadvantage is that we're actually cheating with the high-availability feature of the BigIP: when a service is down because all of its actual pool members are offline, the BigIP still has the last-resort member active in the pool and doesn't consider the pool down. This is not really critical, but it blurs status reporting.
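
For reference, a minimal tmsh sketch of such a pool might look like the following (the pool name and addresses are made up for illustration; min-active-members 1 corresponds to the "Priority Group Activation: Less than 1" setting, so the priority group 0 member only receives traffic once no higher-priority member is available):

tmsh create ltm pool app_https_pool \
    min-active-members 1 \
    members add { \
        10.0.0.11:443 { priority-group 10 } \
        10.0.0.12:443 { priority-group 10 } \
        10.0.0.99:443 { priority-group 0 } }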

I tried an alternate strategy, using the following generic iRule to forward requests to the same web server:

when LB_FAILED {
    # if no member of the selected pool is available, use the fallback pool
    if { [active_members [LB::server pool]] < 1 } {
        log local0. "no target pool member available, resorting to fallback pool"
        pool fallback_https_pool
    }
}

However, despite the logs showing the iRule is actually executed, the BigIP keeps sending a TCP reset to the client, with "no pool member available" as the reason... What am I missing here?

Of course, alternative proposals to achieve the same result are welcome.

6 Replies

  • Hello,

    Maybe you are trying that in the wrong event.

    It's better to check member status before sending to the pool, for example in the HTTP_REQUEST event, right?

    I changed some parts to retry on the default pool before sending to the backup, and to switch back when the default pool comes up again. It is just an idea.

    when CLIENT_ACCEPTED {
        # remember the virtual server's default pool and reset the retry counter
        set default_pool [LB::server pool]
        set _retry 0
    }
    when HTTP_REQUEST {
        # prefer the default pool as long as it still has active members
        if { [active_members $default_pool] > 0 } {
            pool $default_pool
        } else {
            log local0. "no target pool member available, resorting to fallback pool"
            pool fallback_https_pool
        }
    }
    when LB_SELECTED {
        log local0. "the pool and member [LB::server] was selected..."
    }
    when LB_FAILED {
        # retry up to 3 times on another member before giving up
        if { [incr _retry] <= 3 } {
            log local0. "the pool member [LB::server addr] has failed, trying another ($_retry)..."
            LB::mode rr
            LB::reselect
        }
    }
    when SERVER_CONNECTED {
        set _retry 0
    }

    Regards.

  • JG

    How is the "status reporting" generated, and what is it used for?

  • Your iRule is just missing the LB::reselect command. The pool command on its own is not enough, as the pool has already been selected and tried by the time the LB_FAILED event fires.

    Some of my configuration requires the same; if the primary pool is down, it is necessary to send the traffic to a standby server in a different pool. I do it without a lot of checks in the iRule, using standard config instead.

    The pool is configured to reselect a different member if a certain number of the other members are down.

    Then the iRule only includes the LB_FAILED event:

    when LB_FAILED {
        # re-run load balancing against the standby pool and drop persistence
        LB::reselect pool web-noname_standby_pool
        persist none
    }
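
    Applied to your iRule, a minimal sketch with that fix might look like the following (untested, keeping your fallback_https_pool name and the active_members check from the question):

    when LB_FAILED {
        # if the selected pool has no members left, re-run load balancing
        # against the fallback pool instead of only setting it for later
        if { [active_members [LB::server pool]] < 1 } {
            log local0. "no target pool member available, resorting to fallback pool"
            LB::reselect pool fallback_https_pool
        }
    }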
  • : indeed, that was the issue. Thanks a lot!

    : I was just referring to standard pool status; I should probably have said "status tracking". The loss of the last member of a pool triggers an explicit "No members available for pool xyz" message in the logs, whereas just switching to the lowest priority group doesn't.

    : is quite a bit simpler 🙂