Forum Discussion

Rajaraman_12066
Jan 07, 2016
Solved

BIG-IP HA over WAN - Is it a recommended practice?

Hi, we plan to set up two F5 devices, one each in DC1 and DC2, and run them as an HA pair. The DCs are 6 km apart and connected by a dedicated fibre link. Can someone please advise?

 

3 Replies

  • BinaryCanary_19
    Historic F5 Account

    It's not explicitly discouraged as far as I know; you just need to be aware of potential latency issues. Network failover works by heartbeats sent across the configured failover link, so packet loss or excessive delay may trigger unwanted failovers and even "split-brain" scenarios, where each device briefly believes its peer has gone down and both assume the active role (active-active).

    There is a DB key that lets you configure how long the system waits before declaring its peer dead: failover.nettimeoutsec. The default value is 3 seconds, which assumes both devices are in the same data centre. You might want to raise this a little to account for the increased distance. With the value set at 3 seconds, it takes 3 seconds to trigger a failover if the peer is truly down, which means at most 3 seconds with no device actively handling traffic. Raising the value widens this window, but also helps mitigate the risk of split-brain due to transient network issues.
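
To make the trade-off above concrete, here is a rough Python sketch. It is a toy model only, not how BIG-IP implements failover internally: the function names and the sample heartbeat gaps are made up for illustration, and the only assumption is that the standby declares its peer dead once it has heard nothing for longer than the configured timeout.

```python
def false_failovers(gaps_sec, timeout_sec):
    """Count gaps between received heartbeats that exceed the timeout.

    Toy model: each such gap would make the standby wrongly declare
    a still-healthy peer dead (a potential split-brain moment).
    """
    return sum(1 for gap in gaps_sec if gap > timeout_sec)


def worst_case_outage(timeout_sec):
    """If the peer really dies, the standby waits the full timeout
    before taking over, so this is the longest window with no active unit."""
    return timeout_sec


# Hypothetical gaps (seconds) between heartbeats seen over a lossy link.
observed_gaps = [0.1, 0.1, 3.4, 0.1, 4.2, 0.1, 0.1]

for timeout in (3, 5):
    print(
        f"timeout={timeout}s "
        f"false_failovers={false_failovers(observed_gaps, timeout)} "
        f"worst_case_outage={worst_case_outage(timeout)}s"
    )
```

With these made-up numbers, a 3-second timeout would have caused two spurious failovers, while a 5-second timeout rides through both gaps at the cost of a longer outage when the peer is genuinely down.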

    • Mr__Katic_15215
      Just one additional comment: you cannot use the dedicated failover port built into BIG-IP hardware. So if you want redundancy on the HA link used for heartbeats, you will need to dedicate another fibre for that.