Forum Discussion

JTrimble
Feb 10, 2011

High Availability HA Group not failing over

Hi there,

We have two F5 BIG-IP 3600s paired in a high availability configuration. They are connected via serial port, and we have two trunks set up with 2 members each. I have set up the HA group on each F5 so that each trunk has a weight of 10, with an Active Bonus of 5. My theory is that the unit will still function if any one trunk member goes down, but that it should fail over if at least 2 trunk members go down (the threshold is set at 1).

We are not getting a failover when one member of each trunk is down. Looking at the scoring on each unit, the active unit has a score of 15 and the standby unit has a score of 20, yet no failover happens.

Active:

root@LBNCRLTM01(Active)(tmos.sys) show ha-group TEST detail

Sys::HA Group: TEST
---------------------
State          enabled
Active Bonus   5
Score          15

Sys::HA Group Trunk: TEST:LTM01-LACP
------------------------------------
Threshold            1
Percent Up           50
Weight               10
Score Contribution   5

Sys::HA Group Trunk: TEST:LTM01-LACP-SRV
----------------------------------------
Threshold            1
Percent Up           50
Weight               10
Score Contribution   5

Standby:

root@LBNCRLTM02(Standby)(tmos.sys) show ha-group TEST detail

Sys::HA Group: TEST
---------------------
State          enabled
Active Bonus   5
Score          20

Sys::HA Group Trunk: TEST:LTM02-LACP
------------------------------------
Threshold            1
Percent Up           100
Weight               10
Score Contribution   10

Sys::HA Group Trunk: TEST:SRV-LACP
----------------------------------
Threshold            1
Percent Up           100
Weight               10
Score Contribution   10

Here are the setups on each F5:

Active:

root@LBNCRLTM01(Active)(tmos.sys) list ha-group

sys ha-group TEST {
    active-bonus 5
    trunks {
        LTM01-LACP {
            percent-up 50
            threshold 1
            weight 10
        }
        LTM01-LACP-SRV {
            percent-up 50
            threshold 1
            weight 10
        }
    }
}

Standby:

root@LBNCRLTM02(Standby)(tmos.sys) list ha-group

sys ha-group TEST {
    active-bonus 5
    trunks {
        LTM02-LACP {
            percent-up 100
            threshold 1
            weight 10
        }
        SRV-LACP {
            percent-up 100
            threshold 1
            weight 10
        }
    }
}

Any ideas as to why this isn't failing over?

Thanks,

Jason
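
For reference, the scores in the two `show ha-group` outputs in this post are consistent with each trunk contributing its weight scaled by Percent Up, with the Active Bonus added on the active unit only. The following is a minimal Python sketch of that arithmetic. The formula is inferred from the numbers shown here, not taken from F5 documentation, the function names are hypothetical, and the threshold setting is not modeled.

```python
def trunk_contribution(weight, percent_up):
    """Score contribution of one trunk: its weight scaled by the share of members up."""
    return weight * percent_up / 100


def ha_score(trunks, active_bonus, is_active):
    """Sum the contributions of (weight, percent_up) pairs.

    The active bonus is added only when the unit is currently active
    (an assumption that matches the scores shown in the post).
    """
    score = sum(trunk_contribution(w, p) for w, p in trunks)
    return score + (active_bonus if is_active else 0)


# (weight, percent_up) pairs copied from the two outputs in the post.
active = ha_score([(10, 50), (10, 50)], active_bonus=5, is_active=True)
standby = ha_score([(10, 100), (10, 100)], active_bonus=5, is_active=False)

print(active, standby)  # 15.0 20.0
```

Under this reading the standby unit does hold the higher score (20 against 15), so the scoring itself is not what prevents the failover.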

 

6 Replies

  • If I'm reading the documentation correctly, I would expect the threshold setting to apply to each trunk separately. If you're set at 1, and each trunk still has 1 member left, I wouldn't expect a failover to happen. Also, the "Weight" setting seems interesting. According to the documentation, "The sum of the weights in the HA group must equal 100." I'm going to read a bit more.
  • I just read this:

    "A health score is based on the number of members that are currently available for any trunks, pools, and clusters in the HA group, combined with a weight that you assign to each trunk, pool, and cluster. The unit that has the best overall score at any given time becomes or remains the active unit."

    In your case, the standby has a higher score, so I'd expect a failover to happen. The only thing that comes to mind is that because the weights don't add up to 100, the scores might be ignored. It might be worth a support case, but you could test it by simply adjusting the weights from 10/10 to 50/50.
  • Hello guys,

    I'm experiencing a similar issue. We have 4 load balancers available in a lab environment. Two of them are running an HA config (active/passive) with network failover only. The other two are running an HA config (active/passive) with serial cable failover. I have configured the HA group feature on both of them in order to trigger a failover when trunk interfaces become unavailable.

    The HA group feature is working fine on the network failover pair, but it doesn't work at all on the serial failover pair. I must say I'm surprised, because I was going through the F5 docs on HA groups and AFAIK there is no mention that network failover is needed for HA groups to work properly.

    Any comments/experience with that?

    Thanx!
  • neo - can you create a support ticket and also paste in your config here? I don't use network failover anywhere but would certainly use HA Groups so I don't expect this to be a requirement.
  • I'm currently working with an F5 engineer to try to isolate the root cause of this behavior. I have done some tests in a lab environment, and I think the HA groups feature requires network failover to be in place, and possibly the serial failover cable to be removed. If that's the case, then we could argue that the documentation needs to be reviewed and updated to state this clearly.

    I'll keep you posted once the case is closed.
    • IanB (Employee)

      Just to add a very late reply here: HA groups absolutely require network failover to be enabled. This is what causes the BIG-IP to send UDP failover packets containing the current state of the box. Serial failover is very simplistic and carries no state information; it literally only indicates that the other box is powered on and active.
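
The explanation in the last reply can be pictured with a toy model: a unit can only compare HA scores if the peer's score actually reaches it, and a serial link reports nothing beyond "the peer is up". The Python sketch below is purely illustrative. `PeerView` and `standby_should_take_over` are hypothetical names, and the decision rule is an assumption made for this example, not F5's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerView:
    """What a unit knows about its peer (toy model, not F5's implementation)."""
    alive: bool             # the only thing a serial link can signal
    score: Optional[float]  # known only if the peer's state arrives over the network


def standby_should_take_over(own_score: float, peer: PeerView) -> bool:
    """Decide whether the standby unit should go active in this toy model."""
    if not peer.alive:
        return True   # classic failover: the active unit is gone
    if peer.score is None:
        return False  # peer is up and there is no score to compare against
    return own_score > peer.score


# Scores taken from the original post: standby 20, active 15.
serial_only = PeerView(alive=True, score=None)
with_network = PeerView(alive=True, score=15)

print(standby_should_take_over(20, serial_only))   # False
print(standby_should_take_over(20, with_network))  # True
```

In this model the serial-only pair never fails over on score alone, which matches what both posters observed.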