Forum Discussion

DarioGB_339840
Apr 16, 2019

Active-Active Collision

Hello.

I had a cluster with this initial state:

  • BIG-IP1 -> Active
  • BIG-IP2 -> Standby

Because of a power outage in BIG-IP2's datacenter, that device rebooted, and communication between the two devices was lost for roughly 18 minutes.

As a result, neither device received failover status messages from its peer, and each one marked the other as Disconnected for that interval.

From BIG-IP1's perspective:

  • BIG-IP1 -> Active
  • BIG-IP2 -> Disconnected

From BIG-IP2's perspective:

  • BIG-IP1 -> Disconnected
  • BIG-IP2 -> Active

When communication between the two devices was re-established, BIG-IP1 went to Standby in favor of BIG-IP2:

Apr 15 05:23:08 slot1/BIG-IP1 notice sod[5827]: 010c007e:5: Not receiving status updates from peer device /Common/BIG-IP2.mydomain.local (10.0.0.2) (Disconnected).
Apr 15 05:41:38 slot1/BIG-IP1 warning sod[5827]: 010c0084:4: Failover status message received after 1111.500 second gap, from device /Common/BIG-IP2.mydomain.local (10.0.0.2) (unicast: -> 10.255.1.209).
Apr 15 05:41:38 slot1/BIG-IP1 notice sod[5827]: 010c007f:5: Receiving status updates from peer device /Common/BIG-IP2.mydomain.local (10.0.0.2) (Online).
Apr 15 05:41:41 slot1/BIG-IP1 notice sod[5827]: 010c004a:5: Leaving active in favor of active peer.
Apr 15 05:41:41 slot1/BIG-IP1 notice sod[5827]: 010c0052:5: Standby for traffic group /Common/traffic-group-1.
Apr 15 05:41:41 slot1/BIG-IP1 notice sod[5827]: 010c0018:5: Standby

I would like to know which criteria determine which device leaves the Active state in favor of the other.

By the way, both devices use the Load Aware failover method with default values and a single traffic group.

--------------------------------------------------------------------------------------------------------------------------------------------
CM::Traffic-Group       
Name                      Device                  Status   Next    Load  Next Active  HA Group  Times Became  Last Became
                                                           Active        Load                   Active        Active
--------------------------------------------------------------------------------------------------------------------------------------------
traffic-group-1           BIG-IP1.mydomain.local  standby  true    -     1            -         3             2019-Apr-15 05:41:35
traffic-group-1           BIG-IP2.mydomain.local  active   false   1     -            -         2             2019-Apr-15 05:39:50
traffic-group-local-only  - 

Thanks in advance.

KR, Dario.

1 Reply

  • First, I recommend reviewing K95002127: Troubleshooting BIG-IP failover events.

    Years ago I tested a similar failover/failback event for a customer (who was very worried about a split-brain scenario) and found that, if the configuration is in sync, the outcome depends on the base MAC address of each device.

    tmsh show sys hardware | grep -i "base mac"

    I believe the device with the lower base MAC address gets the higher priority when resolving a traffic-group active/active conflict.

    Again, this was a long time ago, and I got this information from F5 Support.
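To make the reply's hypothesis concrete, here is a minimal Python sketch that compares two base MAC addresses numerically and reports which device would keep the Active role. It assumes the "lower base MAC wins" rule recalled above actually holds when the configuration is in sync; that rule comes from this reply, not from documentation confirmed here. The function names `mac_to_int` and `predicted_active` and the MAC values are hypothetical; substitute the base MACs each unit reports.

```python
from __future__ import annotations

# ASSUMPTION (from the forum reply, not official F5 documentation):
# when both peers claim Active for the same traffic group and the
# configuration is in sync, the device with the numerically lower
# base MAC address keeps the Active role.


def mac_to_int(mac: str) -> int:
    """Convert a MAC such as '00:01:d7:aa:bb:cc' to an integer for comparison."""
    parts = mac.strip().lower().replace("-", ":").split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a MAC address: {mac!r}")
    # int(..., 16) also rejects non-hex characters with ValueError.
    return int("".join(parts), 16)


def predicted_active(base_macs: dict[str, str]) -> str:
    """Return the device that would stay Active under the lower-MAC hypothesis."""
    if not base_macs:
        raise ValueError("need at least one device")
    return min(base_macs, key=lambda name: mac_to_int(base_macs[name]))


if __name__ == "__main__":
    # Hypothetical base MACs; replace them with the values shown by
    # `tmsh show sys hardware | grep -i "base mac"` on each unit.
    devices = {
        "BIG-IP1.mydomain.local": "00:94:a1:5e:10:00",
        "BIG-IP2.mydomain.local": "00:94:a1:2c:40:00",
    }
    print(predicted_active(devices))
```

With these made-up values the script prints `BIG-IP2.mydomain.local`, because `0x0094a12c4000` is smaller than `0x0094a15e1000`. Comparing the addresses as integers rather than as raw strings avoids surprises from mixed case or `-` versus `:` separators.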