Forum Discussion

MW1
Sep 06, 2010

Adding standby F5 to HA pair using network failover

All,

 

Can anyone advise why, when I add a standby F5 LTM to an active/standby HA F5 LTM setup using network failover, the interfaces on the active F5 go offline after the devices "see each other"? The pair then seems to cycle, with the interfaces of either the active or the standby unit going on and offline.

The devices were in an active/standby pair using a serial cable, but we had to switch to network failover because the devices were moved. It could be that the connection between the two devices is not stable enough for HA heartbeat traffic; I'm just trying to find out whether this would cause the interfaces to go offline.

 

 

Here is the log from the active F5 (192.168.52.131):

 

 

Sep 2 04:03:19 tmm tmm[933]: 01340001:3: HA Connection with peer 192.168.52.132:47998 established.
Sep 2 04:03:19 tmm tmm[933]: 01340001:3: HA Connection with peer 192.168.52.132:47998 established.
Sep 2 04:03:28 MQ1LTM01 sod[997]: 010c0025:5: Toggle from active to standby to active.
Sep 2 04:03:28 MQ1LTM01 sod[997]: 010c0025:5: Toggle from active to standby to active.
Sep 2 04:03:28 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.2 is DOWN
Sep 2 04:03:28 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.2 is DOWN
Sep 2 04:03:28 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.1 is DOWN
Sep 2 04:03:28 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.1 is DOWN
Sep 2 04:03:28 MQ1LTM01 lacpd[990]: 01160010:6: Link 1.1 removed from aggregation
Sep 2 04:03:28 MQ1LTM01 lacpd[990]: 01160010:6: Link 1.1 removed from aggregation
Sep 2 04:03:32 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.132:47998 lost.
Sep 2 04:03:32 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.132:47998 lost.
Sep 2 04:03:35 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.2 is UP
Sep 2 04:03:35 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.2 is UP
Sep 2 04:03:35 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.1 is UP
Sep 2 04:03:35 sccp bcm56xxd[22311]: 012c0015:6: Link: 1.1 is UP
Sep 2 04:03:41 MQ1LTM01 lacpd[990]: 01160009:6: Link 1.1 added to aggregation
Sep 2 04:03:41 MQ1LTM01 lacpd[990]: 01160009:6: Link 1.1 added to aggregation

 

 

The log from the standby F5 (192.168.52.132):

 

Sep 2 04:02:10 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.131:1028 lost.
Sep 2 04:02:11 sccp bcm56xxd[221]: 012c0015:6: Link: 1.1 is UP
Sep 2 04:02:11 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.131:1028 lost.
Sep 2 04:02:11 sccp bcm56xxd[221]: 012c0015:6: Link: 1.2 is UP
Sep 2 04:02:11 MQ1LTM02 lacpd[990]: 01160009:6: Link 1.2 added to aggregation
Sep 2 04:02:17 MQ1LTM02 lacpd[990]: 01160009:6: Link 1.1 added to aggregation
Sep 2 04:03:19 tmm tmm[933]: 01340001:3: HA Connection with peer 192.168.52.131:1028 established.
Sep 2 04:03:31 sccp bcm56xxd[221]: 012c0015:6: Link: 1.2 is DOWN
Sep 2 04:03:31 MQ1LTM02 lacpd[990]: 01160010:6: Link 1.2 removed from aggregation
Sep 2 04:03:31 sccp bcm56xxd[221]: 012c0015:6: Link: 1.1 is DOWN
Sep 2 04:03:31 MQ1LTM02 lacpd[990]: 01160010:6: Link 1.1 removed from aggregation
Sep 2 04:03:32 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.131:1028 lost.
Sep 2 04:03:37 sccp bcm56xxd[221]: 012c0015:6: Link: 1.2 is UP
Sep 2 04:03:37 sccp bcm56xxd[221]: 012c0015:6: Link: 1.1 is UP
Sep 2 04:03:37 MQ1LTM02 lacpd[990]: 01160009:6: Link 1.2 added to aggregation
Sep 2 04:03:39 MQ1LTM02 sod[1000]: 010c0019:5: Active
Sep 2 04:03:40 MQ1LTM02 lacpd[990]: 01160009:6: Link 1.1 added to aggregation
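
To make the sequence of events easier to follow, the two logs can be interleaved into a single timeline. The following is only a rough sketch: the helper names (`parse`, `merge`) are made up for illustration, the year is assumed to be 2010 because syslog lines do not carry one, and it assumes the clocks on both units are reasonably close to each other. Verbatim duplicate lines are collapsed.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the syslog lines.
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
# Sep 2 04:03:28 MQ1LTM01 sod[997]: 010c0025:5: Toggle from active to standby to active.
LINE_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<hms>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[\w.-]+)\[\d+\]: "
    r"(?P<code>[0-9a-f]{8}):(?P<sev>\d): (?P<msg>.*)$"
)


def parse(line, unit, year=2010):
    """Return (timestamp, unit, process, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None or m["mon"] not in MONTHS:
        return None
    hh, mm, ss = (int(x) for x in m["hms"].split(":"))
    ts = datetime(year, MONTHS[m["mon"]], int(m["day"]), hh, mm, ss)
    return ts, unit, m["proc"], m["msg"]


def merge(active_lines, standby_lines):
    """Interleave both logs by timestamp, dropping verbatim duplicates."""
    events = []
    for unit, lines in (("active", active_lines), ("standby", standby_lines)):
        for line in lines:
            ev = parse(line, unit)
            if ev is not None and ev not in events:
                events.append(ev)
    # sorted() is stable, so events with equal timestamps keep their input order.
    return sorted(events, key=lambda e: e[0])


if __name__ == "__main__":
    active = [
        "Sep 2 04:03:19 tmm tmm[933]: 01340001:3: HA Connection with peer 192.168.52.132:47998 established.",
        "Sep 2 04:03:28 MQ1LTM01 sod[997]: 010c0025:5: Toggle from active to standby to active.",
    ]
    standby = [
        "Sep 2 04:03:31 sccp bcm56xxd[221]: 012c0015:6: Link: 1.2 is DOWN",
    ]
    for ts, unit, proc, msg in merge(active, standby):
        print(f"{ts:%H:%M:%S} {unit:<8} {proc:<9} {msg}")
```

Feeding in the full logs from both units this way puts the `sod` toggle, the link DOWN/UP events, and the HA connection loss side by side, which should make it easier to see which event comes first.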

 

 

thanks

 

 

7 Replies

  • What interfaces are you using for network failover? Just the management interface? Since the active is itself failing over, I imagine we're triggering one of our failsafe conditions, be it gateway, VLAN, or a service.

     

     

    What do you have defined as far as failsafe goes: VLANs? Gateway? HA groups? Also, are you using preferred redundancy state at all?
  • Thanks for the reply. The devices are still on 9.x code, so there are no HA groups, and there is currently no VLAN or gateway failsafe configured. The pair is currently using the main network trunk for network failover, not the management interface. I know this is not best practice, and I'm waiting for an engineer to get to the DC to rig a dedicated interface for network failover.

    I am using preferred redundancy state (active for the active unit, 192.168.52.131, and standby for the other). I did read that the log message "Toggle from active to standby to active." is expected if no redundancy state is defined, but as I have one defined, I don't know whether this points to an issue. Do you know of any default failsafe conditions on 9.x code that would cause the interfaces to go offline?

     

     

    thanks

     

     

  • Looking at one of my 9.x boxes, I only see "Restart Service," "Restart All," and "Fail Over and Restart." Can you verify your trunks on each F5? I imagine you have LACP with spanning tree in pass-through for those interfaces? I've had boxes that go from active to standby to active all in one log entry but can't remember why that was...if you have a support contract, they might be able to shed some light there.
  • Hamish
    A system going active/standby/active is usually the result of detecting an active/active situation. That little active/standby/active toggle ensures that any ARP caches are updated with the correct information, thanks to the gratuitous ARPs sent as a system goes active.

    On the subject of 'Preferred Active': I'd lose it. It doesn't really work very well in my experience. In fact, there are at least two scenarios in which it causes problems (e.g. the 'preferred active' box always comes up active and then you get an active/active situation; stuff like that).

    In v9, if you have network failover, only one network can be used for the HA heartbeat. Make sure you don't have problems with that link. I would generally advise putting it on a dedicated point-to-point network (with or without switches, but only use switches if the two boxes are out of reach of a single cable). In v10 you can set up multiple heartbeat networks, which is MUCH nicer and MUCH more stable. Run (don't walk) to upgrade to v10 just for that reason, IMO.

     

     

    Hamish
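
One rough way to gauge how unstable the link between the two boxes is: measure how long each HA connection stays up, using the tmm "HA Connection with peer ... established/lost" messages from the logs earlier in the thread. This is a sketch under assumptions, not an F5 tool: the function name `ha_sessions` is hypothetical, the year is assumed, and these tmm messages are only used as a proxy for link health; they are not necessarily the failover heartbeat itself.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches the tmm messages quoted in the thread, for example:
# Sep 2 04:03:32 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.132:47998 lost.
HA_RE = re.compile(
    r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) "
    r".*HA Connection with peer (\S+) (established|lost)\."
)


def ha_sessions(lines, year=2010):
    """Return a list of (peer, seconds_up) for each established -> lost pair."""
    opened = {}    # peer -> time the connection was established
    sessions = []
    for line in lines:
        m = HA_RE.match(line.strip())
        if m is None or m.group(1) not in MONTHS:
            continue
        mon, day, hh, mm, ss, peer, state = m.groups()
        ts = datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss))
        if state == "established":
            # Keep the first timestamp if the same line is logged twice.
            opened.setdefault(peer, ts)
        elif peer in opened:
            # A "lost" without a preceding "established" is ignored.
            sessions.append((peer, (ts - opened.pop(peer)).total_seconds()))
    return sessions


if __name__ == "__main__":
    log = [
        "Sep 2 04:03:19 tmm tmm[933]: 01340001:3: HA Connection with peer 192.168.52.132:47998 established.",
        "Sep 2 04:03:32 tmm tmm[933]: 01340002:3: HA Connection with peer 192.168.52.132:47998 lost.",
    ]
    for peer, secs in ha_sessions(log):
        print(f"{peer} stayed connected for {secs:.0f}s")
```

Connections that repeatedly last only a few seconds would be consistent with the cycling described in the original post, although they do not by themselves show whether the link or the failover logic is at fault.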

     

  • Hi All,

     

     

    Something related to network failover: I have two LTM 3600s running v11.2.

    I want to configure network failover across different sites in our network, with one device at one site and the other at a second site.

    Can you please help me with how I should configure them and how to test failover conditions? Does there always need to be a connection between the two devices for them to act in an active/standby state, and how can that be achieved in this case?
  • "Can you please help me with how I should configure them and how to test failover conditions? Does there always need to be a connection between the two devices for them to act in an active/standby state, and how can that be achieved in this case?"

    You have read the redundant systems configuration guide, haven't you? Is there any specific topic in it that you do not understand or that is unclear? Knowing that would make it easier to assist.

     

     

    Manual: BIG-IP Redundant Systems Configuration Guide

     

    http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/tmos-redundant-systems-config-11-2-0.html
  • I have a similar issue: "HA Connection with peer x:x lost." and "unable to get peer local time" appear in the ltm log, in my case with serial cable failover.

    What can I do? (v9.3.1)