Forum Discussion

Fabian_Arroyo_M
Nimbostratus
Jan 14, 2015

HTTP monitor: Redundant pair member LTM marks resources as unavailable

Hi all,

 

We are having a problem with one of our LTMs that resides in a redundant pair.

 

We were called in to help the customer because they lost connectivity to their Exchange environment.

 

We went onsite, checked the active device, and found that the pool members were flapping: in the GUI we could see the pool members being marked up and then down every 2 to 4 seconds. We checked the log and could also see the flapping in /var/log/ltm.

 

We checked the passive box and found that all the pool members were up, with no flapping on this device. Its /var/log/ltm showed no status changes while the other box was flapping constantly.

 

We executed a force to standby on the active box and it failed over correctly to the other box and all of the services started to work normally.

 

Once the services were back online we took a look at the configuration on the affected box but everything looked good.

 

We ran extended pings and telnets to the services, which all worked. We checked the log and could not see any interface flapping.

 

We double-checked our monitors and confirmed that they were identical to the ones on the other box, but the problem persisted.

 

We upgraded the box to 11.3.0 HF 10, but the problem persisted after we booted into the new partition.

 

We decided to look further into the networking side of the problem. These boxes are implemented using a trunk and VLAN tagging, which has worked from the beginning.

 

We decided to undo the trunk and the VLAN tagging and moved the ports to their respective VLANs. We also had the network team change the Nexus config to move the ports from Virtual Port Channel mode to access mode.

 

Once all this was done we tried reaching the servers via ping and the test was successful, but the pool members were still flapping. We never lost a packet via ping, and we could also telnet to the specific pool member ports.

 

Since this did not work, we decided to upgrade the box to 11.5.1 HF 7, but the problem persisted.

 

Our last shot was changing the interface on the server VLAN: we physically moved from interface 1.4 to 1.3, but the problem still persisted.

 

I have run out of troubleshooting steps to follow. I would appreciate your help and guidance in solving this problem.

 

This is part of the log:

Jan 12 09:19:15 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:18 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:18 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:18 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status up. [ was down for 0hr:0min:45sec ]
Jan 12 09:19:20 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:19:23 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status up. [ was down for 0hr:0min:5sec ]
Jan 12 09:19:23 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:23 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:27 f5-1 err tmm[12676]: 01340002:3: HA Connection with peer 172.16.8.74:1028 lost.
Jan 12 09:19:27 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:25sec ]
Jan 12 09:19:27 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:27 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:28 f5-1 notice tmm[12676]: 01340001:5: HA Connection with peer 172.16.8.74:1028 established.
Jan 12 09:19:40 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:40 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:40 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:43 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:43 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:43 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143c:5: Connection to CMI peer 172.16.8.74 has been removed
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: enabled
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071431:5: Attempting to connect to CMI peer 172.16.8.74 port 6699
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:55 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:20:32 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status down. [ was up for 0hr:1min:5sec ]
Jan 12 09:20:32 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:32 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:38 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:6sec ]
Jan 12 09:20:38 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:38 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:40 f5-1 notice mcpd[9575]: 01070410:5: Removed subscription with subscriber id qkview
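To put numbers on the flapping, one option is to count the monitor status transitions per pool member in a copy of /var/log/ltm. The following is only a rough sketch: it assumes the mcpd line layout seen in the excerpt above, and the names `STATUS_RE`, `count_transitions`, and `SAMPLE` are made up for illustration. The sample lines are copied from the excerpt.

```python
import re
from collections import defaultdict

# Matches the mcpd "monitor status up/down" lines as they appear in the excerpt.
# Assumption: other BIG-IP versions may format these lines differently.
STATUS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ notice mcpd\[\d+\]: "
    r"(?P<code>01070638|01070727):5: Pool (?P<pool>\S+) member (?P<member>\S+) "
    r"monitor status (?P<state>up|down)\."
)


def count_transitions(lines):
    """Return {(pool, member): {"up": n, "down": m}} for the given log lines."""
    counts = defaultdict(lambda: {"up": 0, "down": 0})
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            counts[(match["pool"], match["member"])][match["state"]] += 1
    return dict(counts)


# A few lines copied from the log excerpt in this post.
SAMPLE = [
    "Jan 12 09:19:18 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]",
    "Jan 12 09:19:18 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002",
    "Jan 12 09:19:20 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status up. [ was down for 0hr:0min:45sec ]",
    "Jan 12 09:19:23 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status up. [ was down for 0hr:0min:5sec ]",
]

if __name__ == "__main__":
    for (pool, member), c in sorted(count_transitions(SAMPLE).items()):
        print(f"{pool} {member}: up={c['up']} down={c['down']}")
```

Running the same function over the log from both units for the same time window would show whether every member flaps equally on the affected box or only members on a particular VLAN or server.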

 

2 Replies

  • What type of monitor? Can you post the monitor configuration?

     

    I'd imagine this isn't the F5 but something else 'in path' or the servers themselves. Do they have rate limits, HIPS, a firewall, etc.? Perhaps some software there decided it didn't like the frequent connections from the primary?

     

    Have you checked the next hop switch and/or router? Any other elements specific to the primary?

     

  • The monitor configuration is this:

     

    ltm monitor http /Common/Exchange_2010_Prod.app/Exchange_2010_Prod_ad_http_monitor {
        app-service /Common/Exchange_2010_Prod.app/Exchange_2010_Prod
        defaults-from /Common/http
        description none
        destination :
        interval 10
        manual-resume disabled
        partition Common
        password none
        recv none
        recv-disable none
        reverse disabled
        send "GET / HTTP/1.1\r\nHost: mail.cajadeande.fi.cr\r\nConnection: Close\r\n\r\n"
        time-until-up 0
        timeout 31
        transparent disabled
        up-interval 0
        username none
    }

     

    We also changed to a default HTTP monitor, but the problem persists.

     

    On the other hand, the passive box works perfectly. We executed a force to standby on the active box and it failed over correctly to the other box, and all of the services started to work normally. That is why we ruled out network problems.
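Since ping and a single telnet only prove basic reachability, it may help to repeat something closer to what the monitor does: open a TCP connection, send the monitor's send string, and see whether anything comes back, many times in a row. Below is a minimal sketch of that idea. It is not how the BIG-IP implements its HTTP monitor, only an approximation; `probe` and `probe_loop` are hypothetical helper names, and the 5-second socket timeout is an arbitrary choice. The send string is the one from the monitor configuration above.

```python
import socket
import time

# Send string copied from the monitor configuration quoted in this thread.
SEND = b"GET / HTTP/1.1\r\nHost: mail.cajadeande.fi.cr\r\nConnection: Close\r\n\r\n"


def probe(host, port, send=SEND, timeout=5.0):
    """Connect, send the request, and read the first bytes of the reply.

    Returns (ok, seconds, detail). ok is True if any bytes came back,
    which roughly mirrors a monitor with no receive string configured.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send)
            data = sock.recv(1024)
    except OSError as exc:  # covers refused connections, resets, and timeouts
        return False, time.monotonic() - start, repr(exc)
    elapsed = time.monotonic() - start
    if not data:
        return False, elapsed, "connection closed without a response"
    # Report only the first line of the reply (the HTTP status line).
    status_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return True, elapsed, status_line


def probe_loop(host, port, count=10, interval=10.0):
    """Run probe() count times, sleeping interval seconds between attempts."""
    results = []
    for i in range(count):
        results.append(probe(host, port))
        if i < count - 1:
            time.sleep(interval)
    return results
```

For example, calling probe_loop("172.16.8.154", 80, count=30) from a host on the server VLAN would give 30 samples at the monitor's 10-second interval. If those never fail while the affected unit keeps marking the member down, that points back at the unit itself (or at something that treats its self IP differently) rather than at the servers.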