HTTP monitor: Redundant pair member LTM marks resources as unavailable
Hi all,
We are having a problem with one of our LTMs that resides in a redundant pair.
We were called in to help the customer because they lost connectivity to their Exchange environment.
We went onsite and checked the active device and found that the pool members were flapping. In the GUI we could see the pool members being marked up and then down every 2 to 4 seconds. We checked the log and could also see the flapping in /var/log/ltm.
We checked the passive box and found that all the pool members were up and that there was no flapping on this device. Its /var/log/ltm showed no monitor status changes while the other box was flapping constantly.
We executed a force to standby on the active box and it failed over correctly to the other box and all of the services started to work normally.
Once the services were back online we took a look at the configuration on the affected box but everything looked good.
We ran an extended ping and telnets to the services, which all worked. We checked the log and could not see any interface flapping.
We double checked our monitors and confirmed that the affected box had exactly the same monitors as the other box, but the problem persisted.
We upgraded the box to 11.3.0 HF 10, but the problem persisted after we booted into the new partition.
We decided to look further into the networking side of the problem. These boxes are implemented using a trunk and VLAN tagging, which has worked from the beginning.
We decided to undo the trunk and the VLAN tagging and moved the ports to their respective VLANs. We also had the network team change the Nexus config to move the ports from Virtual Port Channel mode to access mode.
Once all this was done we tried reaching the servers via ping and the test was successful, but the pool members were still flapping. We never lost a packet via ping and we could also telnet to the specific pool member ports.
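Ping and telnet only prove reachability and that the port accepts a TCP connection; an HTTP monitor also needs the expected reply to arrive within its timeout. To test at that level, a minimal Python sketch along these lines could be run from a host on the server VLAN. The send string, expected string, and timeout below are placeholders, not our actual monitor settings, and the function names are made up for illustration.

```python
import socket
import time


def response_matches(raw: bytes, expect: str) -> bool:
    """True if the expected string appears in the reply.
    An empty expect string accepts any non-empty reply."""
    if not expect:
        return bool(raw)
    return expect.encode("latin-1") in raw


def http_probe(host, port, send=b"GET / HTTP/1.0\r\n\r\n",
               expect="200 OK", timeout=5.0):
    """Connect, send the request, and read until the expected string shows up,
    the server closes the connection, or a socket operation times out.
    Returns (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    raw = b""
    try:
        # The timeout applies to the connect and to each later send/recv call.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send)
            while not response_matches(raw, expect):
                chunk = sock.recv(4096)
                if not chunk:  # server closed the connection
                    break
                raw += chunk
    except OSError as exc:  # refused, unreachable, timed out, reset
        return False, time.monotonic() - start, f"error: {exc!r}"
    ok = response_matches(raw, expect)
    return ok, time.monotonic() - start, raw[:80].decode("latin-1")


# Example (addresses taken from the log in this post; adjust as needed):
# for _ in range(20):
#     print(http_probe("172.16.8.154", 8002))
#     time.sleep(1)
```

Running it in a loop and watching the elapsed time and the first bytes of the reply should show whether the servers sometimes answer slowly or with unexpected content, which ping and telnet would not reveal.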
Since this did not work, we decided to upgrade the box to 11.5.1 HF 7, but the problem persisted.
Our last shot was changing the interface that was on the server VLAN. We physically moved from interface 1.4 to 1.3, but the problem still persisted.
I have run out of possible troubleshooting steps to follow. I would appreciate your help and guidance in order to solve this problem.
This is part of the log:
Jan 12 09:19:15 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:18 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:18 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:18 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status up. [ was down for 0hr:0min:45sec ]
Jan 12 09:19:20 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:19:23 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status up. [ was down for 0hr:0min:5sec ]
Jan 12 09:19:23 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:23 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:27 f5-1 err tmm[12676]: 01340002:3: HA Connection with peer 172.16.8.74:1028 lost.
Jan 12 09:19:27 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:25sec ]
Jan 12 09:19:27 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:27 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:28 f5-1 notice tmm[12676]: 01340001:5: HA Connection with peer 172.16.8.74:1028 established.
Jan 12 09:19:40 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:40 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:40 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:43 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:43 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:43 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143c:5: Connection to CMI peer 172.16.8.74 has been removed
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: enabled
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071431:5: Attempting to connect to CMI peer 172.16.8.74 port 6699
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:55 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:20:32 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status down. [ was up for 0hr:1min:5sec ]
Jan 12 09:20:32 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:32 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:38 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:6sec ]
Jan 12 09:20:38 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:38 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:40 f5-1 notice mcpd[9575]: 01070410:5: Removed subscription with subscriber id qkview
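To see which members flap most, the monitor status messages in an excerpt like the one above can be counted per pool member. This is a rough sketch that assumes the message wording shown in this log ("Pool ... member ... monitor status up/down"); the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the mcpd monitor status messages as they appear in the excerpt.
STATUS_RE = re.compile(r"Pool (\S+) member (\S+) monitor status (up|down)\.")


def count_transitions(log_text: str) -> Counter:
    """Count up/down transitions per (pool, member, status).
    Scans the whole text, so it does not matter how the lines are wrapped."""
    counts = Counter()
    for pool, member, status in STATUS_RE.findall(log_text):
        counts[(pool, member, status)] += 1
    return counts


# Example usage against a saved copy of the log:
# with open("/var/log/ltm") as fh:
#     for key, n in sorted(count_transitions(fh.read()).items()):
#         print(n, *key)
```

Comparing the counts from the affected box with the healthy peer over the same time window makes the difference between the two units easy to quantify.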