HTTP monitor: Redundant pair member LTM marks resources as unavailable

Question

Hi all,
We are having a problem with one of our LTMs that resides in a redundant pair.
We were called in to help the customer because they lost connectivity to their Exchange environment.
We went onsite, checked the active device, and found that the pool members were flapping. In the GUI we could see the pool members being marked up and then down every 2 to 4 seconds. We checked the log and could also see the flapping in /var/log/ltm.
We checked the passive box and found that all the pool members were up and that there was no flapping on this device. Its /var/log/ltm showed no status changes while the other box was flapping constantly.
We executed a force to standby on the active box; it failed over correctly to the other box and all of the services started to work normally.
Once the services were back online, we took a look at the configuration on the affected box, but everything looked good.
We ran an extended ping and telnet tests to the services, which all worked. We checked the log and could not see any interface flapping.
We double-checked our monitors and confirmed that they were identical to the ones on the other box, but the problem persisted.
We upgraded the box to 11.3.0 HF 10, but the problem persisted after we booted into the new partition.
We decided to look further into the networking side of the problem. These boxes are implemented using a trunk and VLAN tagging, which has worked from the beginning.
We decided to undo the trunk and the VLAN tagging and moved the ports to their respective VLANs. We also had the network team change the Nexus configuration so the ports went from virtual port channel (vPC) mode to access mode.
Once all this was done, we tried reaching the servers via ping and the test was successful, but the pool members were still flapping. We never lost a packet via ping, and we could also telnet to the specific pool member ports.
Since this did not work, we decided to upgrade the box to 11.5.1 HF 7, but the problem persisted.
Our last shot was changing the interface on the server VLAN: we physically moved the connection from interface 1.4 to 1.3, but the problem still persisted.
I have run out of troubleshooting steps to follow. I would appreciate your help and guidance in solving this problem.
This is part of the log:
Jan 12 09:19:15 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:18 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:18 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:18 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status up. [ was down for 0hr:0min:45sec ]
Jan 12 09:19:20 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_webservers_80 now has available members
Jan 12 09:19:20 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:19:23 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status up. [ was down for 0hr:0min:5sec ]
Jan 12 09:19:23 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:23 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_8002 now has available members
Jan 12 09:19:27 f5-1 err tmm[12676]: 01340002:3: HA Connection with peer 172.16.8.74:1028 lost.
Jan 12 09:19:27 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:25sec ]
Jan 12 09:19:27 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:27 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:19:28 f5-1 notice tmm[12676]: 01340001:5: HA Connection with peer 172.16.8.74:1028 established.
Jan 12 09:19:40 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:40 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:40 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_webservers_80
Jan 12 09:19:43 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]
Jan 12 09:19:43 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:43 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143c:5: Connection to CMI peer 172.16.8.74 has been removed
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: enabled
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071431:5: Attempting to connect to CMI peer 172.16.8.74 port 6699
Jan 12 09:19:50 f5-1 notice mcpd[9575]: 01071432:5: CMI peer connection established to 172.16.8.74 port 6699
Jan 12 09:19:55 f5-1 notice mcpd[9575]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
Jan 12 09:20:32 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status down. [ was up for 0hr:1min:5sec ]
Jan 12 09:20:32 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:32 f5-1 err tmm1[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_80
Jan 12 09:20:38 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_80 member /Common/172.16.8.154:80 monitor status up. [ was down for 0hr:0min:6sec ]
Jan 12 09:20:38 f5-1 err tmm1[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:38 f5-1 err tmm[12676]: 01010221:3: Pool /Common/pl_applcgp_80 now has available members
Jan 12 09:20:40 f5-1 notice mcpd[9575]: 01070410:5: Removed subscription with subscriber id qkview
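
To make the flapping pattern in an excerpt like the one above easier to read, the mcpd status lines can be grouped per pool member. The following is only a rough sketch: the function name `parse_transitions` and the `SAMPLE` list are made up for illustration, the regular expression is based solely on the line format visible in this log, and the sample lines are copied from the excerpt.

```python
import re
from collections import defaultdict

# Matches mcpd pool-member status lines in the format seen in this thread, e.g.
# "Pool /Common/p member /Common/1.2.3.4:80 monitor status down. [ was up for 0hr:0min:20sec ]"
STATUS_RE = re.compile(
    r"Pool (?P<pool>\S+) member (?P<member>\S+) monitor status (?P<state>up|down)\."
    r" \[ was (?:up|down) for (?P<h>\d+)hr:(?P<m>\d+)min:(?P<s>\d+)sec \]"
)


def parse_transitions(lines):
    """Map (pool, member) to a list of (new_state, seconds_previous_state_was_held)."""
    transitions = defaultdict(list)
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not a pool-member status line (tmm, CMI, HA messages, ...)
        held = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
        transitions[(match["pool"], match["member"])].append((match["state"], held))
    return dict(transitions)


# A few lines copied from the log excerpt in this post.
SAMPLE = [
    "Jan 12 09:19:18 f5-1 notice mcpd[9575]: 01070638:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status down. [ was up for 0hr:0min:20sec ]",
    "Jan 12 09:19:18 f5-1 err tmm[12676]: 01010028:3: No members available for pool /Common/pl_applcgp_8002",
    "Jan 12 09:19:20 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_webservers_80 member /Common/192.168.1.7:80 monitor status up. [ was down for 0hr:0min:45sec ]",
    "Jan 12 09:19:23 f5-1 notice mcpd[9575]: 01070727:5: Pool /Common/pl_applcgp_8002 member /Common/172.16.8.154:8002 monitor status up. [ was down for 0hr:0min:5sec ]",
]

for (pool, member), events in parse_transitions(SAMPLE).items():
    summary = ", ".join(f"{state} (previous state held {held}s)" for state, held in events)
    print(f"{pool} {member}: {summary}")
```

Feeding the full /var/log/ltm from both units through the same function would give a side-by-side view of which members change state and how long each state lasts on each box.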

what_lies_bene1 · Answer

What type of monitor? Can you post the monitor configuration?
I'd imagine this isn't the F5 but something else 'in path', or the servers themselves. Do they have rate limits, HIPS, a firewall, etc.? Perhaps some software there decided it didn't like the frequent connections from the primary?
Have you checked the next-hop switch and/or router? Are there any other elements specific to the primary?

fabian_arroyo_m · Answer

The monitor configuration is this:
ltm monitor http /Common/Exchange_2010_Prod.app/Exchange_2010_Prod_ad_http_monitor {
    app-service /Common/Exchange_2010_Prod.app/Exchange_2010_Prod
    defaults-from /Common/http
    description none
    destination *:*
    interval 10
    manual-resume disabled
    partition Common
    password none
    recv none
    recv-disable none
    reverse disabled
    send "GET / HTTP/1.1
Host: mail.cajadeande.fi.cr
Connection: Close

"
    time-until-up 0
    timeout 31
    transparent disabled
    up-interval 0
    username none
}
We also switched to a default HTTP monitor, but the problem persisted.
On the other hand, the passive box works perfectly. We executed a force to standby on the active box; it failed over correctly to the other box and all of the services started to work normally. Because of that, we ruled out network problems.
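
Ping and telnet only show that the member is reachable and the port accepts connections; the monitor also needs an HTTP response to arrive before its timeout. As a rough cross-check, the same request the monitor sends can be issued by hand and timed. This is a minimal sketch, not a replica of the BIG-IP monitor itself: the helper names `build_request` and `probe` are made up for illustration, and it assumes the script runs on some host that can reach the pool members on the server VLAN.

```python
import socket
import time


def build_request(host_header):
    """Build the same request as the monitor's send string, with explicit CRLF line endings."""
    return (
        "GET / HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: Close\r\n"
        "\r\n"
    ).encode("ascii")


def probe(address, port, host_header, timeout=5.0):
    """Send one request and return (status_line, elapsed_seconds).

    Raises OSError (including socket.timeout) if the connection or the read fails.
    """
    start = time.monotonic()
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header))
        data = b""
        # Read only until the end of the first response line.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    elapsed = time.monotonic() - start
    status_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, elapsed


# Example call (address and port taken from the log in this thread):
#   status, seconds = probe("172.16.8.154", 80, "mail.cajadeande.fi.cr")
#   print(status, round(seconds, 3))
```

Running the probe repeatedly at the monitor interval and comparing response times and failures against the timestamps in /var/log/ltm may show whether the servers answer slowly or refuse some requests, which ping and telnet would not reveal.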
