Forum Discussion

Andrej_Krnac
Jul 31, 2019

healthiness of F5 - help with errors

Dear Gents,

 

I am reviewing /var/log/ltm logs and KPI statistics on two highly available F5s running release 12.1.3. I am looking for the root cause of high memory or CPU usage. There are several problems below.

 

Q1 First of all, other memory utilization is hitting 70% (TMM memory is just 20%). CPU usage also peaks above 70%. Is there any hint on how or where to start troubleshooting high memory or CPU usage? The F5 is used as a load balancer. Very little traffic flows across the F5s, around 50 Mbps.

Q2 The following error is seen in the ltm logs: mcpd[7118]: Sync of device group /Common/device-group-failover-2530624eebc7 to commit id 6666 Is this something to keep an eye on?

Q3 There is an error about an unreachable pool member and a failing monitor ( err f5-A tmm[12726]: No members available for pool /poolX/pool_A

ltm 07-31 05:43:50 notice f5-A mcpd[7118]: Pool /part_X/pool_A member /part_www/10.XX.YY.ZZ:80 monitor status down). I can fix it easily, but can such an unreachable pool member cause high CPU or memory usage?

 

Any advice that helps troubleshoot the errors above would be highly appreciated. Many thanks

 

Regards

 

Andy

1 Reply

  • Hi Andy,

     

    I believe I can answer some of your questions.

     

    Q1: The "other memory" is basically just memory being used by Linux processes and other non-TMM related things. It is fairly typical to see its usage around 70%. So unless you see it steadily increasing, you shouldn't need to worry about it. We also have an article that talks a bit more about memory.

     

    K16419: Overview of BIG-IP memory usage

    https://support.f5.com/csp/article/K16419
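
    To make "steadily increasing" a bit more concrete, here is a minimal Python sketch that fits a straight line through a series of "other memory" percentages and reports the slope. The sample values, the function names and the 0.5 points-per-sample threshold are all assumptions for illustration, not anything BIG-IP provides; you would feed it readings you collected yourself at a fixed interval.

```python
from typing import Sequence


def slope_per_sample(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_steadily_increasing(values: Sequence[float], min_slope: float = 0.5) -> bool:
    """True when the fitted trend rises by at least min_slope per sample.

    min_slope is a hypothetical threshold; tune it to your sampling interval.
    """
    return slope_per_sample(values) >= min_slope


if __name__ == "__main__":
    # Hypothetical readings of "other memory" in percent, one per day.
    hovering = [70, 71, 69, 70, 70]
    climbing = [60, 63, 66, 69, 72]
    print(is_steadily_increasing(hovering))
    print(is_steadily_increasing(climbing))
```

    A series that hovers around 70% gives a slope near zero, while a series that climbs a few points every sample gives a clearly positive slope, which is the pattern worth raising with support.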

     

    As far as high CPU usage is concerned, that can depend on how many modules you have provisioned and what kind of traffic is being passed. If you are concerned with it then I would recommend opening a ticket with support just to have someone take a closer look at a qkview from the device to see if the usage appears to be normal or not.

     

    Q2: Is that the full log as it appears in /var/log/ltm? If it isn't then please share the full log. Based on what you included it looks like a configsync of the "device-group-failover-2530624eebc7" occurred but it doesn't seem to say anything more. Are there any other logs around that time related to a sync that took place? Typically we would either see in the logs that the sync failed or that it was successful. It may be worth reviewing the logs of the peer device at that same timestamp as well to see if it logged anything about the sync between the two peers.
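
    If you want to line up the sync messages from both peers, a rough Python sketch like the following may help. It assumes only the message wording you quoted ("Sync of device group ... to commit id ..."); the function names are made up for this example, and the rest of the real log line is not parsed because I have not seen it.

```python
import re
from typing import Iterable, List, Set, Tuple

# Pattern taken from the message quoted in the question; anything after the
# commit id is ignored because the full log line is not known.
SYNC_RE = re.compile(r"Sync of device group (\S+) to commit id (\d+)")


def find_sync_events(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (device_group, commit_id, raw_line) for every sync message."""
    events = []
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2)), line.rstrip("\n")))
    return events


def missing_on_peer(local: Iterable[str], peer: Iterable[str]) -> Set[Tuple[str, int]]:
    """(device_group, commit_id) pairs logged locally but not on the peer."""
    local_ids = {(g, c) for g, c, _ in find_sync_events(local)}
    peer_ids = {(g, c) for g, c, _ in find_sync_events(peer)}
    return local_ids - peer_ids


if __name__ == "__main__":
    sample = [
        "mcpd[7118]: Sync of device group "
        "/Common/device-group-failover-2530624eebc7 to commit id 6666",
    ]
    for group, commit_id, raw in find_sync_events(sample):
        print(group, commit_id)
```

    In practice you would pass the lines of /var/log/ltm from each unit (for example copies saved locally and read with open()) instead of the sample list.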

     

    Q3: The health monitor will use a small amount of both memory and CPU but I wouldn't expect any noticeable difference in usage from a pool member being unreachable. At any given time the monitor should simply be checking to see if it can reach the pool member and if it can it will send its health checks at the interval that you have configured within the monitor. That shouldn't cause the CPU or memory to spike up.
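
    If you want to see how often a member is actually being marked down (for example to compare against the times of your CPU peaks), a small Python sketch along these lines could tally the messages. It assumes only the "Pool ... member ... monitor status down" wording from your log excerpt; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern based on the mcpd message quoted in the question.
MONITOR_RE = re.compile(r"Pool (\S+) member (\S+) monitor status (\w+)")


def count_monitor_status(lines: Iterable[str], status: str = "down") -> Counter:
    """Count messages per (pool, member) that report the given monitor status."""
    counts: Counter = Counter()
    for line in lines:
        match = MONITOR_RE.search(line)
        if match and match.group(3) == status:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ltm 07-31 05:43:50 notice f5-A mcpd[7118]: Pool /part_X/pool_A "
        "member /part_www/10.XX.YY.ZZ:80 monitor status down",
    ]
    for (pool, member), n in count_monitor_status(sample).items():
        print(pool, member, n)
```

    A member that shows up once and stays down is just an unreachable server; one that shows up many times is flapping, which is worth fixing for its own sake even though I would not expect it to explain the CPU or memory numbers.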

     

    -Nathan F