Forum Discussion

Luis_Ribeiro
Jun 19, 2018

LTM nodes down because the ICMP monitor fails, but ping from tmsh succeeds!

Hi, I have two VIPRION chassis (BIG-IP C4480) with 4 vCMP guests, running version 12.1.2. On one vCMP guest, all nodes suddenly went down because the ICMP health monitors failed. If I change the health monitor on a specific node to TCP, the node comes up! If I ping the node from the tmsh CLI, it succeeds. So the node is up and reachable from the BIG-IP vCMP guest.

Beyond that, the node status shows the reason for the probe failure:

Offline (Enabled) /Common/icmp: sendto(): Bad file descriptor; No successful responses received before deadline. @2018/06/19 11:26:04.

What is the meaning of "Bad file descriptor" in this context?

I saw something similar in the previous version (though not in the node status) when I enabled monitor logging on the node, but this time the monitor logging option is not checked. One reason for the upgrade was to fix this issue.

On the host chassis I ran the dmesg command and saw the following messages:

SELinux: initialized (dev sda1, type ext2), uses xattr
linux-kernel-bde 0000:17:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
linux-kernel-bde 0000:19:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
EXT2-fs warning: maximal mount count reached, running e2fsck is recommended

Jun 19 03:11:06 slot1/LB01A notice pendsect[12368]: pendsect: /dev/sda no Pending Sectors detected

Should I run e2fsck?

This is not the first time, and this occurrence was worse: all nodes went down on 2 of the 4 vCMP guests! What is the reason for this? What can I do to fix it if it happens again?

I appreciate your comments.

Kind Regards, LFR

3 Replies

  • "What is the meaning of Bad file descriptor in this context?"

    In your case, it probably means that bigd was trying to use a file descriptor (here, the socket passed to sendto()) that had already been closed, perhaps because of the interval settings you have or because of a software bug.

    Can you post the ICMP monitor settings you have?

    There is this bug, but it does not apply to the version you have:

    https://support.f5.com/csp/article/K48693281

     

    "Should I run e2fsck?"

    The physical disk is in the vCMP host; the vCMP guests only have virtual disks, which are basically files on the vCMP host disk. If there were a problem with the disk, it would probably affect all vCMP guests, and within the guests it would affect more than just the bigd process.

    Anyway, you don't lose anything by checking.

    You can use the platform diagnostics for that, which are more user-friendly:

    https://support.f5.com/csp/article/K15442

    However, there is also the smartctl command. Don't forget to run the disk tests on the vCMP host.
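
To make the "Bad file descriptor" part of the answer concrete, here is a minimal Python sketch that triggers the same operating-system error (EBADF) by calling sendto() on a socket that has already been closed. It only illustrates what the error means on Linux and is not how bigd is implemented. The function name `sendto_on_closed_socket` is hypothetical, and a UDP socket is used instead of a raw ICMP socket (an assumption made so the example runs without root privileges).

```python
import errno
import socket


def sendto_on_closed_socket():
    """Call sendto() on a closed UDP socket and return the resulting OSError."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.close()  # the descriptor behind `sock` is no longer valid
    try:
        # 192.0.2.1 is a documentation-only address (RFC 5737). Nothing is
        # sent: the call fails before any packet leaves the host.
        sock.sendto(b"probe", ("192.0.2.1", 9))
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    err = sendto_on_closed_socket()
    # On Linux the error number is expected to be errno.EBADF.
    print(err.errno == errno.EBADF, err.strerror)
```

The point is that the target host plays no role in this failure: the error is raised locally, which is consistent with the node answering a manual ping while the monitor still reports it as down.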

     

  • "This is not the first time, and this occurrence was worse: all nodes went down on 2 of the 4 vCMP guests! What is the reason for this? What can I do to fix it if it happens again?"

    Contact F5 support NOW! Issues like this are most likely due to hardware problems. You want F5 support to guide you through the correct diagnostic steps and, if needed, initiate a hardware replacement as soon as possible.

  • Hi,

    I opened a case with F5 support, and I need to upgrade.

    There are bugs similar to my problem that have a temporary workaround. The two most common ones related to this error message are:

    https://cdn.f5.com/product/bugtracker/ID681499.html

    https://cdn.f5.com/product/bugtracker/ID620079.html

    In my case the problem is similar to bug:

    Bug ID 620079: Removing route-domain may cause monitors to fail.

    The workaround described for this bug works in my case:

    bigstart restart bigd
    

    ICMP monitoring starts working again, with no impact on traffic processing. This only restarts the process responsible for monitoring.
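
To recognise this failure if it happens again, the node status text quoted in the question can be matched mechanically. The Python sketch below parses a status string of that shape and flags the "Bad file descriptor" case. The regular expression is inferred from that single example, so the format is an assumption, and the names `parse_node_status` and `is_bad_fd_failure` are hypothetical helpers, not part of any F5 tool.

```python
import re
from datetime import datetime

# Pattern inferred from one example status string; other BIG-IP versions
# or monitor types may format the text differently.
STATUS_RE = re.compile(
    r"^(?P<state>.+?)\s+(?P<monitor>/\S+):\s+(?P<error>.+?);\s+"
    r"(?P<detail>.+?)\s+@(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.?$"
)


def parse_node_status(text):
    """Split a node status string into its fields, or return None if it does not match."""
    match = STATUS_RE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y/%m/%d %H:%M:%S")
    return fields


def is_bad_fd_failure(text):
    """Return True when the monitor error mentions 'Bad file descriptor'."""
    parsed = parse_node_status(text)
    return parsed is not None and "Bad file descriptor" in parsed["error"]


if __name__ == "__main__":
    sample = (
        "Offline (Enabled) /Common/icmp: sendto(): Bad file descriptor; "
        "No successful responses received before deadline. @2018/06/19 11:26:04."
    )
    print(parse_node_status(sample))
    print(is_bad_fd_failure(sample))
```

A match only shows that the status text has the same signature as in this thread. Whether restarting bigd is the right response should still be confirmed with F5 support.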