Forum Discussion

Jeffrey_Morrow
Jun 05, 2018

Do pool servers go down during backups?

I've noticed in the F5 logs that servers show a monitor status of down at odd times in the early morning. Could it be that they fail health checks while backups are running? Also, this does not happen on all 170 servers.

 

8 Replies

  • I do not think that should be the case. If you are using the default ping monitor as the health check, it should not be disturbed by any such activity, backups included. Do any of your other monitoring tools also report those servers as unavailable?

     

  • The devices I am seeing this on mostly use HTTP health checks, with a small number using TCP. In our test/dev area, I get the same results when a software engineer has been working on a server without taking it out of the pool. In our Production area, however, the timestamps are from very early in the morning, when no one is working on the servers.

     

  • This issue is not related to the F5.

    I have a question and a few suggestions:

    + Does the issue occur at the same time each day? If so, check a tcpdump capture for that period.

    + Check whether there is any issue on the log server.

    + Check whether any patch updates are running on the servers or the log server.

    But this is definitely not related to the F5 scheduled backup.

     

    • Jeffrey_Morrow

      I am certain this is NOT an issue with the F5. These pool member up/down log entries are from around 2-3 am. I'm dedicated, but not dedicated enough to be working that early in the morning. The root of my question was whether this could be a normal result, i.e. whether system backups could interrupt enough health checks to cause it.

       

  • Health-check traffic should not be interrupted by a server backup as such. However, the F5 may still mark a server down if you are using a custom monitor (HTTP, HTTPS or EAV) and it fails to receive the "recv string" within the configured timeout, for example because:

    1. The backup is saturating the available bandwidth.
    2. CPU or disk utilization on the servers is high.
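To make the failure mode above concrete, here is a minimal Python sketch of a receive-string style check that can be run by hand against a server during the backup window. It is not the F5 monitor implementation: the function names are made up for illustration, the URL and string you pass in are your own, and it uses a plain substring match where a real monitor may treat the receive string as a pattern.

```python
import time
import urllib.request


def evaluate_probe(body: str, elapsed: float, recv_string: str, timeout: float) -> str:
    """Classify one probe result the way a receive-string check would (simplified)."""
    if elapsed > timeout:
        return "down: no response within timeout"
    if recv_string not in body:
        return "down: receive string not found"
    return "up"


def probe(url: str, recv_string: str, timeout: float) -> str:
    """Fetch url once, time the response, and classify it."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"down: {exc}"
    return evaluate_probe(body, time.monotonic() - start, recv_string, timeout)
```

Running `probe()` in a loop from a host near the load balancer while a backup is in progress would show whether responses slow down or stop containing the expected string.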
    • Jeffrey_Morrow

      Thanks. I'm having our systems people check the alert timestamps and compare them with the times the backups were performed.
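The timestamp comparison described here can be sketched in a few lines of Python. The sample timestamps and the single backup window below are hypothetical, not taken from any real log; in practice the down events would come from the load balancer log and the windows from the backup schedule.

```python
from datetime import datetime


def events_in_windows(events, windows):
    """Return the events whose timestamp falls inside any (start, end) window."""
    return [
        ts for ts in events
        if any(start <= ts <= end for start, end in windows)
    ]


# Hypothetical sample data, for illustration only.
down_events = [
    datetime(2018, 6, 5, 2, 14),
    datetime(2018, 6, 5, 2, 47),
    datetime(2018, 6, 5, 11, 30),
]
backup_windows = [(datetime(2018, 6, 5, 2, 0), datetime(2018, 6, 5, 3, 0))]

matched = events_in_windows(down_events, backup_windows)
print(f"{len(matched)} of {len(down_events)} down events fall inside a backup window")
```

If most down events land inside a backup window and the affected servers match the ones being backed up at that time, that supports the backup theory; events outside every window point elsewhere.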

       

  • How aggressive are the timings on the HTTP health checks? It could be that the servers are responding slowly, e.g. taking 17 seconds.
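As a rough illustration of why a figure like 17 seconds matters: a commonly cited rule of thumb for monitor timing is timeout = 3 × interval + 1, which for a 5-second interval gives a 16-second timeout. The sketch below only does that arithmetic. The function names are illustrative, and the values actually configured on the monitors in question are an assumption that should be checked.

```python
def recommended_timeout(interval: int) -> int:
    """Timeout from the rule of thumb: three intervals plus one second."""
    return 3 * interval + 1


def exceeds_timeout(response_seconds: float, interval: int) -> bool:
    """True if a response this slow would arrive after the timeout expires."""
    return response_seconds > recommended_timeout(interval)


print(recommended_timeout(5))    # 16
print(exceeds_timeout(17, 5))    # True
```

Under those assumed values, a server that takes 17 seconds to answer during a backup would miss the window by one second, even though it never actually stopped serving.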