Forum Discussion

Steve_130921
Jan 16, 2014

Node or Pool monitoring, showing down for 9 seconds

I don't have much configured yet, but for the few services we are working with I'm getting random alerts on nodes and pools reporting that a node went down for 9 seconds. I have to assume the node is being taken out of the pool, and the constant email alerts we forward to the customer about their server are a problem if they are incorrect.

 

The alerts seem to come from an HTTP monitor, and while the occurrence is unpredictable, the reported downtime is consistently 9 seconds. These same servers are already configured on a Cisco ACE 4710, but I'm not getting the same alerts from that system.

 

Any suggestions?

 

Steve

 

9 Replies

  • The logs that I have been given are relatively clean. The service did not go down when the alert was triggered. I'm getting this on several other servers as well. I do not have all their logs, but it looks to me like the F5 could not reach the server for its health check and subsequently marked the server as down. This does not look good to my developers.

     

    I was thinking of modifying the monitor but I don't see anything wrong with it to begin with.

     

    Steve

     

  • Same monitor on all those Nodes and Pools? What's the monitor configuration please? What timers etc.

     

  • I've seen this two days in a row for the same server. It can't be a coincidence:

     

    [ was up for 23hrs:59mins:53sec ]

     

    I'm using the default http monitor as the parent in the iApp tool, which then builds a monitor based on it. The settings are:

     

    Interval: 30 seconds
    Up interval: disabled
    Time until up: 0 seconds
    Timeout: 91 seconds
    Manual resume: no

     

    This is the monitor that I'm getting alerted on. The settings above are what it created. I still have strict updates enabled so I did not change the settings.

     

    I'm also using a custom monitor on the node, which is the closest match to what we have on the ACE in our current production environment. (I'm not getting any alerts on the node monitor.)

     

    Interval: 5 seconds
    Timeout: 16 seconds
    Everything else matches the http monitor.

     

    It looks like the monitors the iApp builds are the ones giving me these false alerts, while the monitor I put on the nodes is not alerting at all.

     

    Steve
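Both sets of timers quoted in this reply fit the commonly cited F5 guideline of timeout = (3 × interval) + 1 second, which lets roughly three probes go unanswered before the member is marked down. The sketch below only checks that arithmetic for the two monitors described here; the function names and labels are hypothetical, and it does not query or model a BIG-IP.

```python
# Minimal sketch: check the interval/timeout pairs quoted in the thread
# against the "3 x interval + 1" guideline. Names are illustrative only.

def follows_3n_plus_1(interval_s: int, timeout_s: int) -> bool:
    """True when timeout equals three intervals plus one second."""
    return timeout_s == 3 * interval_s + 1


def probes_within_timeout(interval_s: int, timeout_s: int) -> int:
    """Whole probe intervals that fit inside the timeout window."""
    return timeout_s // interval_s


# (interval, timeout) in seconds, as reported in the post above.
MONITORS = {
    "iApp-built http monitor (pool)": (30, 91),
    "custom monitor (node)": (5, 16),
}

for name, (interval, timeout) in MONITORS.items():
    print(
        f"{name}: interval={interval}s timeout={timeout}s "
        f"3n+1={follows_3n_plus_1(interval, timeout)} "
        f"probes_within_timeout={probes_within_timeout(interval, timeout)}"
    )
```

Since both monitors already follow the guideline, the timers alone do not obviously explain the alerts, which is consistent with the poster not finding anything wrong with the monitor configuration.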

     

  • My mistake. I am also getting alerts on the custom monitor on the nodes. (My email rules had moved them.) I'm going to change the monitor on the node to an ICMP monitor and watch that one.

     

    Steve

     

  • Arie

    Anything in the web server logs? (I'm referring to the actual web server logs, e.g. Apache or IIS, not the server logs.) Primarily, what are the response codes when the failure occurs?

     

  • Afternoon Steve

     

    Did you manage to resolve this issue?

     

    I'm asking as we are having the same issue.

     

    Many thanks

     

    Steve

     

  • The developers have given me a copy of their logs for the time frames when I get the alert, and there's nothing to suggest they dropped service on port 80 or 443.

     

    And to answer the other question: no, I haven't been able to resolve it. It's a regular occurrence, random in timing but always lasting 9 seconds.
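One detail from earlier in the thread may be worth checking: the alert text "[ was up for 23hrs:59mins:53sec ]" combined with a 9-second outage adds up to almost exactly 24 hours. The sketch below does that arithmetic. It assumes the alert format is exactly as quoted and that the 9-second figure applies to the same event; the regular expression and function name are hypothetical helpers, not part of any F5 tooling.

```python
# Minimal sketch: add the reported uptime and downtime to see how close
# one up/down cycle is to 24 hours. The alert string is copied from the thread.
import re
from datetime import timedelta

UPTIME_RE = re.compile(r"(\d+)hrs:(\d+)mins:(\d+)sec")


def parse_uptime(text: str) -> timedelta:
    """Extract an 'NNhrs:NNmins:NNsec' uptime from an alert line."""
    match = UPTIME_RE.search(text)
    if match is None:
        raise ValueError(f"no uptime found in: {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


alert_line = "[ was up for 23hrs:59mins:53sec ]"
uptime = parse_uptime(alert_line)
downtime = timedelta(seconds=9)  # the outage length reported in the alerts

cycle = uptime + downtime
drift = cycle - timedelta(days=1)

print("cycle length:", cycle)
print("difference from 24h:", drift.total_seconds(), "seconds")
```

A cycle this close to one day could point to something scheduled daily on the server or network side, such as a backup or log rotation, though nothing in the thread confirms that. Comparing the alert timestamps across several days would show whether the pattern holds.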