Forum Discussion

flomkrl_29950
Dec 18, 2007

monitor statistics

Hello,

I have a node that keeps going down and coming back up. I think the timeout of the HTTP monitor is not set properly.

I want to know exactly how long the monitor's health check takes. This does not appear in the statistics or on the command line.

Thanks in advance for your help.

Regards,

Flo

For the moment I will use a script to do that:

 

 

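# Time one request per node; bash's "time" writes its report to stderr.
(time curl -s -o /dev/null http://IP1/test.html) 2> /tmp/node1.txt
(time curl -s -o /dev/null http://IP2/test.html) 2> /tmp/node2.txt

# Extract the seconds from the "real 0m1.234s" line (the minutes part is dropped).
node1=$(grep real /tmp/node1.txt | cut -d "m" -f 2 | cut -d "s" -f 1)
node2=$(grep real /tmp/node2.txt | cut -d "m" -f 2 | cut -d "s" -f 1)

# Append a timestamped CSV line.
tm=$(date)
echo "$tm,$node1,$node2" >> /tmp/statnode.txt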
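
As an alternative sketch, curl can report the elapsed time itself through its `-w '%{time_total}'` write-out variable, which avoids parsing the output of `time` and keeps the full value for responses longer than a minute. The function names `probe_time` and `csv_line`, the `MAX_TIME` variable and the 60-second cap are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/sh
# probe_time URL: print the total request time in seconds, or "fail"
# if curl reports a transport error (connection refused, timeout, ...).
# MAX_TIME is a hypothetical override for the 60-second cap.
probe_time() {
    if t=$(curl -s -o /dev/null --max-time "${MAX_TIME:-60}" -w '%{time_total}' "$1"); then
        echo "$t"
    else
        echo "fail"
    fi
}

# csv_line FIELD...: join the arguments with commas.
# The subshell keeps the IFS change local to this call.
csv_line() {
    (IFS=,; echo "$*")
}

# Usage (commented out; IP1 and IP2 are the placeholders from the post):
# csv_line "$(date)" "$(probe_time http://IP1/test.html)" "$(probe_time http://IP2/test.html)" >> /tmp/statnode.txt
```

Note that an HTTP error status such as 500 still counts as a completed request here, so it is logged with its time rather than as "fail".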

 

 

 

 

4 Replies

  • How long do the pool members take to answer requests to /test.html? What is the interval and timeout on your HTTP monitor set to?

     

     

    Ideally, you should set the interval to a value which is greater than the length of time it takes the pool member to respond to the monitor request. The timeout should be three times the interval + 1 second. Assuming the pool member responds in less than 5 seconds, the default HTTP monitor setting of 5 seconds for the interval and 16 seconds for the timeout should keep the pool member marked up (as long as the pool member is responding).

     

     

    Aaron
  • Hello,

    The pool member can sometimes take up to 30 seconds to answer (maybe I have to set the timeout to 1 minute).

    In fact, I was just wondering what response time the monitor measured when a node goes down. Knowing it would let me adjust the timeout value. I set up the email alert in alertd.cond and only get "ip:port monitor status down", without the exact reason.

    Do you have any idea how to get detailed logs when a monitor marks a node down?

    Debug mode is too verbose for me.

    Thanks in advance,

    Flo
  • Hi Flo,

     

     

    The pool member down message will be logged when the pool member does not respond successfully to at least one monitor request during the timeout period.

     

     

    Here is an explanation of the interval and timeout:

     

     

     

    interval: how often in seconds to send a request

     

     

    timeout: how long to wait for a successful response before marking the member down

     

     

    By default, the interval is 5 seconds and the timeout is 16 seconds (timeout = 3 x interval + 1). The monitoring daemon starts a timer equal to the timeout and sends a request every interval, that is, every five seconds. When it receives a successful response, the countdown is reset to the timeout value. In this scenario, the node basically has three chances to respond before being marked down. Requests are still sent every interval even when the node is marked down, so the node is put back into use automatically once it responds correctly to a request. If you want to keep the node marked down even after it responds again, you can enable 'manual resume'.

     

     

    If you want to give the node more chances to respond before marking it down, you could extend the timeout. Setting the interval to 5 and the timeout to 31 (6 x 5 + 1) would mean the node is sent six requests before being marked down.

     

     

     

     

    Aaron
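
The rule of thumb used in this thread (timeout = chances x interval + 1) can be sketched as a small shell helper. The function name `monitor_timeout` is hypothetical; it is not a BIG-IP command and only does the arithmetic.

```shell
#!/bin/sh
# monitor_timeout INTERVAL [CHANCES]
# Hypothetical helper: prints the timeout in seconds that gives a pool
# member CHANCES monitor requests (default 3) at the given interval.
monitor_timeout() {
    interval=$1
    chances=${2:-3}
    echo $(( chances * interval + 1 ))
}

monitor_timeout 5      # default case from the thread: 3 x 5 + 1, prints 16
monitor_timeout 5 6    # six chances: 6 x 5 + 1, prints 31
```

For a pool member that can take up to 30 seconds to answer, the interval itself would also need to be raised, since the advice above is to keep the interval longer than the response time.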