Server monitoring - some differences between BIG-IP and NGINX

Introduction

After years of working with BIG-IP, one might think there is only one way to check the status of servers behind a load-balancer. A recent NGINX Plus training course demonstrated otherwise.

Comparison

The BIG-IP performs active health checks through one or more Monitors assigned to a Pool of servers, to specific Members of the Pool, or to the Nodes themselves. The BIG-IP then regularly communicates with the servers to see whether they are up and available, according to the configuration of the Monitor. By default the BIG-IP does not appear to offer support for determining the status of a server based solely on the result of a connection attempt. So it would seem the BIG-IP cannot mark a server "unavailable" just because there is no response to a connection attempt. Instead, the BIG-IP will persistently attempt to load-balance connections to a Pool Member until a Monitor marks it Down.

Conversely, the default NGINX health check method is passive, based on the success or failure of the connection attempt. If the connection succeeds, NGINX continues to load-balance connections to the server. If the connection fails a specified number of times within a specified period of time, NGINX stops load-balancing connections to the server for that same period of time. Of course, some fairly simple configuration is required. But once configured, the NGINX passive health check system works quite well. (NGINX Plus also offers active health checks, which operate similarly to the BIG-IP method.)
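For comparison with the BIG-IP Monitor approach, here is a minimal sketch of an NGINX Plus active health check. The server addresses, the zone size, and the interval/fails/passes values are illustrative assumptions, not values from the training. The health_check directive is only available in NGINX Plus, and it requires a shared memory zone in the upstream group.

```nginx
# Sketch only: addresses and tuning values are placeholders.
upstream myServers {
    zone myServers 64k;          # shared memory zone, needed for active health checks
    server 192.0.2.10:8080;
    server 192.0.2.11:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://myServers;

        # NGINX Plus only: probe each server every 5 seconds.
        # A server is marked unhealthy after 3 consecutive failed probes
        # and healthy again after 2 consecutive successful probes.
        health_check interval=5s fails=3 passes=2;
    }
}
```

With no further parameters, the probe requests "/" and treats a 2xx or 3xx response as a pass, which is roughly what a basic HTTP Monitor does on the BIG-IP.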

Paradigm Shift

Passive health checking is something the BIG-IP can do as well. An iRule can be configured to monitor the results of connection attempts to the Pool Members behind a Virtual Server, then decide whether or not to keep using a Pool Member for further load-balancing decisions.

What makes the NGINX feature so intriguing and easy to use is the simplicity with which it is deployed. Simply include a bit of extra syntax in the "upstream" context, and NGINX will temporarily ignore a failing server for a specified period of time.

Take the following context example:

upstream myServers {
    server (server:port) max_fails=2 fail_timeout=30s;
    server (server:port) max_fails=2 fail_timeout=30s;
    ...
}

With this, if the connection to a server fails twice within 30 seconds, NGINX marks that server unavailable and load-balances connections only to the remaining servers in the list. Once the 30 seconds have elapsed, NGINX again attempts to forward a client's connection to the server. If the server responds, then all is well. If the connection fails twice more within 30 seconds, NGINX ignores the server for another 30 seconds, and the cycle repeats.
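To show where the upstream context sits in a full configuration, here is a minimal sketch of a complete file. The addresses, ports, and listen port are assumptions for illustration only. If max_fails and fail_timeout are omitted, NGINX falls back to its defaults of max_fails=1 and fail_timeout=10s, so passive checking is in effect even without the extra syntax.

```nginx
# Sketch only: addresses and ports are placeholders.
events {}

http {
    upstream myServers {
        # After 2 failed attempts within 30s, skip the server for 30s.
        server 192.0.2.10:8080 max_fails=2 fail_timeout=30s;
        server 192.0.2.11:8080 max_fails=2 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myServers;

            # Conditions under which a request is passed to the next server.
            # "error timeout" is the default and is written out here
            # only to make the behavior explicit.
            proxy_next_upstream error timeout;
        }
    }
}
```

Connection errors and timeouts always count toward max_fails. HTTP error responses such as 502 or 503 count only if they are listed in proxy_next_upstream, so a server that accepts connections but returns errors is not taken out of rotation by this configuration as written.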

Take-Away

Learning that a load-balancer could simply stop load-balancing to a server if the connection fails was quite a change in understanding, and is one of the things I like best about NGINX as a load-balancer.

Published Mar 09, 2020
