How to write an external health check script for HAProxy

Health checks are an important part of load balancing your application, and useful in many other circumstances too. We are often asked to write custom checks, and we always go above and beyond to provide the simplest, most complete check we can come up with, no matter the application.

But if you can write your own custom health checks for one of our appliances, you have an invaluable tool you can use time and time again.

In this comprehensive guide, I’ll dive deep into utilizing external health check scripts for HAProxy, including how to write health checks, how to troubleshoot those scripts, and how to configure HAProxy to take advantage of this functionality.

Loadbalancer.org were the original sponsors of the external health check mechanism in HAProxy. We think it's an invaluable tool when you need something a bit special. We also wanted to make sure that the external health check in HAProxy was compatible with Ldirectord, which we use for Layer 4 load balancing with Linux Virtual Server (LVS).

What are health checks?

Health checks are automated tests used to monitor whether or not the servers and associated services in the pool are operational. They provide one of the most crucial features of any load balancer: the ability to send traffic only to healthy nodes that can serve client requests.

If an application server fails its health check, it is removed from the pool until it passes again. This means the load balancer stops directing traffic to that server and reroutes it to the remaining healthy servers.

External health check scripts are custom little programs that can be used to extend HAProxy’s built-in health check mechanisms. These are executed on the load balancer to probe the remote server, often enabling a deeper, more customized form of monitoring beyond simple checks like HTTP status codes or checking if the port is open.

Why use external health check scripts?

While HAProxy supports a variety of protocol checks (such as HTTP, HTTPS, SMTP or MySQL), there are instances where a more robust solution is needed. This is where external health check scripts come into play.

Here are a few reasons I can think of why using external health check scripts might be beneficial:

  1. Advanced monitoring: External scripts allow you to define more complex conditions. For example, you can create scripts that verify specific application states, custom metrics, or check multiple endpoints.
  2. Database or service health checks: For environments with multiple services such as databases, caches, or message queues, external scripts can check the health of these services in addition to the web servers.
  3. Environment-specific monitoring: You may need to monitor application-specific states that require custom scripts, such as checking the availability of a third-party API or performing load testing under controlled conditions.
  4. Centralized monitoring: External scripts can be used to collect and centralize health check data from various services, offering a more holistic view of the infrastructure’s health.

Using external health check scripts in HAProxy

Utilizing external health check scripts in HAProxy involves several key steps, each of which ensures that your system remains reliable by properly evaluating the state of your backend services.

Here are the steps.

Step 1: Create the external health check script

First, you'll need to create a custom script that performs the desired checks. It can be written in any scripting language you prefer (e.g., Bash, Python, Perl), as long as the interpreter for that language is installed on your load balancer ;). The script must also be executable (for example, chmod +x) by the user HAProxy runs as.

Let’s start with a simple Bash “health check”. Bash is a great tool that is available out of the box in nearly every Linux distribution:

#!/bin/bash

exit 0

The above simply exits with status 0, which in the context of a health check means “healthy”. This follows the standard POSIX convention: an exit status of 0 means success, while any non-zero status means that something went wrong, and the particular number returned can help us understand the error.

While the above check is rather useless, as it permanently marks every server in the pool as up, it helps us understand how HAProxy decides whether a check has passed.
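To make the exit-status convention concrete, here is a minimal Python sketch that runs a check command and turns its exit status into a pass/fail result. It models the convention only and is not HAProxy's own implementation; the run_check helper and its 5-second timeout are illustrative assumptions.

```python
import subprocess
import sys


def run_check(command: list[str], timeout: float = 5.0) -> bool:
    """Run a check command and report whether it passed.

    Exit status 0 means "healthy". Any non-zero status, a command that
    cannot be started, or one that runs longer than `timeout` seconds
    counts as a failure.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False  # subprocess.run has already killed the child
    except OSError:
        return False  # e.g. script missing or not executable
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-ins for exit0.sh and a failing script, using the Python interpreter
    print(run_check([sys.executable, "-c", "raise SystemExit(0)"]))  # True
    print(run_check([sys.executable, "-c", "raise SystemExit(3)"]))  # False
```

Note that a non-zero status of 3 is treated exactly like 1: only zero versus non-zero matters for the pass/fail decision.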

Step 2: Configure HAProxy to use your new script

Next, I’ll explore how to configure HAProxy to use this first script:

Edit your haproxy.cfg to include the below configuration:

    option external-check
    external-check path "/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/bin:/usr/sbin"
    external-check command /etc/haproxy/checks/exit0.sh

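The above should be added to either the listen or backend section of your configuration, depending on how you defined your virtual service. This snippet enables external checks, sets the PATH environment variable for the script and specifies which script will be executed.

External checks are disabled by default as a security precaution, so the global section must also contain the external-check keyword. On recent HAProxy versions you may also need insecure-fork-wanted, because running the script requires HAProxy to fork a new process.

You must also make sure the check keyword is present on each server line, for example:

 server srv1 192.168.2.21:443 weight 100 check
 server srv2 192.168.2.22:443 weight 100 check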

Step 3: Configure health check settings

Now, you need to configure how the load balancer will interact with the script. This typically includes the following parameters:

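  1. Interval: Defines how often the health check will be run (e.g., every 30 seconds). In HAProxy this is the inter parameter on the server line.
  2. Timeout: Sets the time before the health check is considered to have failed. For external checks, it is good practice to enforce the timeout inside the script itself, as the examples below do with curl -m.
  3. Retries: Defines how many consecutive failures are required before a server is marked as unhealthy (the fall parameter). The matching rise parameter sets how many consecutive successes are required before it is marked as healthy again.

For example, if you want to check the service every 30 seconds with a timeout of 5 seconds and require 3 consecutive failures before marking it as “unhealthy”, the configuration might look like this:

  • Interval: 30 seconds (inter 30s)
  • Timeout: 5 seconds (enforced by the script, e.g. curl -m 5)
  • Retries: 3 (fall 3, with rise 3 to bring the server back after 3 successful checks)

 server srv1 192.168.2.21:443 weight 100 check inter 30s rise 3 fall 3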
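To illustrate how rise and fall interact, here is a simplified Python model of the documented behaviour: a server is marked down after fall consecutive failures and back up after rise consecutive successes. It is a sketch, not HAProxy's source code, and the ServerHealth class and its starting UP state are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerHealth:
    """Simplified model of HAProxy's documented rise/fall behaviour.

    Illustrative only: this is not HAProxy's source code. The server is
    assumed to start in the UP state.
    """
    rise: int = 3    # consecutive passes needed to go from DOWN to UP
    fall: int = 3    # consecutive failures needed to go from UP to DOWN
    up: bool = True
    streak: int = 0  # consecutive results that disagree with the current state

    def record(self, passed: bool) -> bool:
        """Record one check result and return whether the server is UP."""
        if passed == self.up:
            # A result that agrees with the current state resets the streak
            self.streak = 0
            return self.up
        self.streak += 1
        threshold = self.fall if self.up else self.rise
        if self.streak >= threshold:
            self.up = not self.up
            self.streak = 0
        return self.up


if __name__ == "__main__":
    srv = ServerHealth(rise=3, fall=3)
    # Two failures, a pass (resets the count), then three failures in a row
    print([srv.record(r) for r in [False, False, True, False, False, False]])
    # [True, True, True, True, True, False]
```

Because the pass in the middle resets the count, the server only goes down on the third failure in a row. With inter 30s and fall 3, a server that stops responding is therefore taken out of the pool roughly 60 to 90 seconds after the outage begins, depending on where the outage falls in the check schedule.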

Step 4: Test and validate the health check

After configuring the external health check script, it's important to test and validate its functionality. You can run the script by hand with the same arguments HAProxy passes to it, or wait for the next scheduled check and observe the results in the HAProxy logs or statistics page.

Ensure that:

  • The script is executed successfully.
  • The correct exit status (0 for healthy, non-zero for unhealthy) is returned based on the state of the service.
  • The load balancer responds accordingly by rerouting traffic to healthy servers.
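To trigger a check by hand, it helps to call the script the way HAProxy does. HAProxy passes four arguments, <proxy_address> <proxy_port> <server_address> <server_port> (its documentation states that the first two are the string NOT_USED when the check is defined in a backend section), and by default starts the script with a minimal environment. The Python helper below is a hypothetical testing aid that approximates this; the environment values it sets are illustrative, not captured from a running HAProxy.

```python
import subprocess
import sys

# Matches the "external-check path" from the configuration example above
CHECK_PATH = "/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/bin:/usr/sbin"


def run_like_haproxy(script: str, server_addr: str, server_port: str,
                     proxy_addr: str = "NOT_USED", proxy_port: str = "NOT_USED",
                     timeout: float = 10.0) -> int:
    """Run a check script with HAProxy-style arguments and return its exit status.

    Approximates HAProxy's default clean environment: only PATH and a few
    HAPROXY_* variables are set, with illustrative values. Raises
    subprocess.TimeoutExpired if the script hangs for longer than `timeout`.
    """
    env = {
        "PATH": CHECK_PATH,
        "HAPROXY_SERVER_ADDR": server_addr,
        "HAPROXY_SERVER_PORT": server_port,
        "HAPROXY_SERVER_NAME": "srv1",         # hypothetical server name
        "HAPROXY_PROXY_NAME": "test_backend",  # hypothetical backend name
    }
    result = subprocess.run(
        [script, proxy_addr, proxy_port, server_addr, server_port],
        env=env, timeout=timeout, capture_output=True, text=True,
    )
    # Echo whatever the script printed; useful while debugging
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python3 test_check.py <script> <server_addr> <server_port>
    status = run_like_haproxy(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"exit status: {status} ({'UP' if status == 0 else 'DOWN'})")
```

For example, python3 test_check.py /etc/haproxy/checks/exit0.sh 192.168.2.21 443 should report an exit status of 0. A script that works from your login shell but fails here often depends on something in your own environment, such as an extra PATH entry.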

Real life external health check scripts in HAProxy

Here are a few real-life examples of external health checks in HAProxy for you to copy and paste.

Multiple URLs health check

Below is an example of a script that checks multiple URLs on the server.

If all of them respond as expected, the server is marked as “healthy”. For each URL that doesn’t respond as expected, we add 1 to the exit code, which can help us investigate issues:

#!/bin/bash

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/root/

################################################################################
#                    Checking multiple URLs for a response                     #
#                                                                              #
#                               Loadbalancer.org                               #
################################################################################

# Variables passed by ldirectord/haproxy.
# $1 = VIP Address
# $2 = VIP Port
# $3 = Real Server IP
# $4 = Real Server Port

# Reassigning to named variables
server="${3}"
port="${4}"

################################################################################
#                             Configuration begins                             #
################################################################################

proto='https'         # Protocol HTTP or HTTPS
timeout='5'           # Timeout in seconds
expected='running'    # String expected in the response body

# An array of URLs to check, it can be easily extended
urls=('/serviceCheck' '/api/apiCheck')

################################################################################
#                              Configuration ends                              #
################################################################################

# Define the check function: fetch the response body (curl -I would only
# fetch headers) and succeed only if it contains the expected string.
# The URLs already start with "/", so no extra slash is added here.
check() {
    curl -s -m "${timeout}" "${proto}://${server}:${port}${1}" |
        grep -q "${expected}"
}

# Define the main function
main() {
    fail_count=0    # Must start at 0, otherwise "exit" receives an empty value

    for url in "${urls[@]}"; do
        check "${url}" ||
            fail_count=$((fail_count + 1))    # Each failed check adds 1 to counter
    done

    exit "${fail_count}"    # The exit code will be equal to number of failures
}

# Call the main function
main

And here is the same health check written in Python (it requires the third-party requests module):

#!/usr/bin/env python3

import sys

import requests

################################################################################
#                    Checking multiple URLs for a response                     #
#                               Loadbalancer.org                               #
################################################################################

# Variables passed by ldirectord/haproxy.
# sys.argv[1] = VIP Address
# sys.argv[2] = VIP Port
# sys.argv[3] = Real Server IP
# sys.argv[4] = Real Server Port

# Reassigning to named variables
server = sys.argv[3]
port = sys.argv[4]

################################################################################
#                             Configuration begins                             #
################################################################################

proto = 'https'         # Protocol HTTP or HTTPS
timeout = 5             # Timeout in seconds
expected = 'running'    # String expected in the response body

# A list of URLs to check, it can be easily extended
urls = ['/serviceCheck', '/api/apiCheck']

################################################################################
#                              Configuration ends                              #
################################################################################


# Define the check function: True if the URL responds and contains the expected string
def check(u):
    try:
        rsp = requests.get(f'{proto}://{server}:{port}{u}', timeout=timeout)
    except requests.RequestException:
        return False    # Connection errors and timeouts count as failures
    return expected in rsp.text


# Define the main function
def main():
    fail_count = 0

    for url in urls:
        if not check(url):
            fail_count += 1    # Each failed check adds 1 to counter

    return fail_count    # The exit code will be equal to number of failures


# Call the main function and use its result as the exit code
sys.exit(main())
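Before pointing the check at real servers, you can exercise the same multi-URL logic against a throwaway local web server. This sketch uses only the Python standard library (urllib in place of requests), and the StubHandler test server and count_failures helper are hypothetical names introduced for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores any http_proxy settings, so requests go straight to the server
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /serviceCheck is healthy, every other path is not."""

    def do_GET(self):
        body = b"running" if self.path == "/serviceCheck" else b"stopped"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during tests


def count_failures(base: str, urls: list[str], expected: str, timeout: float = 5) -> int:
    """Mirror the scripts above: add 1 for each URL that errors or lacks `expected`."""
    failures = 0
    for u in urls:
        try:
            with DIRECT.open(base + u, timeout=timeout) as rsp:
                ok = expected in rsp.read().decode(errors="replace")
        except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
            ok = False
        if not ok:
            failures += 1
    return failures


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    print(count_failures(base, ["/serviceCheck", "/api/apiCheck"], "running"))  # 1
    server.shutdown()
    server.server_close()
```

Once the counting logic behaves as expected locally, the same URLs and expected string can be carried over into the Bash or Python script.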

Samba health check

Here is a script that checks if a share of your choice is available on a Samba server.

This is a good way of confirming that your Samba servers are operational:

#!/bin/bash

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/root/

################################################################################
#      Samba health check, checking for a specific share being available       #
#                                                                              #
#                               Loadbalancer.org                               #
################################################################################

# Variables passed by ldirectord/haproxy.
# $1 = VIP Address
# $2 = VIP Port
# $3 = Real Server IP
# $4 = Real Server Port

server="${3}"

################################################################################
#                             Configuration begins                             #
################################################################################

user='username'       # Username
hash='pw-nt-hash'     # NTLM hash of your password
proto='SMB3'          # Samba protocol version to use
wgroup='lb-org'       # Workgroup
share='test_share'    # Name of the share whose availability is checked

################################################################################
#                              Configuration ends                              #
################################################################################

# Define the check function: list the share and fail if smbclient exits
# non-zero or its output contains NT_STATUS_PATH_NOT_COVERED.
# --pw-nt-hash tells smbclient that the password given with -U is an NT hash.
check() {
  local output
  output=$(smbclient "//${1}/${2}" -W "${3}" -m "${4}" -U "${5}%${6}" --pw-nt-hash -c 'ls' 2>&1) ||
    return 1
  ! grep -q 'NT_STATUS_PATH_NOT_COVERED' <<< "${output}"
}

# Define the main function
main() {
  check "${server}" "${share}" "${wgroup}" "${proto}" "${user}" "${hash}" || exit 1
}

# Call the main function
main "$@"

Adding environment variables

HAProxy also passes a number of environment variables to the script, which you can use to access details about the server being checked.

Here’s a list of the environment variables available to your health check:

HAPROXY_PROXY_ADDR

The initial bind address, if available. This will be empty if not applicable, as in a "backend" section.

HAPROXY_PROXY_ID

The identification number assigned to the backend.

HAPROXY_PROXY_NAME

The name of the backend.

HAPROXY_PROXY_PORT

The initial bind port, if accessible (or left blank if not relevant, for instance, in a "backend" section or for a UNIX socket).

HAPROXY_SERVER_ADDR

The server address.

HAPROXY_SERVER_CURCONN

The current number of connections on the server.

HAPROXY_SERVER_ID

The identification number assigned to the server.

HAPROXY_SERVER_MAXCONN

The maximum number of connections allowed on the server.

HAPROXY_SERVER_NAME

The name of the server.

HAPROXY_SERVER_PORT

The server port (if available) or an empty string for a UNIX socket.

HAPROXY_SERVER_SSL

The value is "0" when SSL is not used and "1" when SSL is used.

HAPROXY_SERVER_PROTO

The protocol used by the server, which is one of: "cli" (the HAProxy CLI), "syslog" (syslog TCP server), "peers" (peers TCP server), "h1" (HTTP/1.x server), "h2" (HTTP/2 server), or "tcp" (any other TCP server).
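As an illustration, here is a minimal Python sketch of a check that reads the server details from these variables, falling back to the positional arguments when they are not set (for example when you run it by hand). The TCP connect probe and its 5-second timeout are placeholder assumptions; replace them with whatever your application needs.

```python
import os
import socket
import sys
from collections.abc import Mapping, Sequence


def target_from_env(env: Mapping[str, str], argv: Sequence[str]) -> tuple[str, int, str]:
    """Return (address, port, scheme) for the server being checked.

    Prefers HAProxy's HAPROXY_SERVER_* variables and falls back to the
    positional arguments argv[3] and argv[4] when they are not set.
    """
    addr = env.get("HAPROXY_SERVER_ADDR") or argv[3]
    port = int(env.get("HAPROXY_SERVER_PORT") or argv[4])
    scheme = "https" if env.get("HAPROXY_SERVER_SSL") == "1" else "http"
    return addr, port, scheme


def base_url(addr: str, port: int, scheme: str) -> str:
    """Build a base URL, bracketing IPv6 literals as URLs require."""
    host = f"[{addr}]" if ":" in addr else addr
    return f"{scheme}://{host}:{port}"


def main() -> int:
    addr, port, scheme = target_from_env(os.environ, sys.argv)
    name = os.environ.get("HAPROXY_SERVER_NAME", "unknown")
    print(f"checking {name} at {base_url(addr, port, scheme)}", file=sys.stderr)
    try:
        # Placeholder probe: succeed if a TCP connection opens within 5 seconds
        with socket.create_connection((addr, port), timeout=5):
            return 0
    except OSError:
        return 1


# Only probe when called with check arguments or by HAProxy
if __name__ == "__main__" and ("HAPROXY_SERVER_ADDR" in os.environ or len(sys.argv) >= 5):
    sys.exit(main())
```

Reading HAPROXY_SERVER_SSL this way lets one script serve both plain and SSL backends without hard-coding the protocol, unlike the proto setting in the earlier examples.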

External health check script best practice

While setting up your external health check scripts, consider the following:

  1. Minimize resource consumption: Keep scripts to the bare minimum. Health checks should run quickly and consume minimal resources to avoid overloading the system.
  2. Ensure idempotency: Your scripts should return consistent results even if run multiple times. This ensures predictable behaviour, especially when diagnosing problems.
  3. Use logging: It’s a good idea to incorporate logging within your scripts to track their execution and results. This will help debug any issues that may arise.
  4. Test your scripts in a staging environment: Before deploying scripts to production, thoroughly test them in a non-production environment to ensure they work as expected!
  5. Security: If your scripts require access to sensitive data or services, ensure they are secured properly with appropriate permissions.
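For the logging advice above, a minimal sketch using Python's standard logging module might look like this. The log file path is up to you (any location the HAProxy user can write to), and make_logger and logged_check are hypothetical helpers, not part of HAProxy.

```python
import logging
import time
from typing import Callable


def make_logger(path: str) -> logging.Logger:
    """Return a logger that appends timestamped lines to `path`."""
    logger = logging.getLogger("healthcheck")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # reuse the handler if the logger is already set up
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def logged_check(logger: logging.Logger, server: str, check: Callable[[], bool]) -> int:
    """Run check(), log the outcome and duration, and return an exit status."""
    start = time.monotonic()
    try:
        ok = check()
    except Exception:
        # Record the traceback for later diagnosis, then report the server as down
        logger.exception("check for %s raised an error", server)
        return 1
    elapsed = time.monotonic() - start
    logger.info("check for %s %s in %.3fs", server, "passed" if ok else "failed", elapsed)
    return 0 if ok else 1
```

A script would typically end with sys.exit(logged_check(make_logger(log_path), server, my_check)), where log_path and my_check stand in for your own log location and check function. Logging the duration also helps with the "minimize resource consumption" point, since slow checks show up immediately.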

Troubleshooting external health check scripts

Let’s be honest, even after careful setup, issues may arise with your external health check scripts.

Here are some common troubleshooting tips that I’ve found useful over the years:

  1. Check script permissions: Ensure that the script has the correct execute permissions. The load balancer needs to be able to run the script.
  2. Script failures: If the script is failing or returning incorrect results, check the log files (if implemented) for errors. Ensure the script is accessing the correct services and resources.
  3. Timeouts and delays: If your health checks are timing out, review the script for performance bottlenecks or increase the timeout values in the load balancer configuration.
  4. Check the health check logs: Loadbalancer.org appliances provide logs of health check execution, which you can use to determine whether your scripts are functioning as expected.
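For the first tip, a quick way to rule out permission problems is a small script that checks what is needed to execute a check script: the file exists, is executable, starts with a #! line naming an interpreter that exists, and does not have Windows line endings (which make the system look for an interpreter such as /bin/bash followed by a carriage return). This is a hypothetical diagnostic sketch; run it as the same user HAProxy runs as, because os.access checks permissions for the current user.

```python
import os
import sys


def diagnose_script(path: str) -> list[str]:
    """Return common problems that would stop a check script from running.

    Permissions are checked for the current user, so run this as the
    user HAProxy runs checks as.
    """
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable (try: chmod +x {path})")
    with open(path, "rb") as f:
        first_line = f.readline()
    if first_line.endswith(b"\r\n"):
        problems.append(f"{path} has Windows (CRLF) line endings")
    if not first_line.startswith(b"#!"):
        problems.append(f"{path} has no #! line naming its interpreter")
    else:
        parts = first_line[2:].split()
        interpreter = parts[0].decode(errors="replace") if parts else ""
        if not interpreter or not os.access(interpreter, os.X_OK):
            problems.append(f"interpreter {interpreter!r} is missing or not executable")
    return problems


if __name__ == "__main__":
    for script in sys.argv[1:]:
        issues = diagnose_script(script)
        print(f"{script}: {'OK' if not issues else '; '.join(issues)}")
```

For example, running it against /etc/haproxy/checks/exit0.sh should print OK once the script has been made executable.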

How to do this with a Loadbalancer.org appliance

Alternatively, if scripts aren’t your thing, you can run these same health checks using our load balancer appliance by doing the following:

  1. Go to the WebUI of your Loadbalancer.org appliance.
  2. Click on "Cluster Configuration".
  3. Click on "Health Check Scripts".
  4. Click on "Add New Health Check".
  5. Choose a template.
  6. Change the name of your health check.
  7. Click on "Update".
  8. Click on "Layer 7 - Virtual Services".
  9. Click on "Modify".
  10. Set the check type to "External Script".
  11. Choose your script from the list.
  12. Click the "Update" button.
  13. When prompted, click on "Reload HAProxy".

Conclusion

External health check scripts are an essential tool for maintaining high availability and resilience in your infrastructure, whether you’re using our load balancer or HAProxy itself.

They enable custom, granular monitoring tailored to your specific needs, ensuring traffic only reaches services that are healthy and operational. By following the steps outlined in this article, you can integrate external health checks into your day-to-day operations and improve the reliability of your infrastructure.

If you have any questions, please don’t hesitate to get in touch or drop a comment below.

And remember to continually test, monitor, and refine your health check scripts to ensure they remain effective as your environment evolves.
