Perplexed by proxy?! Ruffled by reverse proxy?! We’ve all been there. Honestly, sometimes I think the language we use makes things sound way more complicated than they actually are...
Here’s a simple breakdown of what HAProxy reverse proxy actually means, and when and how you might use it in practice!
What is HAProxy reverse proxy?
Let’s break it down…
Proxy versus reverse proxy
A proxy is a server that sits between end-users and the web pages they visit online, effectively serving as a gateway between users and the internet. A common use is to control traffic and determine what things users can and cannot see.
Meanwhile, a reverse proxy is a server that usually sits behind the firewall in your private network and forwards client requests to the relevant backend server. Using a reverse proxy helps ensure a smooth flow of traffic between your clients and servers because it gives you greater control over your network traffic.
HAProxy is an example of a reverse proxy
HAProxy is an amazing free and open-source load balancer for TCP and HTTP-based applications. It improves speed and performance by distributing requests across multiple servers. It is just ONE example of a reverse proxy.
Now, let’s put 'reverse proxy' and 'HAProxy' together...
HAProxy reverse proxy
Basically, “HAProxy reverse proxy” (or "reverse proxy HAProxy") is just a fancy way of describing “Layer 7 load balancing using an HAProxy load balancer”.
What makes it Layer 7 load balancing, you might ask? Answer: because Layer 4 load balancing is NOT a reverse proxy. Layer 4 is a routing-based load balancing method that works in a very different way. At Layer 4, the load balancer acts just like a firewall. It routes connections between servers and clients based on simple IP addresses and port information, combined with health checks.
Most commercial load balancer vendors, including ourselves, use Linux Virtual Server (LVS) for Layer 4 functionality. Why? Because it's amazingly powerful and it's built into the Linux kernel.
For more on the different load balancing techniques, check out this blog: Comparing Layer 4, Layer 7, and GSLB techniques.
So how do you actually configure a reverse proxy (Layer 7 load balancing, remember) in HAProxy? And when might you consider using it?
When to use HAProxy reverse proxy
Layer 7 load balancing has many advantages over its Layer 4 counterpart. Having said that, once you’ve got your head around the terminology, don’t be swayed by groupthink!
As anyone who follows Loadbalancer.org will know, we're not afraid to question the status quo. We would encourage you to do the same and weigh up the many pros and cons of each deployment method in order to determine the right one for YOUR use case.
For example:
The pros of Layer 7
- The main advantage of Layer 7 load balancing is that, because it is a reverse proxy, it maintains two separate TCP connections (one with the client, and one with the server), so it supports a broad range of protocols.
- Layer 7 is application-aware and supports communications for end-user processes and applications, and the presentation of data for user-facing software applications (e.g. web browsers, email communications). At Layer 7, the load balancer therefore has more information to make intelligent load balancing decisions, as information about upper-level protocols is also available.
- Layer 7 also has more flexibility, with the ability to add or alter headers, the ability to use ACLs to control or direct traffic based on defined criteria, and, possibly more importantly, several options for persistence.
- You can also implement rate control and quality of service, plus a huge amount of detailed analytics and application performance metrics.
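To make those pros a bit more concrete, here is a minimal sketch of how header manipulation, ACL-based routing, cookie persistence, and simple rate control might look in an HAProxy configuration. All names, IP addresses, ports, paths, and thresholds are hypothetical and only there for illustration, and the global and defaults sections (including timeouts) are left out for brevity.

```haproxy
# Illustrative sketch only: names, addresses, and limits are assumptions.
frontend web_in
    bind *:80
    mode http

    # Rate control: track each client's HTTP request rate over 10 seconds
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }

    # Header manipulation: tell the backend which scheme the client used
    http-request set-header X-Forwarded-Proto http

    # ACL: send anything under /api to a different pool of servers
    acl is_api path_beg /api
    use_backend api_servers if is_api
    default_backend web_servers

backend web_servers
    mode http
    balance roundrobin
    # Persistence: a cookie pins each client to the server that first served it
    cookie SERVERID insert indirect nocache
    server web1 192.168.1.10:80 check cookie web1
    server web2 192.168.1.11:80 check cookie web2

backend api_servers
    mode http
    balance leastconn
    server api1 192.168.1.20:8080 check
    server api2 192.168.1.21:8080 check
```

None of this is possible at Layer 4, because the load balancer never looks inside the HTTP request.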
The cons of Layer 7
- The main disadvantage of Layer 7 load balancing is that because it's a proxy process, it’s not as fast as Layer 4 load balancing (which, as we've said, works in a very different way). This is not really an issue unless the VIP has an extremely large volume of connections with a very high concurrency rate i.e. Netflix big...
- Layer 7 not being transparent by default could also be seen as a con. But we make it easy to use it in combination with TPROXY to resolve this issue.
- Another disadvantage of Layer 7 is that it doesn't support the good old UDP protocol (although it does support the HTTP/3 protocol, which runs on QUIC, originally short for Quick UDP Internet Connections).
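On the transparency point, here is a rough sketch of the two usual approaches in a plain HAProxy configuration. The backend names and IP addresses are hypothetical. The first option keeps the proxy non-transparent but passes the client's address in a header. The second uses TPROXY, which generally also needs an HAProxy build with transparent proxy support, matching kernel and firewall/routing rules on the load balancer, and real servers that send their return traffic back via the load balancer.

```haproxy
# Illustrative sketch only: names and addresses are assumptions.

# Option 1: non-transparent, client IP passed in the X-Forwarded-For header
backend web_servers_xff
    mode http
    option forwardfor
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

# Option 2: transparent, connections to the real servers use the client's IP
# (relies on TPROXY being set up outside this file)
backend web_servers_tproxy
    mode http
    source 0.0.0.0 usesrc clientip
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
```

With option 1 the real servers see the load balancer's IP as the source address and need to read the header (or log it) to find the real client. With option 2 they see the client's IP directly.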
The pros of Layer 4
- Layer 4 is faster and transparent by default, so it might actually be a better fit for your needs if you require high throughput, network simplicity, and transparency.
- Layer 4 supports the UDP protocol.
- It can even be configured in SNAT mode, which magically makes it work like a reverse proxy, i.e. no changes to the real servers or network are needed (but it's no longer transparent).
The cons of Layer 4
- Layer 4 usually requires alterations to the real servers: either adding a loopback adapter and resolving the 'ARP issue' for Layer 4 DR, or changing the default gateway for Layer 4 NAT.
- Layer 4 DR (the fastest option) requires the load balancer to be directly connected to the same local network as the real servers (i.e. there are no routers in between). This is because Layer 4 DR relies on MAC address alteration to function.
- Layer 4 is not application-aware, so it's not possible to intelligently route traffic based on its contents or to alter the traffic as it traverses the load balancer.
Remember, there are always two sides to every story, so always ask 'why'?! Just because Layer 4 isn’t a reverse proxy, that doesn’t mean it isn’t a superior solution in certain circumstances!
How to configure HAProxy reverse proxy
Now the easy bit! How to actually install HAProxy and configure Layer 7 load balancing!
For a step-by-step walkthrough, check out this blog: How to install and configure HAProxy on RHEL 7.
That blog also includes a rundown of a Layer 7 HAProxy configuration, this time load balancing two IIS servers listening on both port 443 and port 80.
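To give a flavour of what such a setup can look like, here is a minimal sketch of a Layer 7 configuration for two web servers on ports 80 and 443. It is not the configuration from the linked blog: the server names, IP addresses, timeouts, and the choice to pass port 443 straight through (rather than terminating SSL on the load balancer) are all assumptions made for illustration.

```haproxy
# Illustrative sketch only: names, addresses, and timeouts are assumptions.
global
    log /dev/log local0
    maxconn 4096

defaults
    mode http
    log global
    option httplog
    timeout connect 5s
    timeout client  50s
    timeout server  50s

# Port 80: full HTTP reverse proxy
frontend http_in
    bind *:80
    default_backend iis_http

backend iis_http
    balance roundrobin
    option httpchk GET /
    server iis1 192.168.1.10:80 check
    server iis2 192.168.1.11:80 check

# Port 443: passed through in TCP mode, so the IIS servers keep their own certificates
listen https_passthrough
    bind *:443
    mode tcp
    option tcplog
    balance source
    server iis1 192.168.1.10:443 check
    server iis2 192.168.1.11:443 check
```

You can check a file like this for syntax errors before reloading with haproxy -c -f followed by the path to the file.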
Please let me know in the comments if this is helpful, or if you still have questions that I haven't answered here!