Load balancing Windows Terminal Server — HAProxy and RDP Cookies or Microsoft Connection Broker


When you have users depending on Windows Terminal Services for their main desktop, it's a good idea to have more than one Terminal Server. RDP, however, is not an easy protocol to load balance; sessions are long-lived and need to be persistent to a particular server, and users may connect from different source addresses during one session.

The current development version of HAProxy has taken an important step towards making this possible. Thanks to work by Exceliance, it now supports RDP Cookies, offering a solution to the persistence problem.

We have been testing the latest development release of HAProxy, 1.4-dev4, on a Loadbalancer.org Enterprise R16 device. The real servers were two Windows Server 2008 machines, with identical test users set up on both.

defaults
    clitimeout 1h
    srvtimeout 1h
listen VIP1 192.168.0.10:3389
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    persist rdp-cookie
    balance rdp-cookie
    option tcpka
    option tcplog
    server Win2k8-1 192.168.0.11:3389 weight 1 check inter 2000 rise 2 fall 3
    server Win2k8-2 192.168.0.12:3389 weight 1 check inter 2000 rise 2 fall 3
    option redispatch

Note that this is only a fragment of the haproxy.cfg file, showing the relevant options.

The load balancer's Virtual IP is set to 192.168.0.10, listening on port 3389 for RDP. The two real servers are on 192.168.0.11 and 192.168.0.12, in the same subnet as the Virtual IP.

The two new configuration directives are

persist rdp-cookie

and

balance rdp-cookie

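These instruct HAProxy to inspect the incoming RDP connection for a cookie. balance rdp-cookie hashes the mstshash cookie, which carries the user name, so the same user is always sent to the same real server; persist rdp-cookie honours a server routing token (the msts cookie) if one is present. The two tcp-request lines make sure HAProxy sees the cookie on the initial request: tcp-request inspect-delay 5s lets HAProxy wait up to 5 seconds for the client's first data, and tcp-request content accept if RDP_COOKIE accepts the connection as soon as an RDP cookie has arrived.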
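To make the "cookie" concrete: the RDP client sends it as a plain-text line, Cookie: mstshash=<username>, inside its very first PDU, the X.224 Connection Request. HAProxy parses this itself; the Python sketch below only illustrates the wire format, assuming the TPKT/X.224 layout described in Microsoft's MS-RDPBCGR specification. build_connection_request and extract_rdp_cookie are hypothetical helpers, not part of HAProxy.

```python
import struct
from typing import Optional


def build_connection_request(username: str) -> bytes:
    """Hypothetical helper: a minimal TPKT + X.224 Connection Request
    carrying an RDP cookie (real clients usually append more data)."""
    cookie = f"Cookie: mstshash={username}\r\n".encode("ascii")
    # X.224 fixed part after the length indicator (LI):
    # CR code 0xE0, DST-REF (2 bytes), SRC-REF (2 bytes), class option (1 byte)
    x224 = bytes([6 + len(cookie), 0xE0, 0, 0, 0, 0, 0]) + cookie
    # TPKT header: version 3, reserved byte, total length as big-endian 16-bit
    return struct.pack(">BBH", 3, 0, 4 + len(x224)) + x224


def extract_rdp_cookie(packet: bytes, name: str = "mstshash") -> Optional[str]:
    """Return the value of 'Cookie: <name>=...' or None if it is absent."""
    if len(packet) < 11 or packet[0] != 3:
        return None  # too short, or not a TPKT packet
    if (packet[5] & 0xF0) != 0xE0:
        return None  # not an X.224 Connection Request
    total = struct.unpack(">H", packet[2:4])[0]
    variable = packet[11:total]  # data after the 7-byte X.224 fixed header
    prefix = f"Cookie: {name}=".encode("ascii")
    if not variable.startswith(prefix):
        return None
    end = variable.find(b"\r\n", len(prefix))
    if end == -1:
        return None
    return variable[len(prefix):end].decode("ascii")


pkt = build_connection_request("alice")
print(extract_rdp_cookie(pkt))  # alice
```

Note that the cookie is sent before any encryption is negotiated, which is why a TCP-mode load balancer can read it at all.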

The only other tweak needed is to increase the clitimeout and srvtimeout values to one hour. In testing, this was found to be necessary to keep idle RDP sessions established.

Testing involved making multiple connections with different usernames, from varying IP addresses, using both Windows XP Professional and Linux clients. Sessions were disconnected and reconnected, and real servers removed from the cluster and re-inserted.

We found that, once a user had established a session with a particular real server, that user consistently reconnected to the correct server if it was available. When we removed and re-inserted servers, existing sessions were unaffected. After a simulated server failure, users could start a session on the remaining server.

When a failed server was brought back online, users that had been connected to that server would reconnect to it again, even if they had started a new session on the other server in the meantime. This may not be what you want, and requires further testing.

With client and server time-outs set to one hour, we were able to leave idle sessions running for 16 hours without problems.

For more information on the new configuration options, see the development version of HAProxy's Configuration Manual.

NB. For some daft reason Microsoft restricted the login cookie in RDP to 9 characters! Because the domain is usually entered first (mydomain\myusername), the first 9 characters may be identical for every user, and RDP cookie session persistence won't work. There are two workarounds: either shorten your domain name (ouch!) or log in using the myusername@mydomain format.
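A quick way to see the problem: truncate every login to the cookie length and look for collisions. The sketch below takes the 9-character limit from this post as an assumption; cookie_for and colliding_logins are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Cookie length limit as described in this post (assumption for the sketch).
COOKIE_LIMIT = 9


def cookie_for(login: str, limit: int = COOKIE_LIMIT) -> str:
    """The part of the login that would end up in the mstshash cookie."""
    return login[:limit]


def colliding_logins(logins: Iterable[str]) -> Dict[str, List[str]]:
    """Group logins by truncated cookie; keep only cookies shared by several users."""
    groups: Dict[str, List[str]] = {}
    for login in logins:
        groups.setdefault(cookie_for(login), []).append(login)
    return {cookie: users for cookie, users in groups.items() if len(users) > 1}


domain_first = ["mydomain\\alice", "mydomain\\bob", "mydomain\\carol"]
upn_style = ["alice@mydomain", "bob@mydomain", "carol@mydomain"]
print(colliding_logins(domain_first))  # every user shares the cookie 'mydomain\'
print(colliding_logins(upn_style))     # {}
```

The myusername@mydomain format only helps while user names differ within their first 9 characters; administrator1 and administrator2 would still share a cookie.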

So what about Microsoft Connection Broker (session directory or whatever they call it)?

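With Connection Broker it only takes a one-line change in your HAProxy configuration: replace balance rdp-cookie with balance leastconn and keep persist rdp-cookie. New sessions then go to the least busy server, while persist rdp-cookie honours the routing token (the msts cookie) that Connection Broker hands back to a reconnecting client. According to the HAProxy documentation, the broker only issues this token in token redirection mode, i.e. with "IP address redirection" disabled: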

# Same as above, but balance rdp-cookie is replaced by balance leastconn:
defaults
    clitimeout 1h
    srvtimeout 1h
listen VIP1 192.168.0.10:3389
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    persist rdp-cookie
    balance leastconn
    option tcpka
    option tcplog
    server Win2k8-1 192.168.0.11:3389 weight 1 check inter 2000 rise 2 fall 3
    server Win2k8-2 192.168.0.12:3389 weight 1 check inter 2000 rise 2 fall 3
    option redispatch

Note that this is only a fragment of the haproxy.cfg file, showing the relevant options.
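When Connection Broker sends a reconnecting client back through the load balancer, the client presents a routing token of the form Cookie: msts=<ip>.<port>.<reserved>, and persist rdp-cookie decodes it to find the matching real server. The sketch below assumes the encoding described in Microsoft's MS-RDPBCGR specification (the IPv4 address and port written as decimals of their little-endian byte order); encode_msts, decode_msts and server_for_token are hypothetical helpers, and the server list mirrors the configuration above.

```python
import socket
import struct
from typing import Dict, Optional, Tuple


def encode_msts(ip: str, port: int) -> str:
    """Build a routing token: IPv4 and port as little-endian decimals."""
    ip_value = struct.unpack("<I", socket.inet_aton(ip))[0]
    port_value = struct.unpack("<H", struct.pack(">H", port))[0]
    return f"msts={ip_value}.{port_value}.0000"


def decode_msts(token: str) -> Tuple[str, int]:
    """Turn 'msts=<ip>.<port>.<reserved>' back into (ip, port)."""
    ip_part, port_part, _reserved = token.split("=", 1)[1].split(".")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_part)))
    port = struct.unpack(">H", struct.pack("<H", int(port_part)))[0]
    return ip, port


# The two real servers from the configuration above.
SERVERS: Dict[str, Tuple[str, int]] = {
    "Win2k8-1": ("192.168.0.11", 3389),
    "Win2k8-2": ("192.168.0.12", 3389),
}


def server_for_token(token: str, servers: Dict[str, Tuple[str, int]] = SERVERS) -> Optional[str]:
    """Name of the server the token points at, or None (fall back to leastconn)."""
    target = decode_msts(token)
    for name, address in servers.items():
        if address == target:
            return name
    return None


token = encode_msts("192.168.0.11", 3389)
print(token)                    # msts=184592576.15629.0000
print(server_for_token(token))  # Win2k8-1
```

A token that matches no known server simply falls through to the balance algorithm, which is why leastconn still decides where brand-new sessions land.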

It's about time we updated this post for the juicy new features in HAProxy development version 1.5-dev7.

There were a couple of problems with the hash method used for RDP cookie load balancing (as described above):

  1. Lots of people would like to use least connection load balancing with WTS/RDP clusters (this is not possible with a hash-based persistence method).
  2. When you add or remove servers, the hash table gets reconfigured, i.e. users hit the wrong server.
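The second problem is easy to reproduce with a simplified model of map-based hashing: hash the cookie, then take it modulo the number of servers. zlib.crc32 stands in for HAProxy's own hash function here, and pick_server and moved_fraction are hypothetical helpers; the point is only that changing the server count remaps users who never touched the removed server.

```python
import zlib
from typing import List


def pick_server(cookie: str, servers: List[str]) -> str:
    """Simplified map-based hashing: hash the cookie, then take it modulo
    the number of servers (crc32 is a stand-in for HAProxy's hash)."""
    return servers[zlib.crc32(cookie.encode("utf-8")) % len(servers)]


def moved_fraction(users: List[str], before: List[str], after: List[str]) -> float:
    """Fraction of users whose server changes when the server list changes."""
    moved = sum(1 for u in users if pick_server(u, before) != pick_server(u, after))
    return moved / len(users)


users = [f"user{i}" for i in range(1000)]
# Remove server C: users of A and B are reshuffled as well, not just C's users.
print(moved_fraction(users, ["A", "B", "C"], ["A", "B"]))  # typically around two thirds
```

With a stick table, by contrast, a user's server is looked up rather than recomputed, so removing one server only affects the users who were on it.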

So Loadbalancer.org took the decision to sponsor the development of stick-table-based RDP persistence (we sponsored the original source IP stick table work as well). When we looked at it in more detail, we decided that what we needed was:

  1. Flexible stick tables that could be used for multiple future requirements, e.g. SSL session ID persistence.
  2. RDP stick table support, in order to enable least connection based scheduling.
  3. Some way of restoring stick tables when HAProxy restarts (and also of replicating them to other HAProxy instances).
  4. Ensuring that TCP connections are properly closed on server failure (especially important for long-lived connections).
  5. Ensuring that the stick table is cleared out on server failure.
  6. And finally, making sure that the fallback server can be made non-sticky (really irritating if you get stuck on the "sorry, site down" page!).

To cut a long story short, let's just dive in with a full configuration file and explain it as we go:

#HAProxy configuration file generated by LB Cloud appliance
global
    #uid 99
    #gid 99
    daemon
    stats socket /var/run/haproxy.stat mode 600 level admin
    log 127.0.0.1 local4
    maxconn 40000
    ulimit-n 81001
    pidfile /var/run/haproxy.pid
defaults
    log global
    mode http
    timeout connect 4000
    timeout client 42000
    timeout server 43000
    balance roundrobin
peers localpeer
    peer loadbalancer localhost:8888
listen stats :7777
    stats enable
    stats uri /
    stats hide-version
    option httpclose
frontend F1
    bind *:3389
    maxconn 40000
    default_backend B1
    mode tcp
    option tcplog
backend B1
    mode tcp
    option tcpka
    balance leastconn
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    persist rdp-cookie
    stick-table type string size 204800 expire 120m peers localpeer
    stick on rdp_cookie(mstshash)
    server R1 www.loadbalancer.org:3389 weight 1 check port 3389 inter 2000 rise 2 fall 3 on-marked-down shutdown-sessions
    server R2 www.clusterscale.com:3389 weight 1 check port 3389 inter 2000 rise 2 fall 3 on-marked-down shutdown-sessions
    server backup us.loadbalancer.org backup non-stick
    option redispatch
    option abortonclose

An important new section is the peers section:

peers localpeer
    peer loadbalancer localhost:8888

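In this configuration we are synchronising all of the stick table information with localhost:8888. Here the only peer is the local HAProxy process itself; another peer line could point at a second HAProxy instance for session table high availability.
When HAProxy is reloaded, the old process keeps serving its existing sessions until they close, and only new sessions run on the new HAProxy instance. This can get quite confusing, as the stats socket and stats page only show the new sessions, not the old ones. During the reload, the local peer lets the old process hand its stick table over to the new one, so HAProxy has to know which peer line refers to itself.
You will need to change your HAProxy start-up scripts to pass that local peer name with -L: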

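# Assumes BASENAME=haproxy is set earlier in the init script and that
# /etc/init.d/functions (which provides 'daemon') has been sourced.
start() {
    # Check the configuration first; -L names the local peer so that it
    # matches the 'peer loadbalancer' line in haproxy.cfg.
    /usr/local/sbin/$BASENAME -L loadbalancer -c -q -f /etc/$BASENAME/$BASENAME.cfg
    if [ $? -ne 0 ]; then
        echo "Errors found in configuration file."
        return 1
    fi
    echo -n "Starting $BASENAME: "
    daemon /usr/local/sbin/$BASENAME -D -f /etc/$BASENAME/$BASENAME.cfg -p /var/run/$BASENAME.pid -L loadbalancer
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$BASENAME
    return $RETVAL
}

reload() {
    /usr/local/sbin/$BASENAME -L loadbalancer -c -q -f /etc/$BASENAME/$BASENAME.cfg
    if [ $? -ne 0 ]; then
        echo "Errors found in configuration file."
        return 1
    fi
    # Soft reload: start a new process with the same local peer name and tell
    # the old PIDs (-sf) to finish their existing sessions and then exit.
    /usr/local/sbin/$BASENAME -D -L loadbalancer -f /etc/$BASENAME/$BASENAME.cfg -p /var/run/$BASENAME.pid -sf $(cat /var/run/$BASENAME.pid)
}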

The important thing is that the local peer name "loadbalancer" passed with -L in the start-up script must also be present as a peer line in the haproxy.cfg file.
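Because a mismatch here is easy to miss, a small pre-flight check can help. The Python sketch below is hypothetical (it is not part of HAProxy or of the init script above): it collects the names from every peer line and reports whether the name you intend to pass with -L is among them.

```python
from typing import Set


def peer_names(cfg_text: str) -> Set[str]:
    """Names from every 'peer <name> <address>:<port>' line in haproxy.cfg."""
    names: Set[str] = set()
    for line in cfg_text.splitlines():
        fields = line.split()
        # 'peers <section>' lines and commented-out lines do not match.
        if len(fields) >= 3 and fields[0] == "peer":
            names.add(fields[1])
    return names


def local_peer_defined(cfg_text: str, local_name: str) -> bool:
    """True if the name passed with -L appears as a peer in the config."""
    return local_name in peer_names(cfg_text)


cfg = """
peers localpeer
    peer loadbalancer localhost:8888
"""
print(local_peer_defined(cfg, "loadbalancer"))  # True
print(local_peer_defined(cfg, "lbmaster"))      # False
```

In practice you would read /etc/haproxy/haproxy.cfg and refuse to start or reload if the check fails.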

Next comes the new backend section, which makes the stick table use RDP cookies together with the least connection scheduler:

balance leastconn
tcp-request inspect-delay 5s
tcp-request content accept if RDP_COOKIE
persist rdp-cookie
stick-table type string size 204800 expire 120m peers localpeer
stick on rdp_cookie(mstshash)

And the new options for killing sessions quickly and cleanly, plus keeping the backup server out of the stick table:

server R2 www.clusterscale.com:3389 weight 1 check port 3389 inter 2000 rise 2 fall 3 on-marked-down shutdown-sessions
server backup us.loadbalancer.org backup non-stick

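I probably haven't explained all that very well, but these tweaks ensure that when a server fails its health checks, its long-held TCP connections are broken immediately (on-marked-down shutdown-sessions), and that nobody gets stuck to the backup server (non-stick). Feel free to ask questions :-).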
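To show how these pieces interact, here is a toy Python model of balance leastconn combined with a stick table keyed on the upper-cased mstshash cookie. It is a simplified sketch, not HAProxy's code; the assumptions are that an entry is refreshed whenever it is matched, that a user whose stored server is down is re-balanced and re-stuck to the new server, that ties are broken by list order, and that backup servers are used only when every normal server is down. Server and StickyLeastConn are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Server:
    name: str
    up: bool = True
    backup: bool = False     # like the 'backup' keyword
    non_stick: bool = False  # like 'non-stick': never written to the table
    conns: int = 0


class StickyLeastConn:
    """Toy model of 'balance leastconn' + 'stick on rdp_cookie(mstshash) upper'."""

    def __init__(self, servers: List[Server], expire_s: float) -> None:
        self.servers: Dict[str, Server] = {s.name: s for s in servers}
        self.expire_s = expire_s
        self.table: Dict[str, Tuple[str, float]] = {}  # cookie -> (server, expiry)

    def _least_busy(self) -> Optional[Server]:
        # Backup servers are only used when every normal server is down.
        for want_backup in (False, True):
            pool = [s for s in self.servers.values() if s.up and s.backup == want_backup]
            if pool:
                return min(pool, key=lambda s: s.conns)
        return None

    def connect(self, cookie: str, now: float) -> Optional[str]:
        key = cookie.upper()  # the 'upper' converter: one entry per user, any case
        entry = self.table.get(key)
        if entry and entry[1] > now and self.servers[entry[0]].up:
            server = self.servers[entry[0]]  # sticky hit
        else:
            server = self._least_busy()      # new, expired, or server down
            if server is None:
                return None
        server.conns += 1
        if not server.non_stick:
            self.table[key] = (server.name, now + self.expire_s)  # learn/refresh
        return server.name

    def mark_down(self, name: str) -> None:
        # 'on-marked-down shutdown-sessions': existing connections are killed.
        self.servers[name].up = False
        self.servers[name].conns = 0

    def mark_up(self, name: str) -> None:
        self.servers[name].up = True


lb = StickyLeastConn(
    [Server("R1"), Server("R2"), Server("backup", backup=True, non_stick=True)],
    expire_s=2 * 3600,
)
print(lb.connect("alice", now=0))  # R1 (both idle, list order breaks the tie)
print(lb.connect("bob", now=1))    # R2 (fewest connections)
print(lb.connect("ALICE", now=2))  # R1 (sticky, case-insensitive)
lb.mark_down("R1")
print(lb.connect("alice", now=3))  # R2 (re-stuck to a live server)
lb.mark_up("R1")
print(lb.connect("alice", now=4))  # R2 (stays on R2)
```

Unlike the hash-based setup tested earlier, a user who has been moved to R2 stays there when R1 comes back, because the table now points at R2.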

Someone asked for a complete configuration file, so here goes:

# HAProxy configuration file generated by loadbalancer.org appliance
global
    daemon
    stats socket /var/run/haproxy.stat mode 600 level admin
    pidfile /var/run/haproxy.pid
    maxconn 40000
    ulimit-n 81000
    tune.maxrewrite 1024
defaults
    mode http
    balance roundrobin
    timeout connect 4000
    timeout client 42000
    timeout server 43000
peers loadbalancer_replication
    peer lbmaster localhost:7778
    peer lbslave localhost:7778
listen RDP_Test
    bind 192.168.67.30:3389
    mode tcp
    balance leastconn
    server backup 127.0.0.1:9081 backup non-stick
    option tcpka
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    stick-table type string size 10240k expire 12h peers loadbalancer_replication
    stick on rdp_cookie(mstshash) upper
    timeout client 12h
    timeout server 12h
    option redispatch
    option abortonclose
    maxconn 40000
    server 2008_R2 192.168.64.50:3389 weight 1 check inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
listen stats :7777
    stats enable
    stats uri /
    option httpclose
    stats auth loadbalancer:loadbalancer

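Note some small changes in the timeouts and stick table section: the client and server timeouts are raised to 12 hours, the stick table is larger, its entries expire after 12 hours, it is replicated through the loadbalancer_replication peers section, and the upper converter turns the mstshash value into upper case so that "JSmith" and "jsmith" share one entry:

timeout client 12h
timeout server 12h
stick-table type string size 10240k expire 12h peers loadbalancer_replication
stick on rdp_cookie(mstshash) upper

(In current HAProxy releases, converters are chained with a comma instead: stick on rdp_cookie(mstshash),upper.)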

Adding Terminal Server Gateway into the mix!

Recently we found that adding a TS Gateway server into the mix seems to affect session persistence, with users not being reconnected to their existing sessions. It turns out that the problem lies in the way connections leave the Gateway: they do not carry the correct RDP session cookies. The fix is actually quite simple; just add the following to your configuration:

tcp-request content reject if { req_ssl_hello_type 1 }
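It helps to remember that tcp-request content rules are evaluated in order: the first matching accept or reject ends evaluation, and if nothing matches, the connection is accepted. The toy Python model below shows which first packets this rule pair would accept or reject. It uses rough stand-ins for the RDP_COOKIE ACL and for req_ssl_hello_type (a TLS record of type 0x16 whose handshake type byte is 1, i.e. ClientHello), ignores the waiting done by inspect-delay, and does not attempt to explain why the fix helps with the Gateway; all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def has_rdp_cookie(data: bytes) -> bool:
    """Rough stand-in for the RDP_COOKIE ACL: a TPKT packet whose X.224
    Connection Request carries a 'Cookie: ' line after the 7-byte header."""
    return len(data) > 11 and data[0] == 3 and data[11:].startswith(b"Cookie: ")


def ssl_hello_type(data: bytes) -> Optional[int]:
    """Rough stand-in for req_ssl_hello_type: TLS record type 0x16
    (handshake), major version 3, then the handshake type at offset 5."""
    if len(data) >= 6 and data[0] == 0x16 and data[1] == 3:
        return data[5]
    return None


Rule = Tuple[str, Callable[[bytes], bool]]

# Same order as the configuration: accept on cookie first, then reject hellos.
RULES: List[Rule] = [
    ("accept", has_rdp_cookie),
    ("reject", lambda data: ssl_hello_type(data) == 1),
]


def evaluate(data: bytes, rules: List[Rule] = RULES) -> str:
    for action, condition in rules:
        if condition(data):
            return action  # the first matching accept/reject ends evaluation
    return "accept"  # no rule matched: the default action is accept


with_cookie = b"\x03\x00\x00\x23\x1e\xe0\x00\x00\x00\x00\x00Cookie: mstshash=alice\r\n"
tls_hello = bytes([0x16, 0x03, 0x01, 0x00, 0x05, 0x01, 0x00, 0x00, 0x01, 0x03, 0x03])
print(evaluate(with_cookie))  # accept
print(evaluate(tls_hello))    # reject
```

Because the accept rule comes first, a connection that carries an RDP cookie is never affected by the reject rule.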

So the configuration above would become the following.

#HAProxy configuration file generated by loadbalancer.org appliance
global
    daemon
    stats socket /var/run/haproxy.stat mode 600 level admin
    pidfile /var/run/haproxy.pid
    maxconn 40000
    ulimit-n 81000
    tune.maxrewrite 1024
defaults
    mode http
    balance roundrobin
    timeout connect 4000
    timeout client 42000
    timeout server 43000
peers loadbalancer_replication
    peer lbmaster localhost:7778
    peer lbslave localhost:7778
listen RDP_Test
    bind 192.168.67.30:3389
    mode tcp
    balance leastconn
    server backup 127.0.0.1:9081 backup non-stick
    option tcpka
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    tcp-request content reject if { req_ssl_hello_type 1 }
    stick-table type string size 10240k expire 12h peers loadbalancer_replication
    stick on rdp_cookie(mstshash) upper
    timeout client 12h
    timeout server 12h
    option redispatch
    option abortonclose
    maxconn 40000
    server 2008_R2 192.168.64.50:3389 weight 1 check inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
listen stats :7777
    stats enable
    stats uri /
    option httpclose
    stats auth loadbalancer:loadbalancer

Don't forget that Microsoft broke cookie support in Terminal Services; read the post here!