There are plenty of SSL offload throughput statistics for appliances across the internet, but they rarely detail how they were tested (probably because a lot of the numbers are inflated for marketing purposes). We at Loadbalancer.org would like to raise the standard across the industry by being transparent about exactly how we tested our appliances for SSL performance.
What is SSL offloading/SSL Termination?
SSL offloading is the process of moving SSL traffic decryption and encryption away from your web servers onto a centralised device, be it a load balancer or specific SSL offloading hardware.
Why is SSL offloading/SSL Termination on the load balancer necessary?
Well, it's not really... In fact Loadbalancer.org has always recommended that you use the application cluster to scale your SSL horizontally. However, SSL termination on the load balancer is definitely required if you need load balancer based cookie persistence instead of just source IP persistence, plus a few other things that you really should not be doing, like directing packets based on the contents of URLs. In fact SSL offloading ALWAYS doubles your load / halves your speed; after all, you are going to re-encrypt to the backend, aren't you? PCI-DSS compliance and all that?
The Test
You can use as many clients and backend servers as you need; to get the best results, the load balancer/appliance should be the bottleneck. To generate the load on the test devices we wrote our own SSL test script in Python. If you want a copy of it, fire off an email to support@loadbalancer.org and they will be able to provide you with the latest version.
One IP address listens on port 443 using the decryption system of your choice; in our case it's Stunnel. There is no SSL session reuse: although it's a useful feature in the real world and should be implemented where possible, in this test it won't give you a clear view of the raw horsepower of the appliance. So each connection should involve a full SSL handshake and be fully closed on completion.
Then one internal IP listens on port 80 and forwards to HAProxy, which is configured with no persistence and selects a real server for each connection based on weighted least connections. The backend servers should be running a web server, either Apache or nginx (in our case it's Apache), and returning a single HTML-formatted web page.
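As an illustration of the "full handshake, fully closed" rule above, here is a minimal single-threaded sketch of a client loop in Python. It is not the Loadbalancer.org test script; the function names (`make_context`, `build_request`, `one_transaction`, `run`) are hypothetical, and it assumes a lab appliance with a self-signed certificate, so certificate checks are switched off. A real test would run many of these in parallel across several client machines.

```python
import socket
import ssl
import time


def make_context() -> ssl.SSLContext:
    """Client context for a lab test. Certificate checks are disabled on
    the assumption that the test appliance uses a self-signed certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be turned off before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_request(host: str, path: str = "/") -> bytes:
    """Plain HTTP/1.1 GET; 'Connection: close' asks for no keep-alive."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def one_transaction(ctx, host, port, path="/", timeout=5.0) -> bool:
    """New TCP connection, full TLS handshake, one request, full close."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # No session= argument is passed to wrap_socket, so the client
            # never offers an earlier session for resumption.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(build_request(host, path))
                response = b""
                while chunk := tls.recv(4096):
                    response += chunk
    except OSError:  # covers ssl.SSLError and timeouts as well
        return False
    status_line = response.split(b"\r\n", 1)[0]
    return status_line.startswith(b"HTTP/1.") and b" 200" in status_line


def run(host, port, seconds=60.0) -> float:
    """Return completed transactions per second over the test window."""
    ctx = make_context()
    completed = 0
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        if one_transaction(ctx, host, port):
            completed += 1
    return completed / (time.monotonic() - start)

# Example call (hypothetical address): run("192.0.2.10", 443, seconds=60)
```

Only transactions that return an HTTP 200 are counted, so failed handshakes or timeouts do not inflate the figure.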
For example:
<html>
<head>
<title>Server rip-test-1</title>
</head>
<body>
<p>You are viewing rip-test-1</p>
</body>
</html>
The whole page is 107 bytes in size. It is duplicated across all of the backend servers, with the only difference being that rip-test-1 is changed to reflect the server name, so we have:
- rip-test-1
- rip-test-2
- rip-test-3
- etc.
Each test ran for 60 seconds and was repeated for a total of 3 runs, using HAProxy 1.5-dev18, Stunnel 4.55 and OpenSSL 1.0.1e.
A complete connection counts as an HTTPS request for the page, made from a client NOT running on the load balancer to the port 443 IP on the load balancer. That request is decrypted and passed on to HAProxy, which decides which server to pass the request on to. The request is received by the backend and the page is returned to HAProxy, where it is passed on to the program that's doing the SSL to be re-encrypted and passed back to the client.
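The server selection step in that flow, weighted least connections, can be pictured with a small Python sketch. This is a simplified model for illustration only, not HAProxy's actual implementation; the `RealServer` class and `pick_server` function are hypothetical names, and the rule assumed here is "lowest active connections divided by weight wins".

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int       # relative capacity; 0 means "do not send traffic"
    active: int = 0   # connections currently open to this server


def pick_server(servers):
    """Return the server with the lowest active-connections-per-weight."""
    candidates = [s for s in servers if s.weight > 0]
    if not candidates:
        raise ValueError("no server with a positive weight")
    # min() keeps the first server on a tie, so list order breaks ties.
    return min(candidates, key=lambda s: s.active / s.weight)


pool = [
    RealServer("rip-test-1", weight=1, active=2),
    RealServer("rip-test-2", weight=2, active=3),
    RealServer("rip-test-3", weight=0, active=0),
]
chosen = pick_server(pool)
chosen.active += 1  # the new connection is now counted against it
print(chosen.name)
```

With no persistence configured, every new connection goes through this choice afresh, which is what keeps the load spread evenly across the backends during the test.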
SSL Terminations Per Second (TPS) results for a selection of CPUs
Date of Test | CPU | RAM | Cipher | Certificate Key Length | Run 1 (TPS) | Run 2 (TPS) | Run 3 (TPS) |
---|---|---|---|---|---|---|---|
30/10/13 | Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 3625 | 3629 | 3631 |
31/10/13 | Intel® Atom processor C2750 | 8GB | ECDHE-RSA-AES256-SHA | 1024 bits | 1147 | 1204 | 1220 |
04/11/13 | Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz | 8GB | ECDHE-RSA-AES256-SHA | 1024 bits | 2780 | 2778 | 2777 |
04/11/13 | Intel(R) Celeron(R) CPU 440 @ 2.00GHz | 1GB | ECDHE-RSA-AES256-SHA | 1024 bits | 321 | 322 | 322 |
06/11/13 | Intel(R) Xeon(R) CPU X3430 @ 2.40GHz | 2GB | ECDHE-RSA-AES256-SHA | 1024 bits | 2160 | 2234 | 2236 |
06/11/13 | Intel(R) Atom(TM) CPU D510 @ 1.66GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 343 | 338 | 344 |
06/11/13 | Dual Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz | 32GB | ECDHE-RSA-AES256-SHA | 1024 bits | 6082 | 6143 | 6190 |
06/11/13 | Intel(R) Atom(TM) CPU S1260 @ 2.00GHz | 4GB | ECDHE-RSA-AES256-SHA | 1024 bits | 388 | 387 | 389 |
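
To compare the CPUs on a single figure, the three runs in the table can be averaged. The short Python sketch below does that with the values copied from the table; the shortened CPU labels and the `summarise` helper are just for illustration.

```python
from statistics import mean

# Run 1, Run 2, Run 3 (TPS), copied from the table above.
results = {
    "Xeon E3-1220 V2": (3625, 3629, 3631),
    "Atom C2750": (1147, 1204, 1220),
    "Xeon E5-2470": (2780, 2778, 2777),
    "Celeron 440": (321, 322, 322),
    "Xeon X3430": (2160, 2234, 2236),
    "Atom D510": (343, 338, 344),
    "Dual Xeon E5-2420": (6082, 6143, 6190),
    "Atom S1260": (388, 387, 389),
}


def summarise(runs):
    """Return (mean TPS, spread between the best and worst run)."""
    return mean(runs), max(runs) - min(runs)


# Print the CPUs from highest to lowest mean TPS.
for cpu, runs in sorted(results.items(), key=lambda kv: mean(kv[1]), reverse=True):
    avg, spread = summarise(runs)
    print(f"{cpu:<18} mean {avg:7.1f} TPS  spread {spread}")
```

The spread column is a quick sanity check on repeatability: a large gap between runs suggests something other than the appliance was the bottleneck.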
For a more in-depth look at SSL testing, take a look at the excellent blog entry at Exceliance.
So which CPU performed best for SSL TPS?
Well, the Dual Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz of course, as it was the fastest setup in the test; SSL TPS is pretty much a pure CPU operation after all.
However, we were very impressed with the new Intel® Atom 8 core processor C2750 that we managed to get our hands on. The boot-up time on the motherboard is awful, as it's a pre-production trial unit, but the performance figures are very nice for such a small, low-power board.
Incidentally, all of our Dell units and our ENTERPRISE MAX unit at Loadbalancer.org use the Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, which clocks a fairly respectable 3,600 or so SSL TPS!
But getting back to the point of this article, that's the same processor as the top of the range Kemp R-320 load balancer: http://kemptechnologies.com/emea/server-load-balancing-appliances/loadmaster-r320/overview.
So why do they claim 8,000 SSL TPS on their web site? http://kemptechnologies.com/emea/server-load-balancing-appliances/product-matrix.html
Deliberately provocative question, by the way. Kemp sell a great product; we are just making the point that SSL stats, as with all stats, should be taken with a pinch of salt!
Just found another really nice blog on open source SSL performance by Vincent Bernat.