One important thing to keep in mind when load-testing is that a Linux machine can only open so many socket connections. Every outbound connection needs its own local (ephemeral) port, and TCP port numbers are 16-bit, so a single source IP can hold at most about 64,000 connections to any one server address and port. This is commonly known as the ephemeral ports issue. The default port range is smaller than that ceiling, and you can widen it (to some extent) in /etc/sysctl.conf, but you cannot go past roughly 64,000. So when load testing, we have to make the most of those sockets by making as many requests as possible over a single connection. In addition to that, we'll need more than one machine to do the load generation. Otherwise, the load generators will run out of available sockets and fail to generate enough load.
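To see where a given load generator stands, the current range can be read straight from /proc. This is a minimal sketch, assuming a Linux host; the helper name port_count is made up for illustration, and the widened range in the comment is one common choice, not a setting taken from this article.

```shell
#!/bin/sh
# Hypothetical helper: number of ports in an inclusive "low high" range.
port_count() {
    echo $(( $2 - $1 + 1 ))
}

# Read the kernel's current ephemeral port range (Linux only).
RANGE_FILE=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$RANGE_FILE" ]; then
    read -r low high < "$RANGE_FILE"
    echo "ephemeral ports: $low-$high ($(port_count "$low" "$high") usable)"
fi

# To widen the range, a line like this could go in /etc/sysctl.conf
# (assumed values), followed by "sysctl -p" to apply it:
#   net.ipv4.ip_local_port_range = 1024 65535
```

Even the widest setting leaves each load generator well short of what a busy cluster can absorb, which is why connection reuse and multiple load-generating machines both matter.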
Apache Bench
I started with 'ab', Apache Bench. This is the simplest general-use HTTP benchmarking tool that I know of, and since it ships with Apache, it's probably already on your system. Unfortunately, I could only get about 900 requests/sec using this. I've seen other people get up to 2,000 with it, but I could tell right away that 'ab' wasn't the tool for this job.
Httperf
Next, I tried 'httperf'. This tool is more powerful, but still relatively simple and limited in its capabilities. Figuring out how many req/sec you'll be generating is not as straightforward as just passing it a number. It took me several tries to get more than a couple hundred req/sec. For example:
This creates 100,000 sessions, at a rate of 1,000 per second. Each session makes 5 calls, which are spread out by 2 seconds.
httperf --hog --server=192.168.122.10 --wsess=100000,5,2 --rate 1000 --timeout 5
Total: connections 117557 requests 219121 replies 116697 test-duration 111.423 s
Connection rate: 1055.0 conn/s (0.9 ms/conn, <=1022 concurrent connections)
Connection time [ms]: min 0.3 avg 865.9 max 7912.5 median 459.5 stddev 993.1
Connection time [ms]: connect 31.1
Connection length [replies/conn]: 1.000
Request rate: 1966.6 req/s (0.5 ms/req)
Request size [B]: 91.0
Reply rate [replies/s]: min 59.4 avg 1060.3 max 1639.7 stddev 475.2 (22 samples)
Reply time [ms]: response 56.3 transfer 0.0
Reply size [B]: header 267.0 content 18.0 footer 0.0 (total 285.0)
Reply status: 1xx=0 2xx=116697 3xx=0 4xx=0 5xx=0
CPU time [s]: user 9.68 system 101.72 (user 8.7% system 91.3% total 100.0%)
Net I/O: 467.5 KB/s (3.8*10^6 bps)
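To judge a result like this, it can help to work out what the command is actually asking for. The following is a rough sketch, assuming httperf's default burst length of one call, so that the think time separates every call in a session; the function name offered_load is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the load requested by
#   httperf --wsess=SESSIONS,CALLS,THINK --rate RATE
offered_load() {
    sessions=$1; calls=$2; think=$3; rate=$4
    # Every session issues CALLS requests.
    echo "total requests: $(( sessions * calls ))"
    # In steady state, RATE sessions start per second, each worth CALLS requests.
    echo "peak request rate: $(( rate * calls )) req/s"
    # Time to start all sessions, plus the think time inside the last one.
    echo "min duration: $(( sessions / rate + (calls - 1) * think )) s"
}

offered_load 100000 5 2 1000
```

For the run above this gives 500,000 requests, a peak of 5,000 req/s, and a minimum duration of 108 seconds. The measured 219,121 requests at 1,966.6 req/s is less than half of that, which shows the load generator itself was the bottleneck.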
Eventually, I was able to get 6,622 connections/sec with these settings:
httperf --hog --server 192.168.122.10 --num-conn 100000 --rate 20000 --timeout 5
(A total of 100,000 connections created, and the connections are created at a fixed rate of 20,000 per second.)
It has potential, and a few more features than 'ab', but it's not quite the heavy-lifter that I need for this project. I need something that supports multiple load-testing nodes in a distributed fashion. Hence, my next attempt: JMeter.
Installing Tsung in CentOS 6.2
The first thing you'll need is the EPEL repository (for Erlang), so set that up before continuing. Once that's done, install the required packages on each of the nodes that you'll be using to generate load. If you don't already have passwordless SSH keys set up between the nodes, do that too.
yum -y install erlang perl perl-RRD-Simple.noarch perl-Log-Log4perl-RRDs.noarch gnuplot perl-Template-Toolkit firefox
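For the passwordless SSH step, something along these lines could be run on the node that will start Tsung. This is only a sketch: the NODES variable and the ensure_key function are made up for illustration, the example node names are hypothetical, and logging in as root is an assumption based on the prompts shown later in this article.

```shell
#!/bin/sh
# Assumed defaults; override KEY_FILE and NODES to suit your environment.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
NODES="${NODES:-}"    # e.g. NODES="loadnode2 loadnode3" (hypothetical hosts)

# Create a passphrase-less key pair only if one does not already exist.
ensure_key() {
    mkdir -p "$(dirname "$KEY_FILE")"
    [ -f "$KEY_FILE" ] || ssh-keygen -q -t rsa -b 4096 -N "" -f "$KEY_FILE"
}

# Copy the public key to each node; this prompts for each password once.
for node in $NODES; do
    ensure_key
    ssh-copy-id -i "$KEY_FILE.pub" "root@$node"
done
```

With NODES left empty the script does nothing, so it is safe to review before pointing it at real machines.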
Download the latest Tsung from GitHub, or from their website.
wget http://tsung.erlang-projects.org/dist/tsung-1.4.2.tar.gz
Untar and compile.
tar zxvf tsung-1.4.2.tar.gz
cd tsung-1.4.2
./configure && make && make install
Copy the example config into ~/.tsung. This is where Tsung keeps its config files and log files.
mkdir -p /root/.tsung
cp /usr/share/doc/tsung/examples/http_simple.xml /root/.tsung/tsung.xml
You can edit this file to your specifications, or use the one that works for me. This is my config that, after much trial and error, now generates 5 million http requests per second, when used with 7 distributed nodes.
It's a lot to take in at first, but it's really quite simple once you understand it. 
- <clients> is simply the host(s) to run Tsung on. You can specify IPs, and the max number of CPUs that you want Tsung to use. You can also set a limit on the number of users that the node will simulate with maxusers. Each of these users will perform an operation that we will define later.
- <servers> is the name(s) of the HTTP server you want to test. We will be using this option to test the cluster IP, as well as individual servers.
- <load> defines when our simulated users will "arrive" at our website, and how quickly they will arrive.
  - <arrivalphase>: In phase 1, which lasts 10 minutes, up to 15,000 users will arrive, at a rate of 8 per second.
  - There are two more arrivalphases, in which users arrive in a similar fashion.
  - Altogether, these arrivalphases make up a <load>, which controls how many requests per second we'll be generating.
- <sessions>: This section defines what those users will be doing once they've arrived at your website.
  - probability allows you to define random things that users might do. Sometimes they may click this, other times they may click that. Probabilities must add up to 100%.
  - In the configuration above, the users only ever do one thing, so it has a probability of 100%.
  - <for>: This is what the users do, 100% of the time. They loop through 10,000,000 times and <request> a single web page, /test.txt.
  - This looping construct allows us to use fewer user connections to achieve a very high number of requests per second.
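To make the pieces described above concrete, here is a small configuration written out from the shell. This is a sketch assembled from the description in this section, not the author's actual config. The client host names, the cpu and maxusers values, the DTD path, the session name, and the output file name are all assumptions; the server IP is borrowed from the httperf examples earlier; and only the first of the three arrivalphases is shown.

```shell
#!/bin/sh
# Write an illustrative Tsung config to a scratch file (assumed name),
# so that an existing ~/.tsung/tsung.xml is not overwritten.
TSUNG_SKETCH="${TSUNG_SKETCH:-./tsung-sketch.xml}"

cat > "$TSUNG_SKETCH" <<'EOF'
<?xml version="1.0"?>
<!-- DTD path is an assumption; it depends on the install prefix -->
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice" version="1.0">
  <!-- Load-generating nodes (hypothetical hosts and limits) -->
  <clients>
    <client host="loadnode1" weight="1" cpu="8" maxusers="40000"/>
    <client host="loadnode2" weight="1" cpu="8" maxusers="40000"/>
  </clients>
  <!-- The web server or cluster IP under test -->
  <servers>
    <server host="192.168.122.10" port="80" type="tcp"/>
  </servers>
  <!-- Phase 1: 10 minutes, 8 new users per second, capped at 15,000 -->
  <load>
    <arrivalphase phase="1" duration="10" unit="minute">
      <users maxnumber="15000" arrivalrate="8" unit="second"/>
    </arrivalphase>
  </load>
  <!-- Every user runs this one session: loop and fetch /test.txt -->
  <sessions>
    <session name="http-loop" probability="100" type="ts_http">
      <for from="1" to="10000000" var="i">
        <request><http url="/test.txt" method="GET" version="1.1"/></request>
      </for>
    </session>
  </sessions>
</tsung>
EOF

echo "wrote $TSUNG_SKETCH"
```

A file like this can be tried without touching the default config by running tsung -f ./tsung-sketch.xml start.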
Once you've got that in place, you can create this handy alias to quickly view your Tsung reports.
vim ~/.bashrc
alias treport="/usr/lib/tsung/bin/tsung_stats.pl; firefox report.html"
source ~/.bashrc
Then start up Tsung.
[root@loadnode1 ~] tsung start
Starting Tsung
"Log directory is: /root/.tsung/log/20120421-1004"
And view the report when finished.
cd /root/.tsung/log/20120421-1004
treport
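Since every run gets its own timestamped log directory, a small helper can save looking up the path each time. This is a sketch, assuming the default log location shown above and the YYYYMMDD-HHMM naming seen in the Tsung output; the function name latest_tsung_log is made up for illustration.

```shell
#!/bin/sh
# Assumed default log root; override TSUNG_LOG_ROOT if yours differs.
TSUNG_LOG_ROOT="${TSUNG_LOG_ROOT:-$HOME/.tsung/log}"

# Print the most recent run directory. Names are timestamps, so a plain
# lexical sort is also a chronological sort.
latest_tsung_log() {
    ls -1d "$TSUNG_LOG_ROOT"/*/ 2>/dev/null | sort | tail -n 1
}

latest_tsung_log
```

In an interactive shell, cd "$(latest_tsung_log)" followed by treport then opens the report for the latest run.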
Using Tsung to Plan Your Cluster Build
Now that we have a powerful enough load-testing tool, we can plan the rest of the cluster build:
- Use Tsung to test a single http server. Get a base benchmark.
- Tune the heck out of those web servers, testing with Tsung regularly to see improvements.
- Tune the TCP sockets of those systems to obtain optimal network performance. Again, test, test, test.
- Build the LVS cluster, which contains those fully-tuned web servers.
- Stress-test LVS by using Tsung on the cluster IP.
In the next two articles, I'll show you how to get your web server performing at top speed, and how to bring it all together with the LVS cluster software.