It all started last fall, when I noticed that every once in a while I’d get some pretty noticeable flat spots in the performance graphs when running at rates in excess of 100 requests per second. You can easily see the flat spots in the graph at right (running this setup: 4 gems on a 4-core machine with 10 concurrent siege sessions). The run peaks at 200 requests/second, but the graph has big swings.
I originally blamed the flat spots on “contention,” but I didn’t know where they were coming from.
In May I started tracking down the source of those flat spots. Over several days of testing I was able to rule out disk I/O, network I/O, and GemStone internals as the source of the contention. All vital signs on all of the systems involved were flatlined; I used separate machines for siege, Apache, and GLASS.
I finally got around to using tcpdump, and I could see that the last packet to flow between machines before each flat spot was an HTTP request packet heading into the Apache box. The flat spot ended with a packet heading from the Apache box to the GemStone box. Pretty clear evidence that Apache was the culprit. Without digging into the internals of Apache, I figured that the contention must come from an unfortunate interaction between the worker MPM module (which is multi-threaded) and mod_fastcgi.
I asked our IT guys to install lighttpd, and you can see the results in the graph at right (32 gems on an 8-core machine with 180 concurrent siege sessions). In this run we’re peaking at 400 requests/second (with twice as many cores), but the performance graph is much tighter (a standard deviation of 36 for lighttpd versus 60 with Apache) and, best of all, there are no flat spots. Soooo, if you expect to be hitting rates above 100 requests/second, you should be using lighttpd.
Not only is lighttpd performant, but it is also pretty easy to set up. It turns out that someone has already written up ‘Gemstone/S and FastCGI with lighttpd’, where the author describes how to set up lighttpd for GLASS.
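For orientation only, here is a minimal lighttpd.conf sketch of the kind of setup involved. It is not the configuration from the write-up mentioned above, and several details are assumptions: four gems serving FastCGI on ports 9001 through 9004, Seaside requests arriving under `/seaside`, and the gems running on the same host (if your gems live on a separate machine, as in the tests above, substitute that machine’s address for `127.0.0.1`). The gem names and ports are hypothetical; use whatever your gems are actually configured for.

```conf
# Minimal sketch under stated assumptions, not a production configuration.
# Assumed (hypothetical): four gems listening for FastCGI on ports 9001-9004
# of 127.0.0.1, and Seaside mounted under the /seaside URL prefix.
server.modules += ( "mod_fastcgi" )

fastcgi.server = (
  "/seaside" => (
    # "check-local" => "disable" forwards every request under /seaside
    # to the gems instead of first looking for a matching local file.
    "gem1" => ( "host" => "127.0.0.1", "port" => 9001, "check-local" => "disable" ),
    "gem2" => ( "host" => "127.0.0.1", "port" => 9002, "check-local" => "disable" ),
    "gem3" => ( "host" => "127.0.0.1", "port" => 9003, "check-local" => "disable" ),
    "gem4" => ( "host" => "127.0.0.1", "port" => 9004, "check-local" => "disable" )
  )
)
```

Listing several backends under one prefix lets lighttpd’s mod_fastcgi spread requests across the gems, so adding gems is a matter of adding entries.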