load_testing_mod_perl

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: Re: [RFC] Swapping Prevention
From: Stas Bekman <stas@stason.org>
Date: Thu, 22 Jun 2000 20:47:38 +0200 (CEST)

On Tue, 20 Jun 2000, Joshua Chamas wrote:

> > your machine. Therefore you should configure the server, so that the
> > maximum number of possible processes will be small enough using the
> > C<MaxClients> directive. This will ensure that at the peak hours the
> > system won't swap. Remember that swap space is an emergency pool, not
> > a resource to be used routinely.  If you are low on memory and you
> > badly need it, buy it or reduce a number of processes to prevent
> > swapping.
> 
> One common mistake that people make is to not load
> test against a server to trigger the full MaxClients
> in production.  In order to prevent swapping, one must
> simulate the server at its point of greatest RAM stress, 
> and one can start to do this by running ab against
> a program so that each of the MaxClient httpd processes
> goes through its MaxRequests.  
> 
> So lets say MaxRequests is set to 1000 and MaxClients
> is set to 100, then in order to see the system go 
> through a full cycle fully maxed out in RAM, one might
> 
>  ab -c 100 -n 100000 http://yoursite.loadtest/modperl/ramhog
> 
> Then fire up top, sit back, and enjoy the show!
> 
> In summary this aids in swapping prevention, by load testing
> the server under highest RAM stress preproduction, and seeing
> if it swaps.  Then one can tune their parameters to avoid 
> swapping under this highest load, by decreasing MaxClients
> or MaxRequests.

Hmm, good points Joshua. But just running the above example is not
enough. Remember that on the production server the code becomes
unshared over time and more memory gets used, therefore the above test
alone won't ensure that swapping won't happen.
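
One way to watch this effect while the test runs is to log each child's
total vs. shared memory at the end of every request, e.g. with the GTop
module. Just a sketch, assuming libgtop and the GTop Perl bindings are
installed; the package name My::MemLog is made up:

  # Sketch: log total vs. shared memory of the current httpd child.
  # Configure with:  PerlLogHandler My::MemLog
  package My::MemLog;
  use strict;
  use Apache::Constants qw(OK);
  use GTop ();

  sub handler {
      my $r = shift;
      my $mem   = GTop->new->proc_mem($$);
      my $size  = $mem->size;     # total process size (bytes)
      my $share = $mem->share;    # shared portion (bytes)
      $r->log_error(sprintf "pid %d: size=%dK shared=%dK unshared=%dK",
                    $$, $size/1024, $share/1024, ($size - $share)/1024);
      return OK;
  }
  1;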

Unless you can reproduce a production server load, including the
different queries that make the simulation come very close to the real
usage of the server, your advice definitely helps, but not as much as
one would want.

I think it'd be really nice to have a post-processing (log phase)
handler that logs all the requests, including the query data, and feeds
them into a database on a constant basis, merging identical inputs and
bumping up their count. Then, when you want to reproduce the real load
of your server, another program will analyze this database and generate
output like this:

input            %
-----------------------
URL1?query1    46.1%
URL1?query2    12.1%
URL2?query3     7.2%
URL2?query4     4.2%
...........
URLN?queryN     0.2%
-----------------------
               100%
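
A minimal sketch of the logging half of this idea, using a tied DB_File
hash as the "database". The package name and the file path are made up,
and a real version would need file locking or a proper RDBMS to be safe
with many children writing at once:

  # Log-phase handler: count identical URI?query inputs in a DBM file.
  # Configure with:  PerlLogHandler My::RequestLog
  package My::RequestLog;
  use strict;
  use Apache::Constants qw(OK);
  use DB_File ();
  use Fcntl qw(O_RDWR O_CREAT);

  my $db_path = "/var/log/httpd/request_stats.db";   # made-up path

  sub handler {
      my $r = shift;
      my $input = $r->uri;
      my $args  = $r->args;
      $input .= "?$args" if defined $args && length $args;

      tie my %stats, 'DB_File', $db_path, O_RDWR|O_CREAT, 0644
          or return OK;        # don't fail the request on logging errors
      $stats{$input}++;        # merge identical inputs, bump the count
      untie %stats;

      return OK;
  }
  1;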

This output can then be fed into the program that runs ab. So if you
want to simulate 10000 requests, you run 10000*0.461 requests for the
URL in the first row above, 10000*0.121 for the second, etc.
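
And a rough sketch of the ab driver, assuming the analyzer has written
"URL percent" pairs, one per line, into a file. The file name, host and
totals are only example values:

  #!/usr/bin/perl -w
  # Sketch: replay the observed request mix with ab.
  # Expects lines like "/modperl/search?q=foo 46.1" in load_profile.txt
  use strict;

  my $total_requests = 10_000;
  my $concurrency    = 100;     # should match MaxClients
  my $base           = "http://yoursite.loadtest";

  open my $fh, "<", "load_profile.txt" or die "load_profile.txt: $!";
  while (my $line = <$fh>) {
      my ($path, $percent) = split ' ', $line;
      next unless defined $percent;
      my $n = int($total_requests * $percent / 100) or next;
      my $c = $n < $concurrency ? $n : $concurrency;  # ab needs -n >= -c
      system "ab", "-c", $c, "-n", $n, "$base$path";
  }
  close $fh;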

Then you will be able to tune your server for maximum performance,
given that the behavior of the users is not about to change. The older
your database is, the more accurate your statistics will be.

With the help of real usage statistics, in addition to the ability to
fine-tune the MaxClients directive, you will learn which sections of
code are used the most and can therefore optimize them.

===

_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org   http://perl.org     http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org

===

