modperl-Apache::args_now_wins_speed_benchmarks

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: modperl@perl.apache.org
From: Stas Bekman <stas@stason.org>
Subject: updated benchmarks for Apache::args
Date: Thu, 09 May 2002 18:06:30 +0800

Finally I was able to rerun the benchmarks for Apache::args vs. 
Apache::Request::param vs. CGI::param, using Apache::Request 1.0. And 
indeed it's much faster now and is the fastest among the three. Here is 
the updated section (soon to appear in the guide):

=head1 Apache::args vs. Apache::Request::param vs. CGI::param

C<Apache::args>, C<Apache::Request::param> and C<CGI::param> are the
three most common ways to process input arguments in mod_perl handlers
and scripts. Let's write three C<Apache::Registry> scripts that use
C<Apache::args>, C<Apache::Request::param> and C<CGI::param> to
process a form's input and print it out. Note that C<Apache::args>
is equivalent to C<Apache::Request::param> only when every key has a
single value. With multi-valued keys (e.g. when using check-box
groups) you will have to write some extra code. If you simply do:

   my %params = $r->args;

only the last value will be stored and the rest will collapse, because
that's what happens when you turn a list into a hash. Assuming that
you have the following list:

   (rules => 'Apache', rules => 'Perl', rules => 'mod_perl')

and assign it to a hash, the following happens:

   $hash{rules} = 'Apache';
   $hash{rules} = 'Perl';
   $hash{rules} = 'mod_perl';

So at the end only the:

   rules => 'mod_perl'

pair will get stored.  With C<CGI.pm> or C<Apache::Request> you can
solve this by extracting the whole list by its key:

   my @values = $q->param('rules');
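
If you stay with C<$r-E<gt>args>, one way to keep every value of a
multi-valued key is to walk the flat list pairwise and collect the
values in array references. This is only a sketch: the C<@pairs> list
stands in for what C<$r-E<gt>args> returns in list context, and the
names C<%collapsed> and C<%params> are chosen for illustration.

```perl
use strict;
use warnings;

# Stand-in for the flat key/value list that $r->args returns in list context.
my @pairs = (rules => 'Apache', rules => 'Perl', rules => 'mod_perl');

# A plain hash assignment keeps only the last value for each key.
my %collapsed = @pairs;

# Walk the list two items at a time and collect every value per key.
my %params;
for (my $i = 0; $i < @pairs; $i += 2) {
    my ($key, $value) = @pairs[$i, $i + 1];
    push @{ $params{$key} }, $value;
}

print "collapsed: $collapsed{rules}\n";
print "all values: @{ $params{rules} }\n";
```

The first line printed shows only C<mod_perl>, while the second shows
all three values in their original order.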

In addition, C<Apache::Request> and C<CGI.pm> have many more functions
that ease input processing, such as file upload handling. However,
C<Apache::Request> is much faster, since its core is implemented in C
and glued to Perl with XS code.

Assuming that the only functionality you need is the parsing of
key-value pairs, and assuming that every key has a single value, we
will compare the following almost identical scripts by passing them
various query strings.

Here's the code:

   file:processing_with_apache_args.pl
   -----------------------------------
   use strict;
   my $r = shift;
   $r->send_http_header('text/plain');
   my %args = $r->args;
   print join "\n", map {"$_ => ".$args{$_} } keys %args;

   file:processing_with_apache_request.pl
   --------------------------------------
   use strict;
   use Apache::Request ();
   my $r = shift;
   my $q = Apache::Request->new($r);
   $r->send_http_header('text/plain');
   my %args = map {$_ => $q->param($_) } $q->param;
   print join "\n", map {"$_ => ".$args{$_} } keys %args;

   file:processing_with_cgi_pm.pl
   ------------------------------
   use strict;
   use CGI;
   my $r = shift;
   $r->send_http_header('text/plain');
   my $q = CGI->new;
   my %args = map {$_ => $q->param($_) } $q->param;
   print join "\n", map {"$_ => ".$args{$_} } keys %args;

All three scripts are preloaded at server startup:

   <Perl>
       use Apache::RegistryLoader ();
       Apache::RegistryLoader->new->handler(
                                 "/perl/processing_with_cgi_pm.pl",
                      "/home/httpd/perl/processing_with_cgi_pm.pl"
                         );
       Apache::RegistryLoader->new->handler(
                                 "/perl/processing_with_apache_request.pl",
                      "/home/httpd/perl/processing_with_apache_request.pl"
                         );
       Apache::RegistryLoader->new->handler(
                                 "/perl/processing_with_apache_args.pl",
                      "/home/httpd/perl/processing_with_apache_args.pl"
                         );
   </Perl>

We use four different query strings, generated by:

   my @queries = (
       join("&", map {"$_=" . 'e' x 10}  ('a'..'b')),
       join("&", map {"$_=" . 'e' x 50}  ('a'..'b')),
       join("&", map {"$_=" . 'e' x 5 }  ('a'..'z')),
       join("&", map {"$_=" . 'e' x 10}  ('a'..'z')),
   );

The first string is:

   a=eeeeeeeeee&b=eeeeeeeeee

which is 25 characters in length and consists of two key/value
pairs. The second string is also made of two key/value pairs, but the
value is 50 characters long (total 105 characters). The third and
fourth strings are each made of 26 key/value pairs, with value lengths
of 5 and 10 characters and total lengths of 207 and 337 characters
respectively. The C<query_len> column in the report table is one of
these four total lengths.
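
The pair counts and total lengths quoted above can be checked with a
few lines of Perl. This is a small sketch that rebuilds the same four
query strings; the C<@lengths> array is a name introduced here for
illustration only.

```perl
use strict;
use warnings;

# The same four query strings as used in the benchmark.
my @queries = (
    join("&", map {"$_=" . 'e' x 10}  ('a'..'b')),
    join("&", map {"$_=" . 'e' x 50}  ('a'..'b')),
    join("&", map {"$_=" . 'e' x 5 }  ('a'..'z')),
    join("&", map {"$_=" . 'e' x 10}  ('a'..'z')),
);

# Report the number of pairs and the total length of each string.
my @lengths;
for my $query (@queries) {
    my @pairs = split /&/, $query;
    push @lengths, length $query;
    printf "%2d pairs, %3d characters\n", scalar(@pairs), length $query;
}
```

Each key and its C<=> sign add two characters per pair, and every pair
but the last is followed by an C<&>, which is how 26 pairs with
10-character values come to 337 characters.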

We conduct the benchmark with a concurrency level of 50 and generate
5000 requests for each test.

And the results are:

   ---------------------------------------------
   name   val_len pairs query_len |  avtime  rps
   ---------------------------------------------
   apreq     10      2       25   |    51    945
   apreq     50      2      105   |    53    907
   r_args    50      2      105   |    53    906
   r_args    10      2       25   |    53    899
   apreq      5     26      207   |    64    754
   apreq     10     26      337   |    65    742
   r_args     5     26      207   |    73    665
   r_args    10     26      337   |    74    657
   cgi_pm    50      2      105   |    85    573
   cgi_pm    10      2       25   |    87    559
   cgi_pm     5     26      207   |   188    263
   cgi_pm    10     26      337   |   188    262
   ---------------------------------------------

Where C<apreq> stands for C<Apache::Request::param()>, C<r_args>
stands for C<Apache::args()> or C<$r-E<gt>args()> and C<cgi_pm> stands
for C<CGI::param()>.

You can see that C<Apache::Request::param> and C<Apache::args> have
similar performance with a few key/value pairs, but the former is
faster with many key/value pairs. C<CGI::param> is significantly
slower than the other two methods.
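
To put numbers on that comparison, the following sketch expresses each
method's throughput as a share of the C<apreq> figure, using the three
rows for 26 pairs with 10-character values from the table above. The
C<%rps> hash is simply a hand-copied excerpt of that table.

```perl
use strict;
use warnings;

# Requests per second for the rows with 26 pairs and 10-character values.
my %rps = (apreq => 742, r_args => 657, cgi_pm => 262);

# Print each method's throughput relative to Apache::Request::param.
for my $name (sort { $rps{$b} <=> $rps{$a} } keys %rps) {
    printf "%-7s %4d rps  %3.0f%% of apreq\n",
        $name, $rps{$name}, 100 * $rps{$name} / $rps{apreq};
}
```

With these figures C<r_args> reaches roughly 89% and C<cgi_pm> roughly
35% of the C<apreq> throughput.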

These results also suggest that processing gets progressively slower
as the number of key/value pairs grows, while longer values slow it
down much less. To verify that, let's use the
C<Apache::Request::param> method and first test several query strings
made of 5 key/value pairs with value lengths growing from 10
characters to 60 in steps of 10:

   my @strings = map {'e' x (10*$_)} 1..6;
   my @ae = ('a'..'e');
   my @queries = ();
   for my $string (@strings) {
       push @queries, join "&", map {"$_=$string"} @ae;
   }

And the results:

   -----------------------------------
   val_len query_len    |  avtime  rps
   -----------------------------------
     10       77        |    55    877
     20      137        |    55    867
     30      197        |    56    859
     40      257        |    56    858
     50      317        |    56    857
     60      377        |    58    828
   -----------------------------------

Indeed, the length of the values has very little influence on speed:
the average processing time barely changes as the values grow.

Now let's use a fixed value length of 10 characters and test with a
varying number of key/value pairs: 2, 6, 11, 16, 21 and 26.

   my @az = ('a'..'z');
   my @queries = map { join("&", map {"$_=" . 'e' x 10 } @az[0..$_]) }
       (1, 5, 10, 15, 20, 25);
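
A slice such as C<@az[0..$_]> holds C<$_ + 1> keys, so the indices
above give 2, 6, 11, 16, 21 and 26 pairs. The following sketch prints
the pair count and total length of each generated query string; the
names C<$last> and C<@lengths> are introduced for illustration only.

```perl
use strict;
use warnings;

my @az = ('a'..'z');

# @az[0..$last] holds $last + 1 keys, each paired with a 10-character value.
my @lengths;
for my $last (1, 5, 10, 15, 20, 25) {
    my $query = join "&", map { "$_=" . 'e' x 10 } @az[0..$last];
    push @lengths, length $query;
    printf "%2d pairs, %3d characters\n", $last + 1, length $query;
}
```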

And the results:

   -------------------------------
   pairs  query_len |  avtime  rps
   -------------------------------
     2       25     |    53    906
     6       77     |    55    869
    11      142     |    57    838
    16      207     |    61    785
    21      272     |    64    754
    26      337     |    66    726
   -------------------------------

Looking at the average processing time column, we can see that the
number of key/value pairs has a significant impact on processing
speed.
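
The two effects can be compared directly by computing the relative
growth of the average time between the first and last rows of each of
the two tables above. This is a small arithmetic sketch; the C<%runs>
hash is a hand-copied excerpt of those tables.

```perl
use strict;
use warnings;

# avtime in the first and last rows of the two tables above.
my %runs = (
    'value length 10 -> 60' => [55, 58],
    'pairs 2 -> 26'         => [53, 66],
);

# Print the relative increase in average time for each experiment.
for my $name (sort keys %runs) {
    my ($first, $last) = @{ $runs{$name} };
    printf "%-22s avtime %d -> %d (+%.1f%%)\n",
        $name, $first, $last, 100 * ($last - $first) / $first;
}
```

Making the values six times longer raises the average time by about
5%, while going from 2 to 26 pairs raises it by about 25%.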
