modperl-slow_regexp_behavior_from_use_ing_a_mod_with_stupid_regexps

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
To: "Paul Mineiro" <paul@codegrok.com>, <modperl@apache.org>
Subject: Re: slow regex [BENCHMARK]
Date: Fri, 25 Jan 2002 16:21:28 +1100

"Paul Mineiro" <paul@codegrok.com> wrote: 

> Paul Mineiro wrote:
>
> i've cleaned up the example to tighten the case:
>
> the mod perl code  snippet is:
>
> ---
>
>   my @cg;
>
>   open DIL, '>', "/tmp/seqdata";
>   print DIL $seq;
>   close DIL;
>
>   warn "length seq = @{[length ($seq)]}";
>
>   my $t = timeit (1, sub {
>                         while ($seq =~ /CG/g)
>                           {
>                             push @cg, pos ($seq);
>                           }
>                      });
>
>   print STDERR timestr ($t), "\n";
>
> ---
>
> which yields
> length seq = 200001 at
> /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm line 634,
> <GEN1> line 102
> 16 wallclock secs (15.56 usr +  0.01 sys = 15.57 CPU) @  0.06/s (n=1)
>
> and the perl script (command line) version is:
>
> ---
>
> #!/usr/bin/perl
>
> use Benchmark;
> use strict;
>
> open DIL, '<', "/tmp/seqdata";
> my $seq = <DIL>;
> close DIL;
>
> warn "length seq is @{[length $seq]}";
>
> my @cg;
>
> my $t = timeit (1, sub {
>                       while ($seq =~ /CG/g)
>                         {
>                           push @cg, pos ($seq);
>                         }
>                    });
>
> print STDERR timestr ($t), "\n";
>
> ---
> which yields:
>
> length seq is 200001 at ./t.pl line 10.
>  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
>
> the data is pretty big, so i didn't attach it, but feel free to contact
> me directly for it.
>
> -- p
>
> >hi.  i'm running mod_perl 1.26 + apache 1.3.14 + perl 5.6.1
> >
> >i have a loop in a mod_perl handler like so:
> >----
> >  my $stime = time ();
> >
> >  while ($seq =~ /CG/og)
> >    {
> >      push @cg,  pos ($seq);
> >    }
> >
> >  my $etime = time ();
> >
> >  warn "time was: ", scalar localtime ($stime), " ",
> >        scalar localtime ($etime), " ", $etime - $stime;
> >----
> >
> >under mod_perl this takes 23 seconds.  running the perl "by hand" (via
> >extracting this piece into a seperate perl script) on the same data takes
> >less than 1 second.
> >
> >has anyone seen this kind of extreme slowdown before?
> >
> >-- p
> >
> >info:
> >
> >apache build options:
> >
> >CFLAGS="-g -g -O3 -funroll-loops" \
> >LDFLAGS="-L/home/aerives/lib -L/home/aerives/lib/mysql" \
> >LIBS="-L/home/aerives/genegrokker-interface/lib
> >-L/home/aerives/genegrokker-interface/ext/lib -L/home/aerives/lib
> >-L/home/aerives/lib/mysql" \
> >./configure \
> >"--prefix=/home/aerives/genegrokker-interface/ext" \
> >"--enable-rule=EAPI" \
> >"--enable-module=most" \
> >"--enable-shared=max" \
> >"--with-layout=GNU" \
> >"--disable-rule=EXPAT" \
> >"$@"
> >
> >mod_perl build options:
> >
> >configure_options="PERL_USELARGEFILES=0 USE_APXS=1
> >WITH_APXS=$PLAYPEN_ROOT/ext/sbin/apxs EVERYTHING=1
> >INC=$PLAYPEN_ROOT/ext/include -DEAPI"
> >
> >perl -V:
> >Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
> >  Platform:
> >    osname=linux, osvers=2.4.13, archname=i386-linux
> >    uname='linux duende 2.4.13 #1 wed oct 31 19:18:07 est 2001 i686
unknown '
> >
   config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux
>
>-Dprefix=/usr -Dprivlib=/usr/share/perl/5.6.1 -Darchlib=/usr/lib/perl/5.6.1
>
>-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl
5
> >-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.6.1
> >-Dsitearch=/usr/local/lib/perl/5.6.1 -Dman1dir=/usr/share/man/man1
> >-Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl
> >-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Duseshrplib
> >-Dlibperl=libperl.so.5.6.1 -Dd_dosuid -des'
> >    hint=recommended, useposix=true, d_sigaction=define
> >    usethreads=undef use5005threads=undef useithreads=undef
> >usemultiplicity=undef
> >    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
> >    use64bitint=undef use64bitall=undef uselongdouble=undef
> >  Compiler:
> >    cc='cc', ccflags ='-DDEBIAN -fno-strict-aliasing -I/usr/local/include
> >-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
> >    optimize='-O2',
> >    cppflags='-DDEBIAN -fno-strict-aliasing -I/usr/local/include'
> >    ccversion='', gccversion='2.95.4  (Debian prerelease)',
gccosandvers=''
> >    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> >    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> >    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> >lseeksize=8
> >    alignbytes=4, usemymalloc=n, prototype=define
> >  Linker and Libraries:
> >    ld='cc', ldflags =' -L/usr/local/lib'
> >    libpth=/usr/local/lib /lib /usr/lib
> >    libs=-lgdbm -ldb -ldl -lm -lc -lcrypt
> >    perllibs=-ldl -lm -lc -lcrypt
> >    libc=/lib/libc-2.2.4.so, so=so, useshrplib=true,
libperl=libperl.so.5.6.1
> >  Dynamic Linking:
> >    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
> >    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
> >
> >
> >Characteristics of this binary (from libperl):
> >  Compile-time options: USE_LARGE_FILES
> >  Built under linux
> >  Compiled at Jan 11 2002 04:09:18
> >  %ENV:
> >
>
>PERL5LIB="/home/aerives/genegrokker-interface/lib/perl5:/home/aerives/geneg
rokker-interface/ext/lib/perl5:/home/aerives/lib/perl5"
> >  @INC:
> >    /home/aerives/genegrokker-interface/lib/perl5
> >    /home/aerives/genegrokker-interface/ext/lib/perl5
> >    /home/aerives/lib/perl5
> >    /usr/local/lib/perl/5.6.1
> >    /usr/local/share/perl/5.6.1
> >    /usr/lib/perl5
> >    /usr/share/perl5
> >    /usr/lib/perl/5.6.1
> >    /usr/share/perl/5.6.1
> >    /usr/local/lib/site_perl



I recently had a similar problem. A regex that worked fine in sample code
was a dog in the web-server code. It only happened with really long strings.
I tracked down the problem to this from the 'perlre' manpage.

       WARNING: Once Perl sees that you need one of "$&",
       "$`", or "$'" anywhere in the program, it has to
       provide them for every pattern match.  This may
       substantially slow your program.  Perl uses the same
       mechanism to produce $1, $2, etc, so you also pay a
       price for each pattern that contains capturing
       parentheses.  (To avoid this cost while retaining the
       grouping behaviour, use the extended regular
       expression "(?: ... )" instead.)  But if you never
       use "$&", "$`" or "$'", then patterns without
       capturing parentheses will not be penalized.  So
       avoid "$&", "$'", and "$`" if you can, but if you
       can't (and some algorithms really appreciate them),
       once you've used them once, use them at will, because
       you've already paid the price.  As of 5.005, "$&" is
       not so costly as the other two.

Basically one of the modules in the web-app I was 'use'ing needed $', but my
test code didn't 'use' that module. The result was pretty dramatic in this
case, something that took approx 1 second in the test code was timing out
after 2 minutes in the web-server.

What I did in the end was something like this:

In the code somewhere add this so it's run when a request hits.

open(F, '>/tmp/modulelist');
print F join("\n", values %INC), "\n";
close(F);

This creates a file which lists all the loaded modules. Then after sticking
a request through the browser, do something like:

grep \$\' `cat /tmp/modulelist`
grep \$\& `cat /tmp/modulelist`
grep \$\` `cat /tmp/modulelist`

to try and track down the offending module. You'll get quite a few false
hits (comments, etc), but you might find an offending module. The main ones
I found were:

Parse::RecDescent
Net::DNS

and a couple of others I can't remember now. I fixed Net::DNS myself and
sent a patch to the maintainer, but haven't heard anything. If you find this
happens to be your problem as well, ask me for the patched version.
Parse::RecDescent makes heavy use of the above vars, no chance of fixing
that in a hurry.

===

Date: Fri, 25 Jan 2002 13:35:35 +0800
From: Stas Bekman <stas@stason.org>
To: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Re: slow regex [BENCHMARK]

Also check Devel::FindAmpersand and Devel::SawAmpersand.

cpan> i /ampersand/
Distribution    A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz
Module          Apache::SawAmpersand (Contact Author The Perl/Apache 
Mailing List <modperl@apache.org>)
Module          B::FindAmpersand (A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz)
Module          Devel::FindAmpersand 
(A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz)
Module          Devel::SawAmpersand 
(A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz)
5 items found

The Devel::SawAmpersand also explains a workaround for using &.


===

Date: Fri, 25 Jan 2002 14:36:00 -0800
From: Paul Mineiro <paul@codegrok.com>
To: modperl@apache.org
Subject: Re: slow regex [BENCHMARK]

Paul Mineiro wrote:

>
> right.  i probably should've mentioned earlier that "CGAT" x 50000 is 
> really fast in both mod_perl and command line.
>
> if anybody wants my actual $seq data, please let me know.


i neglected to mention something big:  the production version is 
identical but using perl 5.005 and it
doesn't have this problem.

perl -V of the production perl follows.

thanks,

-- p

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.4-xfs, archname=i686-linux
    uname='linux rock.codegrok.lab 2.4.4-xfs #8 smp wed may 30 17:37:44 
pdt 2001 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.95.4 20010319 (Debian prerelease)
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldbm -ldb -ldl -lm -lc -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Built under linux
  Compiled at Oct 30 2001 10:33:04
  %ENV:
    
PERL5LIB="/home/codegrok/genegrokker-interface/lib/perl5:/home/codegrok/genegrokker-interface/ext/lib/perl5"
  @INC:
    /home/codegrok/genegrokker-interface/lib/perl5
    /home/codegrok/genegrokker-interface/ext/lib/perl5
    /home/codegrok/genegrokker-interface/ext/lib/perl5/5.00503/i686-linux
    /home/codegrok/genegrokker-interface/ext/lib/perl5/5.00503
    
/home/codegrok/genegrokker-interface/ext/lib/perl5/site_perl/5.005/i686-linux
    /home/codegrok/genegrokker-interface/ext/lib/perl5/site_perl/5.005
    .


===

Date: Fri, 25 Jan 2002 15:50:50 -0800
From: Paul Mineiro <paul@codegrok.com>
To: modperl@apache.org
Subject: Re: slow regex [BENCHMARK]

This is a multi-part message in MIME format.
--------------070404030807050303050809
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Rob Mueller (fastmail) wrote:

>I recently had a similar problem. A regex that worked fine in sample code
>was a dog in the web-server code. It only happened with really long strings.
>I tracked down the problem to this from the 'perlre' manpage.
>
>       WARNING: Once Perl sees that you need one of "$&", "$`", or "$'"
>anywhere in the program, it
>
<snip>

>
>What I did in the end was something like this:
>
>In the code somewhere add this so it's run when a request hits.
>
>open(F, '>/tmp/modulelist');
>print F join("\n", values %INC), "\n";
>close(F);
>
>This creates a file which lists all the loaded modules. Then after sticking
>a request through the browser, do something like:
>
>grep \$\' `cat /tmp/modulelist`
>grep \$\& `cat /tmp/modulelist`
>grep \$\` `cat /tmp/modulelist`
>
>to try and track down the offending module. 
>
well, the good (bad?) news is, none of the modules in the module list 
have the expensive regex variables in them.

i've attached the module list, in case it is of interest.

thanks,

-- p


--------------070404030807050303050809
Content-Type: text/plain;
 name="modulelist"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="modulelist"

/home/aerives/genegrokker-interface/lib/perl/Keys.pm
/usr/share/perl/5.6.1/Carp.pm
/usr/share/perl/5.6.1/unicode/To/Upper.pl
/home/aerives/genegrokker-interface/lib/perl/PrimerFunctions.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/autosplit.ix
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Common.pm
/usr/share/perl/5.6.1/IO/Socket/UNIX.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Annotation.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Constants.pm
/usr/share/perl/5.6.1/IO/Socket/INET.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/XML/Parser.pm
/usr/share/perl/5.6.1/strict.pm
/usr/share/perl/5.6.1/base.pm
/usr/share/perl/5.6.1/vars.pm
/usr/share/perl/5.6.1/utf8.pm
/usr/lib/perl/5.6.1/Config.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Feature.pm
/home/aerives/genegrokker-interface/mod_perl/tools.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Crypt/CBC.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Digest/SHA1.pm
/usr/lib/perl/5.6.1/Data/Dumper.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Server.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/URI/Escape.pm
/home/aerives/genegrokker-interface/lib/perl/Validate.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/GskXmlProtocol.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Connection.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/User.pm
/usr/share/perl/5.6.1/Symbol.pm
/usr/share/perl/5.6.1/Exporter/Heavy.pm
/home/aerives/genegrokker-interface/mod_perl/genomic_img.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Storable.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Lock/File.pm
/home/aerives/genegrokker-interface/mod_perl/genomicbrowser.pm
/home/aerives/genegrokker-interface/lib/perl/Authenticate.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/mod_perl.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/AnnotationSequence.pm
/home/aerives/genegrokker-interface/ext/var/tmp/genegrokker-interface-aerives/ssl_mod_perl_apache-startup.pl
/usr/share/perl/5.6.1/Benchmark.pm
/usr/lib/perl/5.6.1/IO/Handle.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/MD5.pm
/usr/lib/perl/5.6.1/Fcntl.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/XML/Parser/Expat.pm
/usr/lib/perl/5.6.1/IO/Seekable.pm
/usr/share/perl/5.6.1/Exporter.pm
/usr/lib/perl/5.6.1/IO/Socket.pm
/usr/share/perl/5.6.1/utf8_heavy.pl
/usr/lib/perl/5.6.1/Errno.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/SequenceRegion.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Object.pm
/usr/lib/perl/5.6.1/DynaLoader.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Request.pm
/usr/share/perl/5.6.1/FileHandle.pm
/usr/share/perl/5.6.1/File/Spec/Unix.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Serialize/Storable.pm
/usr/share/perl/5.6.1/SelectSaver.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/HTML/Template.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/DNAregex.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Generate/MD5.pm
/usr/lib/perl/5.6.1/IO.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/nfreeze.al
/usr/lib/perl/5.6.1/Socket.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/GD.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Sequence.pm
/home/aerives/genegrokker-interface/lib/perl/GenegrokkerUtil.pm
/usr/lib/perl/5.6.1/IO/File.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/_freeze.al
/usr/share/perl/5.6.1/integer.pm
/home/aerives/genegrokker-interface/lib/perl/SequenceNavigator.pm
/usr/lib/perl/5.6.1/XSLoader.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Digest/MD5.pm
/usr/share/perl/5.6.1/File/Spec.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Store/File.pm
/home/aerives/genegrokker-interface/mod_perl/LoggedInHandler.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Constants/Exports.pm
/home/aerives/genegrokker-interface/lib/perl/Crypto.pm
/usr/share/perl/5.6.1/warnings/register.pm
/usr/share/perl/5.6.1/warnings.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Table.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Client.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Crypt/Blowfish.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/thaw.al
/usr/share/perl/5.6.1/overload.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Permissions.pm
/home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/File.pm
/usr/share/perl/5.6.1/Cwd.pm
/usr/share/perl/5.6.1/AutoLoader.pm

--------------070404030807050303050809--

===

the rest of The Pile (a partial mailing list archive)

doom@kzsu.stanford.edu