This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
From: "Rob Mueller (fastmail)" <robm@fastmail.fm> To: "Paul Mineiro" <paul@codegrok.com>, <modperl@apache.org> Subject: Re: slow regex [BENCHMARK] Date: Fri, 25 Jan 2002 16:21:28 +1100 "Paul Mineiro" <paul@codegrok.com> wrote: > Paul Mineiro wrote: > > i've cleaned up the example to tighten the case: > > the mod perl code snippet is: > > --- > > my @cg; > > open DIL, '>', "/tmp/seqdata"; > print DIL $seq; > close DIL; > > warn "length seq = @{[length ($seq)]}"; > > my $t = timeit (1, sub { > while ($seq =~ /CG/g) > { > push @cg, pos ($seq); > } > }); > > print STDERR timestr ($t), "\n"; > > --- > > which yields > length seq = 200001 at > /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm line 634, > <GEN1> line 102 > 16 wallclock secs (15.56 usr + 0.01 sys = 15.57 CPU) @ 0.06/s (n=1) > > and the perl script (command line) version is: > > --- > > #!/usr/bin/perl > > use Benchmark; > use strict; > > open DIL, '<', "/tmp/seqdata"; > my $seq = <DIL>; > close DIL; > > warn "length seq is @{[length $seq]}"; > > my @cg; > > my $t = timeit (1, sub { > while ($seq =~ /CG/g) > { > push @cg, pos ($seq); > } > }); > > print STDERR timestr ($t), "\n"; > > --- > which yields: > > length seq is 200001 at ./t.pl line 10. > 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) > > the data is pretty big, so i didn't attach it, but feel free to contact > me directly for it. > > -- p > > >hi. i'm running mod_perl 1.26 + apache 1.3.14 + perl 5.6.1 > > > >i have a loop in a mod_perl handler like so: > >---- > > my $stime = time (); > > > > while ($seq =~ /CG/og) > > { > > push @cg, pos ($seq); > > } > > > > my $etime = time (); > > > > warn "time was: ", scalar localtime ($stime), " ", > > scalar localtime ($etime), " ", $etime - $stime; > >---- > > > >under mod_perl this takes 23 seconds. running the perl "by hand" (via > >extracting this piece into a seperate perl script) on the same data takes > >less than 1 second. > > > >has anyone seen this kind of extreme slowdown before? > > > >-- p > > > >info: > > > >apache build options: > > > >CFLAGS="-g -g -O3 -funroll-loops" \ > >LDFLAGS="-L/home/aerives/lib -L/home/aerives/lib/mysql" \ > >LIBS="-L/home/aerives/genegrokker-interface/lib > >-L/home/aerives/genegrokker-interface/ext/lib -L/home/aerives/lib > >-L/home/aerives/lib/mysql" \ > >./configure \ > >"--prefix=/home/aerives/genegrokker-interface/ext" \ > >"--enable-rule=EAPI" \ > >"--enable-module=most" \ > >"--enable-shared=max" \ > >"--with-layout=GNU" \ > >"--disable-rule=EXPAT" \ > >"$@" > > > >mod_perl build options: > > > >configure_options="PERL_USELARGEFILES=0 USE_APXS=1 > >WITH_APXS=$PLAYPEN_ROOT/ext/sbin/apxs EVERYTHING=1 > >INC=$PLAYPEN_ROOT/ext/include -DEAPI" > > > >perl -V: > >Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration: > > Platform: > > osname=linux, osvers=2.4.13, archname=i386-linux > > uname='linux duende 2.4.13 #1 wed oct 31 19:18:07 est 2001 i686 unknown ' > > config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux > >-Dprefix=/usr -Dprivlib=/usr/share/perl/5.6.1 -Darchlib=/usr/lib/perl/5.6.1 > >-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl 5 > >-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.6.1 > >-Dsitearch=/usr/local/lib/perl/5.6.1 -Dman1dir=/usr/share/man/man1 > >-Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl > >-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Duseshrplib > >-Dlibperl=libperl.so.5.6.1 -Dd_dosuid -des' > > hint=recommended, useposix=true, d_sigaction=define > > usethreads=undef use5005threads=undef useithreads=undef > >usemultiplicity=undef > > useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef > > use64bitint=undef use64bitall=undef uselongdouble=undef > > Compiler: > > cc='cc', ccflags ='-DDEBIAN -fno-strict-aliasing -I/usr/local/include > >-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', > > optimize='-O2', > > cppflags='-DDEBIAN -fno-strict-aliasing -I/usr/local/include' > > ccversion='', gccversion='2.95.4 (Debian prerelease)', gccosandvers='' > > intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 > > d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 > > ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', > >lseeksize=8 > > alignbytes=4, usemymalloc=n, prototype=define > > Linker and Libraries: > > ld='cc', ldflags =' -L/usr/local/lib' > > libpth=/usr/local/lib /lib /usr/lib > > libs=-lgdbm -ldb -ldl -lm -lc -lcrypt > > perllibs=-ldl -lm -lc -lcrypt > > libc=/lib/libc-2.2.4.so, so=so, useshrplib=true, libperl=libperl.so.5.6.1 > > Dynamic Linking: > > dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' > > cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' > > > > > >Characteristics of this binary (from libperl): > > Compile-time options: USE_LARGE_FILES > > Built under linux > > Compiled at Jan 11 2002 04:09:18 > > %ENV: > > > >PERL5LIB="/home/aerives/genegrokker-interface/lib/perl5:/home/aerives/geneg rokker-interface/ext/lib/perl5:/home/aerives/lib/perl5" > > @INC: > > /home/aerives/genegrokker-interface/lib/perl5 > > /home/aerives/genegrokker-interface/ext/lib/perl5 > > /home/aerives/lib/perl5 > > /usr/local/lib/perl/5.6.1 > > /usr/local/share/perl/5.6.1 > > /usr/lib/perl5 > > /usr/share/perl5 > > /usr/lib/perl/5.6.1 > > /usr/share/perl/5.6.1 > > /usr/local/lib/site_perl I recently had a similar problem. A regex that worked fine in sample code was a dog in the web-server code. It only happened with really long strings. I tracked down the problem to this from the 'perlre' manpage. WARNING: Once Perl sees that you need one of "$&", "$`", or "$'" anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression "(?: ... )" instead.) But if you never use "$&", "$`" or "$'", then patterns without capturing parentheses will not be penalized. So avoid "$&", "$'", and "$`" if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, "$&" is not so costly as the other two. Basically one of the modules in the web-app I was 'use'ing needed $', but my test code didn't 'use' that module. The result was pretty dramatic in this case, something that took approx 1 second in the test code was timing out after 2 minutes in the web-server. What I did in the end was something like this: In the code somewhere add this so it's run when a request hits. open(F, '>/tmp/modulelist'); print F join("\n", values %INC), "\n"; close(F); This creates a file which lists all the loaded modules. Then after sticking a request through the browser, do something like: grep \$\' `cat /tmp/modulelist` grep \$\& `cat /tmp/modulelist` grep \$\` `cat /tmp/modulelist` to try and track down the offending module. You'll get quite a few false hits (comments, etc), but you might find an offending module. The main ones I found were: Parse::RecDescent Net::DNS and a couple of others I can't remember now. I fixed Net::DNS myself and sent a patch to the maintainer, but haven't heard anything. If you find this happens to be your problem as well, ask me for the patched version. Parse::RecDescent makes heavy use of the above vars, no chance of fixing that in a hurry. === Date: Fri, 25 Jan 2002 13:35:35 +0800 From: Stas Bekman <stas@stason.org> To: "Rob Mueller (fastmail)" <robm@fastmail.fm> Subject: Re: slow regex [BENCHMARK] Also check Devel::FindAmpersand and Devel::SawAmpersand. cpan> i /ampersand/ Distribution A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz Module Apache::SawAmpersand (Contact Author The Perl/Apache Mailing List <modperl@apache.org>) Module B::FindAmpersand (A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz) Module Devel::FindAmpersand (A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz) Module Devel::SawAmpersand (A/AN/ANDK/Devel-SawAmpersand-0.20.tar.gz) 5 items found The Devel::SawAmpersand also explains a workaround for using &. === Date: Fri, 25 Jan 2002 14:36:00 -0800 From: Paul Mineiro <paul@codegrok.com> To: modperl@apache.org Subject: Re: slow regex [BENCHMARK] Paul Mineiro wrote: > > right. i probably should've mentioned earlier that "CGAT" x 50000 is > really fast in both mod_perl and command line. > > if anybody wants my actual $seq data, please let me know. i neglected to mention something big: the production version is identical but using perl 5.005 and it doesn't have this problem. perl -V of the production perl follows. thanks, -- p Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration: Platform: osname=linux, osvers=2.4.4-xfs, archname=i686-linux uname='linux rock.codegrok.lab 2.4.4-xfs #8 smp wed may 30 17:37:44 pdt 2001 i686 unknown ' hint=recommended, useposix=true, d_sigaction=define usethreads=undef useperlio=undef d_sfio=undef Compiler: cc='cc', optimize='-O2', gccversion=2.95.4 20010319 (Debian prerelease) cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include' ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include' stdchar='char', d_stdstdio=undef, usevfork=false intsize=4, longsize=4, ptrsize=4, doublesize=8 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 alignbytes=4, usemymalloc=n, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lgdbm -ldbm -ldb -ldl -lm -lc -lcrypt libc=, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Characteristics of this binary (from libperl): Built under linux Compiled at Oct 30 2001 10:33:04 %ENV: PERL5LIB="/home/codegrok/genegrokker-interface/lib/perl5:/home/codegrok/genegrokker-interface/ext/lib/perl5" @INC: /home/codegrok/genegrokker-interface/lib/perl5 /home/codegrok/genegrokker-interface/ext/lib/perl5 /home/codegrok/genegrokker-interface/ext/lib/perl5/5.00503/i686-linux /home/codegrok/genegrokker-interface/ext/lib/perl5/5.00503 /home/codegrok/genegrokker-interface/ext/lib/perl5/site_perl/5.005/i686-linux /home/codegrok/genegrokker-interface/ext/lib/perl5/site_perl/5.005 . === Date: Fri, 25 Jan 2002 15:50:50 -0800 From: Paul Mineiro <paul@codegrok.com> To: modperl@apache.org Subject: Re: slow regex [BENCHMARK] This is a multi-part message in MIME format. --------------070404030807050303050809 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Rob Mueller (fastmail) wrote: >I recently had a similar problem. A regex that worked fine in sample code >was a dog in the web-server code. It only happened with really long strings. >I tracked down the problem to this from the 'perlre' manpage. > > WARNING: Once Perl sees that you need one of "$&", "$`", or "$'" >anywhere in the program, it > <snip> > >What I did in the end was something like this: > >In the code somewhere add this so it's run when a request hits. > >open(F, '>/tmp/modulelist'); >print F join("\n", values %INC), "\n"; >close(F); > >This creates a file which lists all the loaded modules. Then after sticking >a request through the browser, do something like: > >grep \$\' `cat /tmp/modulelist` >grep \$\& `cat /tmp/modulelist` >grep \$\` `cat /tmp/modulelist` > >to try and track down the offending module. > well, the good (bad?) news is, none of the modules in the module list have the expensive regex variables in them. i've attached the module list, in case it is of interest. thanks, -- p --------------070404030807050303050809 Content-Type: text/plain; name="modulelist" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="modulelist" /home/aerives/genegrokker-interface/lib/perl/Keys.pm /usr/share/perl/5.6.1/Carp.pm /usr/share/perl/5.6.1/unicode/To/Upper.pl /home/aerives/genegrokker-interface/lib/perl/PrimerFunctions.pm /home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/autosplit.ix /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Common.pm /usr/share/perl/5.6.1/IO/Socket/UNIX.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Annotation.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Constants.pm /usr/share/perl/5.6.1/IO/Socket/INET.pm /home/aerives/genegrokker-interface/ext/lib/perl5/XML/Parser.pm /usr/share/perl/5.6.1/strict.pm /usr/share/perl/5.6.1/base.pm /usr/share/perl/5.6.1/vars.pm /usr/share/perl/5.6.1/utf8.pm /usr/lib/perl/5.6.1/Config.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Feature.pm /home/aerives/genegrokker-interface/mod_perl/tools.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Crypt/CBC.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Digest/SHA1.pm /usr/lib/perl/5.6.1/Data/Dumper.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Server.pm /home/aerives/genegrokker-interface/ext/lib/perl5/URI/Escape.pm /home/aerives/genegrokker-interface/lib/perl/Validate.pm /home/aerives/genegrokker-interface/ext/lib/perl5/GskXmlProtocol.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Connection.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/User.pm /usr/share/perl/5.6.1/Symbol.pm /usr/share/perl/5.6.1/Exporter/Heavy.pm /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Storable.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Lock/File.pm /home/aerives/genegrokker-interface/mod_perl/genomicbrowser.pm /home/aerives/genegrokker-interface/lib/perl/Authenticate.pm /home/aerives/genegrokker-interface/ext/lib/perl5/mod_perl.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/AnnotationSequence.pm /home/aerives/genegrokker-interface/ext/var/tmp/genegrokker-interface-aerives/ssl_mod_perl_apache-startup.pl /usr/share/perl/5.6.1/Benchmark.pm /usr/lib/perl/5.6.1/IO/Handle.pm /home/aerives/genegrokker-interface/ext/lib/perl5/MD5.pm /usr/lib/perl/5.6.1/Fcntl.pm /home/aerives/genegrokker-interface/ext/lib/perl5/XML/Parser/Expat.pm /usr/lib/perl/5.6.1/IO/Seekable.pm /usr/share/perl/5.6.1/Exporter.pm /usr/lib/perl/5.6.1/IO/Socket.pm /usr/share/perl/5.6.1/utf8_heavy.pl /usr/lib/perl/5.6.1/Errno.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/SequenceRegion.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Object.pm /usr/lib/perl/5.6.1/DynaLoader.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Request.pm /usr/share/perl/5.6.1/FileHandle.pm /usr/share/perl/5.6.1/File/Spec/Unix.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Serialize/Storable.pm /usr/share/perl/5.6.1/SelectSaver.pm /home/aerives/genegrokker-interface/ext/lib/perl5/HTML/Template.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/DNAregex.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Generate/MD5.pm /usr/lib/perl/5.6.1/IO.pm /home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/nfreeze.al /usr/lib/perl/5.6.1/Socket.pm /home/aerives/genegrokker-interface/ext/lib/perl5/GD.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Sequence.pm /home/aerives/genegrokker-interface/lib/perl/GenegrokkerUtil.pm /usr/lib/perl/5.6.1/IO/File.pm /home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/_freeze.al /usr/share/perl/5.6.1/integer.pm /home/aerives/genegrokker-interface/lib/perl/SequenceNavigator.pm /usr/lib/perl/5.6.1/XSLoader.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Digest/MD5.pm /usr/share/perl/5.6.1/File/Spec.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/Store/File.pm /home/aerives/genegrokker-interface/mod_perl/LoggedInHandler.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Constants/Exports.pm /home/aerives/genegrokker-interface/lib/perl/Crypto.pm /usr/share/perl/5.6.1/warnings/register.pm /usr/share/perl/5.6.1/warnings.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Table.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Client.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Crypt/Blowfish.pm /home/aerives/genegrokker-interface/ext/lib/perl5/auto/Storable/thaw.al /usr/share/perl/5.6.1/overload.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Genegrokker/Permissions.pm /home/aerives/genegrokker-interface/ext/lib/perl5/Apache/Session/File.pm /usr/share/perl/5.6.1/Cwd.pm /usr/share/perl/5.6.1/AutoLoader.pm --------------070404030807050303050809-- ===