perl_running_multiple_substitutions

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



From: "Tom Ford" <fordt@uci.edu>
Subject: Regular Expressions & Substitution
Date: Mon, 7 Aug 2000 12:14:44 -0700

Hi, I just recently started learning Perl, and I have been loving it.
Especially the Regular Expressions. However, I have run into a problem I
can't seem to get around. Perhaps someone can help me solve it.

I am using a two dimensional array for my regular expression-substitutions
like this (for example):

$Order[3][0] = "(~) ($Operand)";
$Order[3][1] = " t";

The array contains several different types of substitutions I want to perform.
Then I am using a FOR loop to run through my array like this:

    for ($i=0;$i<7;$i++)
    {
        if ( $_[0] =~ s/$Order[$i][0]/$Order[$i][1]/ )
        {
            .....
        }
    } # end FOR-loop

Which seems to work just fine. However, the problem arises when I try to use
those special escape characters (\1 \2 \3 etc...) in the second string, for
example:

$Order[3][1] = "\1 t";

When I do that, it was replacing "\1" with a smiley face. It took me a while
to realize it was inserting ASCII character 1 into my string, so I tried
single quotes like this:

$Order[3][1] = '\1 t';

But in the substitution, it was putting the actual "\1" instead of the value
of "\1". So my question is: is there any way I can use those "\1" things
within a string and use that string as my substitution? Or is there any
other way I can do this without hard coding every single Regular Expression
I would like to perform?

Thanks, I hope I made sense :)
Tom



From: gbacon@HiWAAY.net (Greg Bacon)
Newsgroups: comp.lang.perl.misc
Subject: Re: Regular Expressions & Substitution
Date: Mon, 07 Aug 2000 19:48:28 GMT
Organization: Eric Conspiracy Secret Labs
Lines: 75
Message-ID: <sou4kc6p63a11@corp.supernews.com>
References: <8mn1n2$7ra$1@news.service.uci.edu>
Reply-To: Greg Bacon <gbacon@hiwaay.net>

In article <8mn1n2$7ra$1@news.service.uci.edu>,
    Tom Ford <fordt@uci.edu> wrote:

: I am using a two dimensional array for my regular expression-substitutions
: like this (for example):
: 
: $Order[3][0] = "(~) ($Operand)";
: $Order[3][1] = " t";
: 
: The array contains several different types of substitutions I want to perform.
: Then I am using a FOR loop to run through my array like this:
: 
:     for ($i=0;$i<7;$i++)
:     {
:         if ( $_[0] =~ s/$Order[$i][0]/$Order[$i][1]/ )
:         {
:             .....
:         }
:     } # end FOR-loop
: 
: Which seems to work just fine. However, the problem arises when I try to use
: those special escape characters (\1 \2 \3 etc...) in the second string, for
: example:
: 
: $Order[3][1] = "\1 t";
: 
: When I do that, it was replacing "\1" with a smiley face. It took me a while
: to realize it was inserting ASCII character 1 into my string, so I tried
: single quotes like this:
: 
: $Order[3][1] = '\1 t';
: 
: But in the substitution, it was putting the actual "\1" instead of the value
: of "\1". So my question is: is there any way I can use those "\1" things
: within a string and use that string as my substitution?

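Build up a matcher at run time.  The patterns and replacements end up as
literals in the generated code, so \1 and $1 behave just as they would in
a hand-written s///.  You should also see performance gains, since each
pattern is compiled once instead of being re-interpolated on every pass
through the loop.  Consider this example: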

    #! /usr/bin/perl -w

    use strict;

    my @subst = (
        [ 'foo'         => 'bar' ],
        [ '([awm])\1\1' => '${1}3' ],
    );

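    # Generate the source of an anonymous sub containing one literal
    # s/// per [pattern, replacement] pair, then compile it once with eval.
    # Patterns and replacements are pasted in verbatim, so they must not
    # contain an unescaped "/" and must come from a trusted source.
    sub replacer {
        my $code = "sub {\n    for (\@_) {\n";
        for (@_) {
            $code .= "        s/$_->[0]/$_->[1]/gi;\n";
        }
        $code .= "    }\n}\n";

        #print $code;    # uncomment to inspect the generated source
        my $sub = eval $code;
        die $@ if $@;

        $sub;
    }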

    my $subst = replacer @subst;

    while (<DATA>) {
        $subst->($_);
        print;
    }

    __DATA__
    The food is under the bar in the barn.
    MMM employees should visit AAA on the WWW.
    Just another Perl hacker,

Greg
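
For comparison, here is a minimal sketch of a different technique from the one above: instead of generating source text and compiling it with a string eval, each rule pairs a precompiled qr// pattern with a code ref that builds the replacement. The names @rules and apply_rules are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule table: [ compiled pattern, code ref building the
# replacement text ].  The capture variables ($1, $2, ...) are
# dynamically scoped, so the code ref sees the values of the current match.
my @rules = (
    [ qr/foo/i,         sub { 'bar' }   ],
    [ qr/([awm])\1\1/i, sub { "${1}3" } ],
);

# Apply every rule, in order, to a copy of the input string.
sub apply_rules {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($re, $make) = @$rule;
        # /e evaluates $make->() as code for each match.
        $text =~ s/$re/$make->()/ge;
    }
    return $text;
}

print apply_rules("MMM employees should visit AAA on the WWW."), "\n";
# prints: M3 employees should visit A3 on the W3.
```

This avoids pasting table entries into generated code, so a "/" in a pattern needs no special care, at the cost of writing each replacement as a small sub.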

From: Michael Carman <mjcarman@home.com>
Newsgroups: comp.lang.perl.misc
Subject: Re: Regular Expressions & Substitution
Date: Mon, 07 Aug 2000 15:03:12 -0500
Organization: None (anarchist)
Lines: 41
Message-ID: <398F1600.2293988E@home.com>
References: <8mn1n2$7ra$1@news.service.uci.edu>

Tom Ford wrote:
> 
> I am using a two dimensional array for my regular 
> expression-substitutions like this (for example):
> 
> $Order[3][0] = "(~) ($Operand)";
> $Order[3][1] = " t";

Note: $Order[3][0] will contain the value of $Operand at the time it was
assigned, not at the time it is used. Is that what you want?
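
A small sketch of that point, using a made-up value for $Operand (the original post does not show what it holds):

```perl
use strict;
use warnings;

# Assumed example value; the real contents of $Operand are not shown
# in the original post.
my $Operand = '[a-z]+';

# Double quotes interpolate $Operand right here, at assignment time.
my $pattern = "(~) ($Operand)";

# Changing $Operand afterwards has no effect on the stored string.
$Operand = '[0-9]+';

print "$pattern\n";    # prints: (~) ([a-z]+)
```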
 
> The array contains several different types of substitutions I want
> to perform.
> Then I am using a FOR loop to run through my array like this:
> 
>     for ($i=0;$i<7;$i++)

Ah, another C programmer joins our ranks.

> [T]he problem arises when I  try to use those special escape 
> characters (\1 \2 \3 etc...) in the second string, for example:
> 
> $Order[3][1] = "\1 t";
> 

Those aren't really escape chars, they're backreferences. \1 is meant for
use inside the pattern itself; on the replacement side it still works but
is discouraged (see perlre). Use $1, $2, etc. instead.
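
To illustrate the difference between the two quoting styles tried in the original post, a short sketch (the variable names $dq and $sq are just for this example):

```perl
use strict;
use warnings;

my $dq = "\1 t";    # double quotes: \1 is an octal escape, i.e. chr(1)
my $sq = '\1 t';    # single quotes: a literal backslash followed by "1"

# The double-quoted string has 3 characters, the single-quoted one 4.
printf "double-quoted: length %d, first char code %d\n",
    length $dq, ord $dq;
printf "single-quoted: length %d, first char '%s'\n",
    length $sq, substr($sq, 0, 1);
```

Neither string contains anything that s/// will treat as a backreference once it has been interpolated into the replacement.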

> When I do that, it was replacing "\1" with a smiley face. It took 
> me a while to realize it was inserting ASCII character 1 into my
> string, so I tried single quotes like this:
> 
> $Order[3][1] = '\1 t';
> 
> But in the substitution, it was putting the actual "\1"

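Yes, because the RHS of s/// is interpolated only once: the contents of
$Order[$i][1] are inserted as they are and not scanned again for
backreferences. Change your 's///' to 's///ee' and store the replacement
as a quoted Perl expression such as '"$1 t"' -- the /ee will force
(double) evaluation of the RHS as an expression.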
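
A minimal sketch of the /ee approach, assuming each replacement is stored as the source text of a double-quoted Perl string. The pattern here uses \w+ as a stand-in, since the real $Operand is not shown in the thread:

```perl
use strict;
use warnings;

# Each entry: [ pattern, replacement written as Perl source ].
# Note the inner double quotes: the stored text is "$2 t", quotes included.
my @Order = (
    [ '(~) (\w+)', '"$2 t"' ],
);

my $text = "~ x";
for my $rule (@Order) {
    my ($pat, $repl) = @$rule;
    # First /e: evaluates $repl, yielding the string "$2 t" (with quotes).
    # Second /e: evals that string as Perl code, interpolating $2.
    $text =~ s/$pat/$repl/ee;
}

print "$text\n";    # prints: x t
```

Because /ee evaluates whatever text is in the table as code, it should only be used when the table comes from a trusted source.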

-mjc

From: "Tom Ford" <fordt@uci.edu>
Newsgroups: comp.lang.perl.misc
Subject: Re: Regular Expressions & Substitution
Date: Mon, 7 Aug 2000 13:32:21 -0700
Organization: University of California, Irvine
Lines: 12
Message-ID: <8mn685$9p3$1@news.service.uci.edu>
References: <8mn1n2$7ra$1@news.service.uci.edu> <sou4kc6p63a11@corp.supernews.com>
Reply-To: "Tom Ford" <fordt@uci.edu>

> Build up a matcher at run time.  You should see big performance gains.

Interesting technique! I can tell I still have a lot to learn with Perl :) I
didn't know it was so flexible.

This should solve my problem nicely.

Thanks so much,
Tom




