improving_gcc_compiler_performance

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: perl5-porters@perl.org, modperl@apache.org
From: Tim Bunce <Tim.Bunce@ig.co.uk>
Subject: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 1 Feb 2001 13:51:56 +0000

Can anyone recommend extra gcc options to squeeze the last ounce of
performance out of code (perl and apache in this case) on Intel?

I don't mind tying the code down to one cpu type or losing the ability
to debug etc. We're already doing -O6 and are looking for more.

I recall Malcolm Beattie (CC'd, Hi Malcolm!) experimenting in this area,
something about not wasting a register for the frame pointer.

I'm using gcc 2.95.2, is that the latest/best?
It's on FreeBSD 4.1 and 4.2.

===
To: Tim Bunce <Tim.Bunce@ig.co.uk>
From: James W Walden <jamesw@ichips.intel.com>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 1 Feb 2001 09:17:19 -0800 (PST)

On Thu, 1 Feb 2001, Tim Bunce wrote:
> Can anyone recommend extra gcc options to squeeze the last ounce of
> performance out of code (perl and apache in this case) on Intel?
>
> I don't mind tying the code down to one cpu type or losing the ability
> to debug etc. We're already doing -O6 and are looking for more.

I use '-march=i686 -mcpu=i686' to improve performance with gcc. The
percentage improvement varies greatly between applications but is often
around 10%. If you're willing to use a commercial compiler instead of
gcc, I get a 20-40% improvement with Intel's proton C compiler (which I
think is only available commercially for Windows so far) over gcc and
have found other commercial compilers to produce similar gains.

> I'm using gcc 2.95.2, is that the latest/best?

It is.  Gcc 3.0 is supposed to be released by the end of this quarter.

===
To: Tim Bunce <Tim.Bunce@ig.co.uk>
From: "G.W. Haywood" <ged@www.jubileegroup.co.uk>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 1 Feb 2001 17:20:11 +0000 (GMT)

Hi Tim,

On Thu, 1 Feb 2001, Tim Bunce wrote:

> Can anyone recommend extra gcc options to squeeze the last ounce of
> performance out of code (perl and apache in this case) on Intel?
> 
> I don't mind tying the code down to one cpu type or losing the ability
> to debug etc. We're already doing -O6 and are looking for more.

This kind of question usually spawns VERY long threads on this List...

I feel sure you shouldn't be using more than -O2 but I can't remember
why nor where I read it.  If I remember I'll let you know.

The compiler isn't the place to look for performance gains.  Look to
your system architecture, Perl code.  See if you can code the things
that get executed the most in C.  Use handlers, not Registry.  Cache.
Use RAID.  Throw away your database.  Well maybe not throw it away,
but be careful how you use it.  (I can't believe I'm saying this to
you:).  Remember the Pareto Principle.

> I'm using gcc 2.95.2, is that the latest/best?

It'll do.

===

To: Tim Bunce <Tim.Bunce@ig.co.uk>
From: "Kurt D. Starsinic" <kstar@cpan.org>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 1 Feb 2001 10:21:18 -0500

On Thu, Feb 01, 2001 at 01:51:56PM +0000, Tim Bunce wrote:
> Can anyone recommend extra gcc options to squeeze the last ounce of
> performance out of code (perl and apache in this case) on Intel?
> 
> I don't mind tying the code down to one cpu type or losing the ability
> to debug etc. We're already doing -O6 and are looking for more.
> 
> I recall Malcolm Beattie (CC'd, Hi Malcolm!) experimenting in this area,
> something about not wasting a register for the frame pointer.
> 
> I'm using gcc 2.95.2, is that the latest/best?
> It's on FreeBSD 4.1 and 4.2.

    If you can get perl to compile under TenDRA, you might get better
performance.  It's a FreeBSD port, BTW.  I had looked into this a while
ago, but the documentation left a lot to be desired.  You might also try
pgcc.

===

To: Tim Bunce <Tim.Bunce@ig.co.uk>
From: Steve Fink <sfink@digital-integrity.com>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 01 Feb 2001 09:07:51 -0800

Tim Bunce wrote:
> 
> Can anyone recommend extra gcc options to squeeze the last ounce of
> performance out of code (perl and apache in this case) on Intel?
> 
> I don't mind tying the code down to one cpu type or losing the ability
> to debug etc. We're already doing -O6 and are looking for more.
> 
> I recall Malcolm Beattie (CC'd, Hi Malcolm!) experimenting in this area,
> something about not wasting a register for the frame pointer.

That particular option would be gcc -fomit-frame-pointer.

You might try -ffast-math -fexpensive-optimizations (never played with
the latter, though, and it's probably on with -O6 anyway).

If you really want to go crazy, you could try -fbranch-probabilities
(requires more than just turning it on; read the gcc man page.) I doubt
it's worth the trouble.
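
For readers wondering what "more than just turning it on" involves: as
documented for GCC 2.95, -fbranch-probabilities reads profile data written
by a run of a binary that was built with -fprofile-arcs, so the program has
to be built twice with a training run in between. Below is a minimal sketch
of that sequence, written in Python only to make the order of steps
explicit. The function name, file names and training command are
hypothetical; none of them come from the thread.

```python
def profile_feedback_steps(source, output, base_flags, training_cmd):
    """Return the three commands of a profile-feedback build, in order.

    Assumption: GCC 2.95-style flags, where -fprofile-arcs instruments the
    program and -fbranch-probabilities reads the data a training run wrote.
    """
    instrumented = ["gcc", *base_flags, "-fprofile-arcs", source, "-o", output]
    optimized = ["gcc", *base_flags, "-fbranch-probabilities", source, "-o", output]
    return [instrumented, list(training_cmd), optimized]


if __name__ == "__main__":
    # Hypothetical file names and workload, for illustration only.
    steps = profile_feedback_steps(
        "bench.c", "bench", ["-O2"], ["./bench", "input.dat"]
    )
    for cmd in steps:
        print(" ".join(cmd))
```

The sketch reuses the same base flags for both compiles as a simple
default. The training run should resemble the real workload, otherwise the
recorded branch counts may mislead the second compile.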

And you'd probably want -march=i686 (or whatever CPU you're using).

I don't know the state of pentium-specific optimizations, but does
Cygnus's Code Fusion still have a gcc with Pentium-specific
optimizations that aren't in the main tree? I just remember the numbers
saying that they'd slightly overtaken Intel's compiler, but that was a
year and a half ago.

Unrelated to the compiler, if you're throwing around significant chunks
of data, you might want to try tuning your drives. Especially if they're
IDE, since UDMA is often disabled for safety by default. I don't know
much about SCSI tuning, but whichever interface you're using, make sure
the heads are able to go around in circles really fast.

You can also play tricks with RAM disks, or solid-state hard drives like
the ones from platypustechnologies.com. But this gets too far afield.

===
To: sfink@digital-integrity.com
From: <nick@ing-simmons.net>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 01 Feb 2001 19:51:03 +0000

This isn't the best place to ask these questions.


Steve Fink <sfink@digital-integrity.com> writes:
>
>And you'd probably want -march=i686 (or whatever CPU you're using).

Not necessarily. gcc and ia32 is weird that way.

I would use whatever Linus & co. decided to use for the kernel on the
arch in question.

===

To: "G.W. Haywood" <ged@www.jubileegroup.co.uk>
From: Tim Bunce <Tim.Bunce@ig.co.uk>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Fri, 2 Feb 2001 13:23:46 +0000

[Thanks for all the feedback on this from everyone.]

On Thu, Feb 01, 2001 at 05:20:11PM +0000, G.W. Haywood wrote:
> 
> The compiler isn't the place to look for performance gains.  Look to
> your system architecture, Perl code.  See if you can code the things
> that get executed the most in C.  Use handlers, not Registry.  Cache.
> Use RAID.  Throw away your database.  Well maybe not throw it away,
> but be careful how you use it.  (I can't believe I'm saying this to
> you:).  Remember the Pareto Principle.

Believe me when I say that we've gone a long way down those roads
before wanting to squeeze a little more from the compiler.

===
To: Tim Bunce <Tim.Bunce@ig.co.uk>
From: Tim Bunce <timbo@valueclick.com>
Subject: Re: Best GCC compiler options for Intel (perl & apache)
Date: Thu, 8 Feb 2001 12:08:14 +0000

Last week I asked...

On Thu, Feb 01, 2001 at 01:51:56PM +0000, Tim Bunce wrote:
> Can anyone recommend extra gcc options to squeeze the last ounce of
> performance out of code (perl and apache in this case) on Intel?
> 
> I don't mind tying the code down to one cpu type or losing the ability
> to debug etc. We're already doing -O6 and are looking for more.
> 
> I recall Malcolm Beattie (CC'd, Hi Malcolm!) experimenting in this area,
> something about not wasting a register for the frame pointer.
> 
> I'm using gcc 2.95.2, is that the latest/best?
> It's on FreeBSD 4.1 and 4.2.

I've appended a summary (with some additional notes after my reading of
the GCC 2.95.2 docs in square brackets).

Many thanks to all who contributed. I'm off to play with these options
now. I'll report back later.

Tim.


From: Greg Cope <greg@rubberplant.freeserve.co.uk>

I've used this, but have had a few unresolved segfaults on busy machines:

  -O6 -mcpu=pentium -march=pentium -fomit-frame-pointer

[-march=pentium implies -mcpu=pentium]


From: Owen Williams <williams@dmu.ac.uk>

I saw these on a site somewhere for compiling the linux kernel:

  -mcpu=pentiumpro -mpentium -ffast-math -O5 -fthread-jumps

[-mpentium is a deprecated synonym for -mcpu=pentium. -O enables -fthread-jumps]

Use them on anything that is pentiumpro and above.  I get a good speed
increase.
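
The bracketed notes above amount to three rewrite rules for a flag list.
As a rough illustration, the sketch below applies them mechanically. It
assumes the GCC 2.95.2 behaviour exactly as stated in those notes, the
function name is hypothetical, and it makes no attempt to resolve
conflicting -mcpu values.

```python
def normalize_flags(flags):
    """Apply the three bracketed notes above to a list of gcc flags.

    Assumptions (GCC 2.95.2, as stated in the notes, not re-verified here):
      * -mpentium is a deprecated synonym for -mcpu=pentium
      * -march=X implies -mcpu=X
      * any -O level other than -O0 already enables -fthread-jumps
    """
    # Rewrite the deprecated spelling first so the later checks see it.
    flags = ["-mcpu=pentium" if f == "-mpentium" else f for f in flags]

    # Every -mcpu=X that some -march=X on the command line already implies.
    implied = {
        "-mcpu=" + f[len("-march="):] for f in flags if f.startswith("-march=")
    }
    optimizing = any(f.startswith("-O") and f != "-O0" for f in flags)

    result = []
    for f in flags:
        if f in implied:
            continue  # redundant next to the matching -march
        if f == "-fthread-jumps" and optimizing:
            continue  # already switched on by -O
        result.append(f)
    return result


if __name__ == "__main__":
    greg = ["-O6", "-mcpu=pentium", "-march=pentium", "-fomit-frame-pointer"]
    print(" ".join(normalize_flags(greg)))
```

Run on Owen's line, the rewrite leaves both -mcpu=pentiumpro and
-mcpu=pentium in place, which makes the conflict between them easy to
spot.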


From: Vivek Khera <khera@kciLink.com>

There were some important compiler fixes in FreeBSD 4.x that went in
early in January.  If you can, I'd recommend updating to the latest
4.2-STABLE version for the most stable compiler environment.  Most
important if you're compiling threaded apps in C++ (eg, MySQL).

Personally, I use these options with good effect:

 -O2 -pipe -march=i586 -ffast-math -mfancy-math-387

Anything beyond that is bound to tickle gcc bugs.


From: Steve Fink <sfink@digital-integrity.com>

> I recall Malcolm Beattie (CC'd, Hi Malcolm!) experimenting in this area,
> something about not wasting a register for the frame pointer.

That particular option would be gcc -fomit-frame-pointer.
You might try -ffast-math -fexpensive-optimizations (never played with
the latter, though, and it's probably on with -O6 anyway).

If you really want to go crazy, you could try -fbranch-probabilities
(requires more than just turning it on; read the gcc man page.) I doubt
it's worth the trouble.

And you'd probably want -march=i686 (or whatever CPU you're using).

I don't know the state of pentium-specific optimizations, but does
Cygnus's Code Fusion still have a gcc with Pentium-specific
optimizations that aren't in the main tree? I just remember the numbers
saying that they'd slightly overtaken Intel's compiler, but that was a
year and a half ago.


From: nick <nick@ing-simmons.net>
>
>And you'd probably want -march=i686 (or whatever CPU you're using).

Not necessarily. gcc and ia32 is weird that way.  I would use whatever
Linus & co. decided to use for the kernel on the arch in question.


From: James W Walden <jamesw@ichips.intel.com>

I use '-march=i686 -mcpu=i686' to improve performance with gcc. The
percentage improvement varies greatly between applications but is often
around 10%. If you're willing to use a commercial compiler instead of
gcc, I get a 20-40% improvement with Intel's proton C compiler (which I
think is only available commercially for Windows so far) over gcc and
have found other commercial compilers to produce similar gains.


From: Mark Mielke <markm@nortelnetworks.com>

Try the pgcc patch.
I don't even think -O6 does anything for gcc 2.95.x, although my
memory is faint. I think it only goes to -O3.

To re-order the instructions for a pentium:

    gcc -O3 -mpentium -march=pentium ...

If you apply the pgcc patch, it will actually use the new instructions
available only on the pentium, and not on the 386/486, where desirable.
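
Mark's recollection about -O6 can be written down as a small mapping from
the flag as typed to the level it is assumed to act as. This is only a
sketch of that recollection: the cap of 3 is his "I think", not something
checked against the GCC 2.95 sources, and the function name is
hypothetical.

```python
def effective_opt_level(flag, highest=3):
    """Map a -O<n> flag to the level it is assumed to act as.

    Assumption (from the recollection above, not verified against the
    GCC 2.95 sources): numeric levels above `highest` behave like `highest`.
    """
    if not flag.startswith("-O"):
        raise ValueError("not an optimization-level flag: " + flag)
    suffix = flag[2:]
    if suffix == "":
        return 1  # plain -O is the same as -O1
    if not suffix.isdigit():
        return None  # e.g. -Os, which is not a numeric level
    return min(int(suffix), highest)


if __name__ == "__main__":
    for flag in ["-O", "-O2", "-O3", "-O5", "-O6"]:
        print(flag, "->", effective_opt_level(flag))
```

If the recollection holds, the -O5 and -O6 lines quoted elsewhere in this
summary are asking for the same thing as -O3.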


From: "Redford, John" <John.Redford@fmr.com>

The reason, for me, is that -O3 (and presumably -O6) performs optimizations
that are "unsafe". I have had critical bugs caused by compiling Perl with -O3
(which used to be habitual). Now I only use -O2.

(Or possibly the optimizations were simply buggy in GCC; this was definitely
with a GCC of years long ago, and I haven't tried to push my luck again.)


From: Perrin Harkins <perrin@primenet.com>

It's a bit old, but there's this page:
http://www.google.com/search?q=cache:members.nbci.com/Alex_Maranda/gnuintel/GNUintel.htm&hl=en&lr=lang_en
He comes out in favor of using PGCC.


[Summary:

  http://gcc.gnu.org/onlinedocs/gcc-2.95.2/gcc_2.html#SEC10
  http://gcc.gnu.org/onlinedocs/gcc-2.95.2/gcc_2.html#SEC31
  http://members.nbci.com/Alex_Maranda/gnuintel/GNUintel.htm

  gcc  -O3 -malign-double -ffast-math -funroll-all-loops -fno-rtti -fno-exceptions 

  pgcc -O6 -malign-double -ffast-math -funroll-all-loops -fno-rtti -mcpu=pentiumpro

  Using -mcpu=pentiumpro doesn't stop code running on an old 386, so it is
  probably a good idea as a default for Perl & Apache on Intel.

  To use pentiumpro specific instructions (won't run on i386) use:
  -march=pentiumpro (which also implies -mcpu=pentiumpro)

  -fomit-frame-pointer makes an extra register available but disables debugging
]
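
Since the replies disagree on how much any of these flags buy (and on
which ones cause crashes), the only way to settle it is to measure. A
minimal sketch of such a comparison follows, assuming one benchmark command
per build. The function names and the stand-in workloads are hypothetical,
and in practice each command would be a perl or apache binary built with
one of the flag sets above.

```python
import subprocess
import sys
import time


def best_wall_time(cmd, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds.

    Raises subprocess.CalledProcessError if any run exits non-zero, so a
    build that crashes is never reported as a fast one.
    """
    best = None
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def rank_builds(candidates, runs=3):
    """candidates maps a label to a benchmark command; fastest first."""
    timings = {label: best_wall_time(cmd, runs) for label, cmd in candidates.items()}
    return sorted(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Stand-in workloads so the sketch runs anywhere. Replace them with the
    # binaries built under each flag set, e.g. ["./perl-O2", "bench.pl"].
    demo = {
        "short": [sys.executable, "-c", "sum(range(10_000))"],
        "long": [sys.executable, "-c", "sum(range(3_000_000))"],
    }
    for label, seconds in rank_builds(demo):
        print(f"{label}: {seconds:.3f}s")
```

Taking the best of several runs reduces noise from other activity on the
machine. It does not replace a test under realistic load, which is where
the segfaults mentioned above were seen.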

===

