postgres_vs_mysql

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: Re[2]: [OT] PostgreSQL gets commercial support/funding
From: Brian Ashe <brian@dee-web.com>
Date: Thu, 11 May 2000 02:02:38 -0400


Hi Matt,

Here's my synopsis...

MySQL - Not free for commercial use
        Not a full SQL implementation
        Speedy (though I don't know how large the gap is with PgSQL 7.0;
        it has been closing with each new PgSQL release.)
        Best used for simple read/write to tables (data storage)

PgSQL - FREE!
        Very close to full SQL compatibility (especially with 7.0)
        Very nice additional PgSQL special features
        Some stuff poorly documented
        Best used for more complex uses (data manipulation) or where SQL
        compatibility is important or because it's FREE!

Me personally, I like PostgreSQL. I have never run into this performance
issue with it, though I have seen benchmarks show MySQL to be faster. I think it
is just a matter of choice. Most people ignore the fact that if MySQL is
used commercially, the authors expect to be paid. PostgreSQL is free under
the Berkeley license.

Best advice: Look at each set of docs to compare features for what YOU will
need. Run each (they don't conflict) and run some sample queries to see the
performance difference. Then judge for yourself. You will probably be happy
either way you go. Millions of people use each, so neither can be that bad.

So would anyone like to start an emacs vs vi war next? ;)

===

Subject: Re: [OT] PostgreSQL gets commercial support/funding 
From: Joe Brenner <doom@kzsu.stanford.edu>
Date: Mon, 15 May 2000 13:13:06 -0700




> > What is the group's opinion regarding MySQL vs PostgreSQL?
> > 
> > MySQL - Not free for commercial use
> >         Not a full SQL implementation
> >         Speedy (though I don't know how large the gap is with PgSQL 7.0;
> >         it has been closing with each new PgSQL release.)
> >         Best used for simple read/write to tables (data storage)
> > 
> > PgSQL - FREE!
> >         Very close to full SQL compatibility (especially with 7.0)
> >         Very nice additional PgSQL special features
> >         Some stuff poorly documented
> >         Best used for more complex uses (data manipulation) or where SQL
> >         compatibility is important or because it's FREE!

This seems like a reasonably fair comparison.  I'm a
postgresql user in part for political reasons (the MySQL
license is really squirrelly, and this kind of nonsense
just shouldn't be encouraged).  I think a lot of younger
geeks are somewhat confused about this because Slashdot uses
MySQL, and they assume that (a) it must be technically slick
and (b) it must be politically correct (i.e. open/free).

My take on this is that MySQL's usefulness is limited to
something like Slashdot, which is a high hit-rate site 
that has to push a lot of data around, and yet paradoxically
no one really cares about the data all that much (if you
drop a few comments out of a thread, no one is going to sue
you). 

(This is presuming that MySQL really *is* fast... I haven't
benched it myself, and everyone I've talked to seems to be
relying on data posted on the MySQL site.)

I like the fact that postgresql is more of a "real" database
than MySQL... even if I don't have any plans to use
something like transactions at the moment, I like the fact
that I can incorporate them later without switching
databases (and that if I ever decided to upgrade to the
realest database -- which appears to still be Oracle -- the
feature set won't be a lot different than what I'm used
to). 


> So would anyone like to start an emacs vs vi war next? ;)

Sure, I'm always up for a good emacs/vi war.  

I think it's funny that for years vi users went around
saying they didn't need any fancy emacs features like
auto-saves and automatically generated backup files, but now
that they've got them in vim, all of a sudden they're
wonderful to have.  

Think about it: why wait for another vim upgrade when
you could have the emacs features today? 

===

Subject: RE: [OT] PostgreSQL gets commercial support/funding
From: Alan Mead <adm@ipat.com>
Date: Thu, 11 May 2000 14:53:48 -0500


At 12:54 PM 5/9/00, Bjornson, Matt wrote:
>What is the group's opinion regarding MySQL vs PostgreSQL?

I think this has been discussed here and you may find more detail in the
archives or on deja news.  PostgreSQL is more ANSI-compliant and more fully
featured than MySQL, at the price of being slower and less widely used.
"Less featured" can be deceptive, though; IIRC MySQL has more varieties of
data types.  MySQL is perhaps a better solution for very quickly showing
static data or storing information which is not precious enough to require
COMMIT/ROLLBACK.  The MySQL documents on-line describe and contrast MySQL
pretty well, although since the docs are large, you may have to scroll
through a lot to find the comparison.

Ideologically, although the MySQL license is liberal and I think you can
get some source (I think they GPL older versions), PostgreSQL is more
"open"; some would say the only open solution of the two.

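For readers who haven't used COMMIT/ROLLBACK, here is a minimal sketch of
the kind of atomicity being referred to above (table and values are made up
for illustration):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    -- if anything goes wrong, ROLLBACK undoes both updates;
    -- otherwise COMMIT makes them visible as a single unit:
    COMMIT;
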
===

Subject: [HACKERS] The New Slashdot Setup (includes MySql server)
From: Alessio Bragadini <alessio@albourne.com>
Date: Thu, 18 May 2000 19:33:43 +0300

Info on the new slashdot.org setup

<http://slashdot.org/article.pl?sid=00/05/18/1427203&mode=nocomment>

interesting because of the plans (meaning $$$) they have to improve
MySql, and because they are the flagship MySql site/application. 

In the comment page, replying to the usual "Why not PostgreSql?" thread
someone pointed out an extract from the MySql docs that seems to me
blatantly false
(http://slashdot.org/comments.pl?sid=00/05/18/1427203&cid=131).

===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Thu, 18 May 2000 13:12:38 -0400 (EDT)


> Info on the new slashdot.org setup
> 
> <http://slashdot.org/article.pl?sid=00/05/18/1427203&mode=nocomment>
> 
> interesting because of the plans (meaning $$$) they have to improve
> MySql, and because they are the flagship MySql site/application. 
> 
> In the comment page, replying to the usual "Why not PostgreSql?" thread
> someone pointed out an extract from the MySql docs that seems to me
> blatantly false
> (http://slashdot.org/comments.pl?sid=00/05/18/1427203&cid=131).

Just finished reading the thread.  I am surprised how many people
slammed them on their MySQL over PostgreSQL decision.  People are
slamming MySQL all over the place.  :-)

Seems like inertia was the reason to stay with MySQL.  What that means
to me is that for their application space, PostgreSQL already has
superior technology, and people realize it.  This means we are on our
way up, and MySQL is, well, ....


===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: Benjamin Adida <ben@mit.edu>
Date: Thu, 18 May 2000 13:42:13 -0400

on 5/18/00 1:12 PM, Bruce Momjian at pgman@candle.pha.pa.us wrote:

> Seems like inertia was the reason to stay with MySQL.  What that means
> to me is that for their application space, PostgreSQL already has
> superior technology, and people realize it.  This means we are on our
> way up, and MySQL is, well, ....

There is this growing desire among some OpenACS people to replicate the
Slashdot functionality in an OpenACS module (probably a weekend's worth of
work). I wish I had a bit more free time to do it. It's time to show what
can be done with a real RDBMS (and a real web application environment, but
that's a different story).

-Ben

===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: The Hermit Hacker <scrappy@hub.org>
Date: Thu, 18 May 2000 15:08:56 -0300 (ADT)

On Thu, 18 May 2000, Bruce Momjian wrote:

> > Info on the new slashdot.org setup
> > 
> > <http://slashdot.org/article.pl?sid=00/05/18/1427203&mode=nocomment>
> > 
> > interesting because of the plans (meaning $$$) they have to improve
> > MySql, and because they are the flagship MySql site/application. 
> > 
> > In the comment page, replying to the usual "Why not PostgreSql?" thread
> > someone pointed out an extract from the MySql docs that seems to me
> > blatantly false
> > (http://slashdot.org/comments.pl?sid=00/05/18/1427203&cid=131).
> 
> Just finished reading the thread.  I am surprised how many people
> slammed them on their MySQL over PostgreSQL decision.  People are
> slamming MySQL all over the place.  :-)
> 
> Seems like inertia was the reason to stay with MySQL.  What that means
> to me is that for their application space, PostgreSQL already has
> superior technology, and people realize it.  This means we are on our
> way up, and MySQL is, well, ....

In SlashDot's defence here ... I doubt there is much they do that would
require half of what we offer ... it's *very* little INSERT/UPDATE/DELETE
and *a lot* of SELECT ...


===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: Alfred Perlstein <bright@wintelcom.net>
Date: Thu, 18 May 2000 12:16:19 -0700

The Hermit Hacker <scrappy@hub.org> [000518 11:51] wrote:
> On Thu, 18 May 2000, Bruce Momjian wrote:
> 
> > > Info on the new slashdot.org setup
> > > 
> > > <http://slashdot.org/article.pl?sid=00/05/18/1427203&mode=nocomment>
> > > 
> > > interesting because of the plans (meaning $$$) they have to improve
> > > MySql, and because they are the flagship MySql site/application. 
> > > 
> > > In the comment page, replying to the usual "Why not PostgreSql?" thread
> > > someone pointed out an extract from the MySql docs that seems to me
> > > blatantly false
> > > (http://slashdot.org/comments.pl?sid=00/05/18/1427203&cid=131).
> > 
> > Just finished reading the thread.  I am surprised how many people
> > slammed them on their MySQL over PostgreSQL decision.  People are
> > slamming MySQL all over the place.  :-)
> > 
> > Seems like inertia was the reason to stay with MySQL.  What that means
> > to me is that for their application space, PostgreSQL already has
> > superior technology, and people realize it.  This means we are on our
> > way up, and MySQL is, well, ....
> 
> In SlashDot's defence here ... I doubt there is much they do that would
> require half of what we offer ... it's *very* little INSERT/UPDATE/DELETE
> and *a lot* of SELECT ...

If those guys still are doing multiple selects for each page view after
at least 2 years of being around and choking on the load, they seriously
need to get a clue.  mod_perl... belch!

===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: The Hermit Hacker <scrappy@hub.org>
Date: Thu, 18 May 2000 15:54:50 -0300 (ADT)

okay, that is a good point ... I know what a difference in performance
the two-vs-one-select issue can produce ...

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Hannu Krosing <hannu@tm.ee>
Date: Fri, 19 May 2000 11:00:29 +0300

The Hermit Hacker wrote:
> 
> thanks for the pointer ... I just posted my response ... specifically
> pointing out how "accurate" the MySQL docs tend to be *rofl*

And now there is a response to your response stating the following

> 
> The MySQL people have said exactly the same sort of things about
> the PostgreSQL people. So please stop the name-calling and
> the quotes around "test", it's not going to get you anywhere. 
> 
> That being said, the standard MySQL benchmark _still_ is 30 times
> faster for MySQL 3.23 than on PostgreSQL 7.0 (with fsync turned off,
> _and_ nonstandard speed-up PostgreSQL features like VACUUM enabled,

btw, how does one "enable" vacuum ?

> I might add). The main reason seems to be some sort of failure to 
> use the index in the SELECT and UPDATE test loops on the part of 
> PostgreSQL. 
> 
> The benchmark, for the curious, works like this: 
> 
> First it creates a table with an index: 
> 
> create table bench1 (id int NOT NULL, id2 int NOT NULL, id3 int NOT NULL, dummy1 char(30));
> create unique index bench1_index_ on bench1 using btree (id,id2);
> create index bench1_index_1 on bench1 using btree (id3);
> 
> Then it fills the table with 300,000 entries with unique id values.
> 
> Then, it issues a query like this: 
> 
>         update bench1 set dummy1='updated' where id=1747 
> 
> which causes the backend to do one thousand read() calls. For each query. 

could it be that just updating 1 unique index causes 1k read()'s ?

> No wonder it's slow. An EXPLAIN query states that it's using the 
> index, though. I have no clue what happens here. I've sent this
> to the pgsql-general mailing list and have just reposted it to -hackers. 

I somehow missed it (on -hackers at least) so I repost it here

> Oh yes, the benchmark also revealed that CREATE TABLE in PostgreSQL 7.0 
> leaks about 2k of memory.
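
One way to look into the index-usage question raised above is to recreate
the benchmark table from the quoted description and ask the planner
directly; a sketch (EXPLAIN output will differ between PostgreSQL versions):

    create table bench1 (id int NOT NULL, id2 int NOT NULL,
                         id3 int NOT NULL, dummy1 char(30));
    create unique index bench1_index_ on bench1 using btree (id, id2);
    create index bench1_index_1 on bench1 using btree (id3);
    -- after loading the 300,000 rows, see what plan the update gets:
    EXPLAIN update bench1 set dummy1='updated' where id=1747;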

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 11:14:24 +0200


Hi,

Hannu Krosing:
> And now there is a response to your response stating the following
> 
That response is from me, actually. (I subscribed to -hackers two hours
ago, so I'm sorry if I missed anything.)

> > That being said, the standard MySQL benchmark _still_ is 30 times
> > faster for MySQL 3.23 than on PostgreSQL 7.0 (with fsync turned off,
> > _and_ nonstandard speed-up PostgreSQL features like VACUUM enabled,
> 
> btw, how does one "enable" vacuum ?

run-all-tests ... --fast.

The code has stuff like

	  $server->vacuum(1,\$dbh) if $opt_fast and defined $server->{vacuum};

sprinkled at strategic places.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Chris <chris@bitmead.com>
Date: Sat, 20 May 2000 05:54:20 +1000

> > > That being said, the standard MySQL benchmark _still_ is 30 times
> > > faster for MySQL 3.23 than on PostgreSQL 7.0 (with fsync turned off,
> > > _and_ nonstandard speed-up PostgreSQL features like VACUUM enabled,

VACUUM is not a speed-up feature, it's a slow-down feature. It reclaims
space and that takes time. It does update system statistics which can
help performance if done after a data load or perhaps once a day.

But "sprinkling the code" with vacuum sounds like a big performance
killer. Hope you are not counting vacuum as part of your 1000 read()
calls.
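
A minimal example of the usage Chris describes, run against the benchmark
table from earlier in the thread (VACUUM ANALYZE both reclaims space and
refreshes the statistics the planner uses):

    -- typically done once after a bulk load, or periodically:
    VACUUM ANALYZE bench1;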

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 12:18:03 +0200

Hi,

Chris:
> VACUUM is not a speed-up feature, it's a slow-down feature. It reclaims
> space and that takes time. It does update system statistics which can
> help performance if done after a data load or perhaps once a day.
> 
OK, thanks for the clarification.

> But "sprinkling the code" with vacuum sounds like a big performance
> killer. Hope you are not counting vacuum as part of your 1000 read()
> calls.
> 
Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
call. They actually scan the pg_index table of the benchmark database. 

Why they do that is another question entirely. (a) these tables should
have indices, and (b) whatever postgres wants to know should have been
cached someplace. Oh yes, (c) what's in pg_index that needs to be 4
MBytes big?

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Chris <chris@bitmead.com>
Date: Sat, 20 May 2000 06:34:06 +1000

Matthias Urlichs wrote:

> Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
> call. They actually scan the pg_index table of the benchmark database.

Does this only happen on the first call to INSERT/UPDATE after
connecting to the database, or does it happen with all subsequent calls
too?

===

Subject: RE: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Fri, 19 May 2000 19:51:38 +0900

-----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Matthias Urlichs
> 
> Why they do that is another question entirely. (a) these tables should
> have indices, and (b) whatever postgres wants to know should have been
> cached someplace. Oh yes, (c) what's in pg_index that needs to be 4
> MBytes big?
>

What does 'vacuum pg_index' show ?


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 12:39:40 +0200


Hi,

Chris:
> Matthias Urlichs wrote:
> 
> > Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
> > call. They actually scan the pg_index table of the benchmark database.
> 
> Does this only happen on the first call to INSERT/UPDATE after
> connecting to the database, or does it happen with all subsequent calls
> too?
> 
All of them. Whatever the server is looking up here, it's _not_ cached.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Hiroshi Inoue <Inoue@tpf.co.jp>
Date: Fri, 19 May 2000 20:12:00 +0900

Matthias Urlichs wrote:

> Hi,
>
> Chris:
> > Matthias Urlichs wrote:
> >
> > > Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
> > > call. They actually scan the pg_index table of the benchmark database.
> >
> > Does this only happen on the first call to INSERT/UPDATE after
> > connecting to the database, or does it happen with all subsequent calls
> > too?
> >
> All of them. Whatever the server is looking up here, it's _not_ cached.
>

Maybe the shared buffer isn't so large as to keep all the (4.1M) pg_index pages.
So it would read pages from disk every time.
Unfortunately pg_index has no index to scan the index entries of a relation now.

However why is pg_index so large ?

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 13:40:08 +0200

Hi,

Hiroshi Inoue:
> What does 'vacuum pg_index' show ?
> 
test=> vacuum pg_index;
NOTICE:  Skipping "pg_index" --- only table owner can VACUUM it
VACUUM

OK, so I suppose I should do it as the postgres user...
test=> vacuum pg_index;
VACUUM

The debug output says:
DEBUG:  --Relation pg_index--
DEBUG:  Pages 448: Changed 0, reaped 448, Empty 0, New 0; Tup 34: Vac 21443, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 164, MaxLen 164;
Re-using: Free/Avail. Space 3574948/3567176; EndEmpty/Avail. Pages 0/447. CPU 0.46s/0.00u sec.
DEBUG:  Index pg_index_indexrelid_index: Pages 86; Tuples 34: Deleted 21443. CPU 0.05s/0.36u sec.
DEBUG:  Rel pg_index: Pages: 448 --> 1; Tuple(s) moved: 2. CPU 0.03s/0.03u sec.
DEBUG:  Index pg_index_indexrelid_index: Pages 86; Tuples 34: Deleted 2. CPU 0.01s/0.00u sec.

... which helped. A lot.

Thanks, everybody. The first quick benchmark run I did afterwards states
that PostgreSQL is now only half as fast as MySQL, instead of the factor
of 30 seen previously, on the MySQL benchmark test.  ;-)

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 14:04:24 +0200


Hi,

Hiroshi Inoue:
> 
> Maybe the shared buffer isn't so large as to keep all the (4.1M) pg_index pages.

That seems to be the case.

> So it would read pages from disk every time. Unfortunately pg_index
> has no index to scan the index entries of a relation now.
> 
Well, it's reasonable that you can't keep an index on the table which
states what the indices are. ;-)

... on the other hand, Apple's HFS file system stores all the information
about the on-disk locations of its files in, you guessed it, a B-Tree
which is saved on disk as an (invisible) file.
Thus, the thing stores the information about where its own sectors are
located inside itself.
To escape this catch-22 situation, the location of the first three
extents (which is usually all it takes anyway) is stored elsewhere.

Possibly, something like this would work with postgres too.

> However why is pg_index so large ?
> 
Creating ten thousand tables will do that to you.

Is there an option I can set to increase the appropriate cache, so that
the backend can keep the data in memory?

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes
From: The Hermit Hacker <scrappy@hub.org>
Date: Fri, 19 May 2000 09:42:30 -0300 (ADT)

On Fri, 19 May 2000, Matthias Urlichs wrote:

> Hi,
> 
> Hiroshi Inoue:
> > What does 'vacuum pg_index' show ?
> > 
> test=> vacuum pg_index;
> NOTICE:  Skipping "pg_index" --- only table owner can VACUUM it
> VACUUM
> 
> OK, so I suppose I should do it as the postgres user...
> test=> vacuum pg_index;
> VACUUM
> 
> The debug output says:
> DEBUG:  --Relation pg_index--
> DEBUG:  Pages 448: Changed 0, reaped 448, Empty 0, New 0; Tup 34: Vac 21443, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 164, MaxLen 164;
> Re-using: Free/Avail. Space 3574948/3567176; EndEmpty/Avail. Pages 0/447. CPU 0.46s/0.00u sec.
> DEBUG:  Index pg_index_indexrelid_index: Pages 86; Tuples 34: Deleted 21443. CPU 0.05s/0.36u sec.
> DEBUG:  Rel pg_index: Pages: 448 --> 1; Tuple(s) moved: 2. CPU 0.03s/0.03u sec.
> DEBUG:  Index pg_index_indexrelid_index: Pages 86; Tuples 34: Deleted 2. CPU 0.01s/0.00u sec.
> 
> ... which helped. A lot.
> 
> Thanks, everybody. The first quick benchmark run I did afterwards states
> that PostgreSQL is now only half as fast as MySQL, instead of the factor
> of 30 seen previously, on the MySQL benchmark test.  ;-)

Wow, shock of shocks ... MySQL has more inaccuracies in their docs? *grin*


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Mitch Vincent" <mitch@huntsvilleal.com>
Date: Fri, 19 May 2000 09:47:39 -0400

> ... which helped. A lot.
>
> Thanks, everybody. The first quick benchmark run I did afterwards states
> that PostgreSQL is now only half as fast as MySQL, instead of the factor
> of 30 seen previously, on the MySQL benchmark test.  ;-)

while (horse == DEAD) {

beat();

}

... Anyway... I can see this being true (MySQL being twice as fast as
PostgreSQL); however, I don't think that MySQL being faster than PostgreSQL
was ever up for debate. When you take an RDBMS and strip out a huge number of
features, of course you're going to get a faster end product. It's just not
nearly as safe, feature-rich or easy to work with (from a programmer's
standpoint).

I looked at MySQL to use for my applications, for all of ten seconds... To
code in and around, MySQL just isn't a usable RDBMS for me and I can hardly
see how it's useful for anyone doing the kind of programming I do...

What it is very good for is something like RADIUS/POP3 authentication; I
use it at my ISP to keep all my user authentication in one place... However,
the only thing I cared about was speed there, and there are all of two
things I ever do to that database: I SELECT (once every auth request) and
occasionally I INSERT and possibly UPDATE. That, coupled with the fact that
there are only two to three things in the database per user (username,
password and domain for POP3 auth) -- it's just not a very complicated thing
to do... I use a SQL backend because it's very easy to maintain and I can
easily write software to manipulate the data held in the tables -- that's
all.
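
A hypothetical sketch of the kind of single-lookup authentication table
described above (table and column names are invented for illustration):

    CREATE TABLE pop_users (
        username text NOT NULL,
        password text NOT NULL,
        domain   text NOT NULL
    );
    -- one SELECT per authentication request:
    SELECT password
      FROM pop_users
     WHERE username = 'someuser' AND domain = 'example.com';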

With the other applications I and my company write, it's a totally different
story. I just don't see how a person can write any kind of a larger
application and not need all the features MySQL lacks...

I like MySQL for certain things -- however I've never considered  "MySQL vs
PostgreSQL" -- they're just two totally different databases for totally
different uses IMHO.

-Mitch

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Fri, 19 May 2000 16:00:14 +0200

Hi,

The Hermit Hacker:
> > Thanks, everybody. The first quick benchmark run I did afterwards states
> > that PostgreSQL is now only half as fast as MySQL, instead of the factor
> > of 30 seen previously, on the MySQL benchmark test.  ;-)
> 
> Wow, shock of shocks ... MySQL has more inaccuracies in their docs? *grin*

No, that factor of 30 was my result after running the benchmark for the
first time. Presumably, unless I skip the large_number_of_tables test,
it'll be just as slow the second time around.

The MySQL people probably didn't dig deeper into PostgreSQL's innards.
They don't seem to think it's their job to find out exactly why their
benchmark runs so slow on some other databases, and I don't particularly
fault them for that attitude.


The PostgreSQL community has an attitude too, after all.

One of these might be to answer "you must have had fsync turned on"
whenever somebody reports a way-too-slow benchmark.  In this case,
that's definitely not true.


Another attitude of the PostgreSQL developers might be to answer "run
VACUUM" whenever somebody reports performance problems. That answer is
not helpful at all WRT this benchmark, because the user who caused the
problem ("test", in my case) isn't permitted to run VACUUM on the
pg_index table.

The alternate solution would be for the backend to notice "Gee, I just
scanned a whole heap of what turned out to be empty space in this here
pg_index file, maybe it would be a good idea to call vacuum() on it."

Or, if that doesn't work, increase the buffer for holding its content.


Anyway, I fully expect to have a more reasonable benchmark result by
tomorrow, and the MySQL guys will get a documentation update. Which they
_will_ put in the next update's documentation file. Trust me.  ;-)

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Chris <chris@bitmead.com>
Date: Sat, 20 May 2000 10:19:00 +1000

Matthias Urlichs wrote:

> Another attitude of the PostgreSQL developers might be to answer "run
> VACUUM" whenever somebody reports performance problems. That answer is
> not helpful at all WRT this benchmark, because the user who caused the
> problem ("test", in my case) isn't permitted to run VACUUM on the
> pg_index table.

Speaking of which, why can't any user who can change meta-data also
vacuum meta-data? It's not a threat to security, is it?

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup
From: "Michael A. Olson" <mao@sleepycat.com>
Date: Fri, 19 May 2000 07:18:31 -0700

At 02:04 PM 5/19/00 +0200, you wrote:

> Well, it's reasonable that you can't keep an index on the table which
> states what the indices are. ;-)
> 
> ... on the other hand, Apple's HFS file system stores all the information
> about the on-disk locations of its files in, you guessed it, a B-Tree
> which is saved on disk as an (invisible) file.
> Thus, the thing stores the information about where its own sectors are
> located inside itself.
> To escape this catch-22 situation, the location of the first three
> extents (which is usually all it takes anyway) is stored elsewhere.
> 
> Possibly, something like this would work with postgres too.

This is one of several things we did at Illustra to make the backend
run faster.  I did the design and implementation, but it was a few
years ago, so the details are hazy.  Here's what I remember.

We had to solve three problems:

First, you had to be able to run initdb and bootstrap the system
without the index on pg_index in place.  As I recall, we had to
carefully order the creation of the first several tables to make
that work, but it wasn't rocket science.

Second, when the index on pg_index gets created, you need to update
it with tuples that describe it.  This is really just the same as
hard-coding the pg_attribute attribute entries into pg_attribute --
ugly, but not that bad.

Third, we had to abstract a lot of the hard-coded table scans in
the bowels of the system to call a routine that checked for the
existence of an index on the system table, and used it.  In order
for the index on pg_index to get used, its reldesc had to be nailed
in the cache.  Getting it there at startup was more hard-coded
ugliness, but you only had to do it one time.

The advantage is that you can then index a bunch more of the system
catalog tables, and on a bunch more attributes.  That produced some
surprising speedups.

This was simple enough that I'm certain the same technique would
work in the current engine.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 10:24:02 -0400

"Matthias Urlichs" <smurf@noris.net> writes:
>>>> Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
>>>> call. They actually scan the pg_index table of the benchmark database.

Ohh ... pg_index is the culprit!  OK, I know exactly where that's coming
from: the planner is looking around to see what indexes might be
interesting for planning the query.  Several comments here:

1. Probably we ought to try to bypass most of the planning process for
a simple INSERT ... VALUES.  (I thought I had fixed that, but apparently
it's not getting short-circuited soon enough, if the search for indexes
is still happening.)

2. The search is not using either an index or a cache IIRC.  Needs to
be fixed but there may be no suitable index present in 7.0.

3. I have been toying with the notion of having relcache entries store
information about the indexes associated with the table, so that the
planner wouldn't have to search through pg_index at all.  The trouble
with doing that is getting the info updated when an index is added or
dropped; haven't quite figured out how to do that...

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup  (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 11:25:49 -0400 (EDT)


> The advantage is that you can then index a bunch more of the system
> catalog tables, and on a bunch more attributes.  That produced some
> surprising speedups.

We have indexes on all system tables that need it.  The pg_index index
was done quite easily and is new for 7.0.  A check for recursion and
fallback to sequential scan for pg_index table rows in the pg_index
table allows it to happen.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 12:39:17 -0400

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> The advantage is that you can then index a bunch more of the system
>> catalog tables, and on a bunch more attributes.  That produced some
>> surprising speedups.

> We have indexes on all system tables that need it.

There isn't any fundamental reason why the planner can't be using an
index to scan pg_index; we just need to code it that way.  Right now
it's coded as a sequential scan.

Unfortunately there is no index on pg_index's indrelid column in 7.0,
so this is not fixable without an initdb.  TODO item for 7.1, I guess.

More generally, someone should examine the other places where
heap_getnext() loops occur, and see if any of them look like performance
bottlenecks...
===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 12:54:10 -0400 (EDT)


> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> The advantage is that you can then index a bunch more of the system
> >> catalog tables, and on a bunch more attributes.  That produced some
> >> surprising speedups.
> 
> > We have indexes on all system tables that need it.
> 
> There isn't any fundamental reason why the planner can't be using an
> index to scan pg_index; we just need to code it that way.  Right now
> it's coded as a sequential scan.
> 
> Unfortunately there is no index on pg_index's indrelid column in 7.0,
> so this is not fixable without an initdb.  TODO item for 7.1, I guess.

The reason there is no index is because it is not a unique column, and
at the time I was adding system indexes for 7.0, I was looking for
indexes that could be used for system cache lookups.  The index you are
describing returns multiple tuples, so it would be an actual index call
in the code.  I will add this to the TODO.


> 
> More generally, someone should examine the other places where
> heap_getnext() loops occur, and see if any of them look like performance
> bottlenecks...

Good idea.  Added to TODO.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup
From: "Michael A. Olson" <mao@sleepycat.com>
Date: Fri, 19 May 2000 09:54:11 -0700

At 12:39 PM 5/19/00 -0400, Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > We have indexes on all system tables that need it.
> 
> There isn't any fundamental reason why the planner can't be using an
> index to scan pg_index; we just need to code it that way.  Right now
> it's coded as a sequential scan.

Eliminating the hard-coded seqscans of catalogs in the bowels of the
system was the hardest part of the project.  As I said, it was good
to do.  It made parsing and planning queries much, much faster.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup  (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 13:14:30 -0400 (EDT)

> At 12:39 PM 5/19/00 -0400, Tom Lane wrote:
> 
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > 
> > > We have indexes on all system tables that need it.
> > 
> > There isn't any fundamental reason why the planner can't be using an
> > index to scan pg_index; we just need to code it that way.  Right now
> > it's coded as a sequential scan.
> 
> Eliminating the hard-coded seqscans of catalogs in the bowels of the
> system was the hardest part of the project.  As I said, it was good
> to do.  It made parsing and planning queries much, much faster.

All the sequential catalog scans that return one row are gone.  What has
not been done is adding indexes for scans returning more than one row.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 13:36:29 -0400

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> All the sequential catalog scans that return one row are gone.  What has
> not been done is adding indexes for scans returning more than one row.

I've occasionally wondered whether we can't find a way to use the
catcaches for searches that can return multiple rows.  It'd be easy
enough to add an API for catcache that could return multiple rows given
a nonunique search key.  The problem is how to keep the catcache up to
date with underlying reality for this kind of query.  Deletions of rows
will be handled by the existing catcache invalidation mechanism, but
how can we know when some other backend has added a row that will match
a search condition?  Haven't seen an answer short of scanning the table
every time, which makes the catcache no win at all.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 13:56:58 -0400 (EDT)


> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > All the sequential catalog scans that return one row are gone.  What has
> > not been done is adding indexes for scans returning more than one row.
> 
> I've occasionally wondered whether we can't find a way to use the
> catcaches for searches that can return multiple rows.  It'd be easy
> enough to add an API for catcache that could return multiple rows given
> a nonunique search key.  The problem is how to keep the catcache up to
> date with underlying reality for this kind of query.  Deletions of rows
> will be handled by the existing catcache invalidation mechanism, but
> how can we know when some other backend has added a row that will match
> a search condition?  Haven't seen an answer short of scanning the table
> every time, which makes the catcache no win at all.

Good point.  You can invalidate stuff, but how to find new stuff that
doesn't have a specific key?


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 15:35:04 -0400

"Michael A. Olson" <mao@sleepycat.com> writes:
> Third, we had to abstract a lot of the hard-coded table scans in
> the bowels of the system to call a routine that checked for the
> existence of an index on the system table, and used it.

The way that we've been approaching this is by switching from hard-coded
sequential scans (heap_getnext() calls) to hard-coded indexscans
(index_getnext() calls) at places where performance dictates it.

An advantage of doing it that way is that you don't have the
bootstrapping/circularity problems that Mike describes; the code doesn't
need to consult pg_index to know whether there is an index to use, it
just has the necessary info hard-coded in.  For the same reason it's
very quick.

Nonetheless it's also a pretty ugly answer.  I'd rather the code wasn't
so tightly tied to a particular set of indexes for system tables.

I was thinking about doing something like what Mike describes: replace
uses of heap_beginscan() with calls to a routine that would examine the
passed ScanKey(s) to see if there is a relevant index, and then start
either a heap or index scan as appropriate.  The circularity issue could
be resolved by having that routine have hard-coded knowledge of some of
the system-table indexes (or even all of them, which is still better
than having that knowledge scattered throughout the code).  But the
performance cost of identifying the right index based on ScanKeys gives
me pause.  It's hard to justify that per-search overhead when the
hard-coded approach works well enough.

Thoughts anyone?

			regards, tom lane
===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 15:52:44 -0400 (EDT)

> "Michael A. Olson" <mao@sleepycat.com> writes:
> > Third, we had to abstract a lot of the hard-coded table scans in
> > the bowels of the system to call a routine that checked for the
> > existence of an index on the system table, and used it.
> 
> The way that we've been approaching this is by switching from hard-coded
> sequential scans (heap_getnext() calls) to hard-coded indexscans
> (index_getnext() calls) at places where performance dictates it.
> 
> An advantage of doing it that way is that you don't have the
> bootstrapping/circularity problems that Mike describes; the code doesn't
> need to consult pg_index to know whether there is an index to use, it
> just has the necessary info hard-coded in.  For the same reason it's
> very quick.

I like hard-coded.  There aren't many of them, last time I looked. 
Maybe 5-10 that need index scan.  The rest are already done using the
catalog cache.

===

Subject: RE: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Sat, 20 May 2000 07:01:15 +0900

-----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> The advantage is that you can then index a bunch more of the system
> >> catalog tables, and on a bunch more attributes.  That produced some
> >> surprising speedups.
> 
> > We have indexes on all system tables that need it.
> 
> There isn't any fundamental reason why the planner can't be using an
> index to scan pg_index; we just need to code it that way.  Right now
> it's coded as a sequential scan.
> 
> Unfortunately there is no index on pg_index's indrelid column in 7.0,
> so this is not fixable without an initdb.  TODO item for 7.1, I guess.
>

I've noticed the fact since before but haven't complained.
As far as I see, pg_index won't be so big. In fact Matthias's case has
only 1 page after running vacuum for pg_index.  In such cases
sequential scan is faster than index scan as you know.
I don't agree with you to increase system indexes easily.
Though I implemented REINDEX command to recover system
indexes it doesn't mean index corruption is welcome.

I know another case. pg_attrdef has no index on (adrelid,attnum)
though it has an index on (adrelid).

> More generally, someone should examine the other places where
> heap_getnext() loops occur, and see if any of them look like performance
> bottlenecks...

Please don't lose sequential scan stuff even when changes to
index scan is needed because -P option of standalone postgres
needs sequential scan for system tables.
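
For the curious, here is a sketch of how to check which indexes a catalog
table such as pg_attrdef actually has, using nothing but pg_class and
pg_index themselves:

    SELECT c.relname AS index_name
      FROM pg_class t, pg_index i, pg_class c
     WHERE t.relname = 'pg_attrdef'
       AND i.indrelid = t.oid
       AND c.oid = i.indexrelid;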

===


Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 18:41:32 -0400

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Unfortunately there is no index on pg_index's indrelid column in 7.0,
>> so this is not fixable without an initdb.  TODO item for 7.1, I guess.

> I've noticed the fact since before but haven't complained.
> As far as I see, pg_index won't be so big. In fact Matthias's case has
> only 1 page after running vacuum for pg_index.  In such cases
> sequential scan is faster than index scan as you know.

True, but the differential isn't very big either when dealing with
a small table.  I think I'd rather use an index and be assured that
performance doesn't degrade drastically when the database contains
many indexes.

I've also been thinking about ways to implement the relcache-based
caching of index information that I mentioned before.  That doesn't
address the scanning problem in general but it should improve
performance for this part of the planner quite a bit.  The trick is to
ensure that other backends update their cached info whenever an index
is added or deleted.  I thought of one way to do that: force an update
of the owning relation's pg_class tuple during CREATE or DROP INDEX,
even when we don't have any actual change to make in its contents ---
that'd force a relcache invalidate cycle at other backends.  (Maybe
we don't even need to change the pg_class tuple, but just send out a
shared-cache-invalidate message as if we had.)

> I know another case. pg_attrdef has no index on (adrelid,attnum)
> though it has an index on (adrelid).

Doesn't look to me like we need an index on (adrelid,attnum), at
least not in any paths that are common enough to justify maintaining
another index.  The (adrelid) index supports loading attrdef data
into the relcache, which is the only path I'm particularly concerned
about performance of...

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 18:23:26 -0400

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> More generally, someone should examine the other places where
>> heap_getnext() loops occur, and see if any of them look like performance
>> bottlenecks...

> Please don't lose sequential scan stuff even when changes to
> index scan is needed because -P option of standalone postgres
> needs sequential scan for system tables.

Good point.  I'd still like not to clutter the code with deciding
which kind of scan to invoke, though.  Maybe we could put the
begin_xxx routine in charge of ignoring a request for an indexscan
when -P is used.  (AFAIR there's no real difference for the calling
code, it sets up scankeys and so forth just the same either way, no?
We should just need a switching layer in front of heap_beginscan/
index_beginscan and heap_getnext/index_getnext...)

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 18:44:27 -0400

I wrote:
> We should just need a switching layer in front of heap_beginscan/
> index_beginscan and heap_getnext/index_getnext...)

After refreshing my memory of how these are used, it seems that
we'd have to change the API of either the heap or index scan routines
in order to unify them like that.  Might be worth doing to maintain
code cleanliness, though.  The places Hiroshi has fixed to support
both index and seq scan look really ugly to my eyes ...

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 19:21:47 -0400 (EDT)


> I've noticed the fact since before but haven't complained.
> As far as I see, pg_index won't be so big. In fact Matthias's case has
> only 1 page after running vacuum for pg_index.  In such cases
> sequential scan is faster than index scan as you know.
> I don't agree with you to increase system indexes easily.
> Though I implemented REINDEX command to recover system
> indexes it doesn't mean index corruption is welcome.
> 
> I know another case. pg_attrdef has no index on (adrelid,attnum)
> though it has an index on (adrelid).
> 
> > More generally, someone should examine the other places where
> > heap_getnext() loops occur, and see if any of them look like performance
> > bottlenecks...
> 
> Please don't lose sequential scan stuff even when changes to
> index scan is needed because -P option of standalone postgres
> needs sequential scan for system tables.

Certainly whatever we do will be discussed.  I realize initdb is an
issue.  However, I am not sure sequential scan is faster than index scan
for finding only a few rows in the table.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 19:24:13 -0400 (EDT)


> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Unfortunately there is no index on pg_index's indrelid column in 7.0,
> >> so this is not fixable without an initdb.  TODO item for 7.1, I guess.
> 
> > I've noticed the fact since before but haven't complained.
> > As far as I see, pg_index won't be so big. In fact Matthias's case has
> > only 1 page after running vacuum for pg_index.  In such cases
> > sequential scan is faster than index scan as you know.
> 
> True, but the differential isn't very big either when dealing with
> a small table.  I think I'd rather use an index and be assured that
> performance doesn't degrade drastically when the database contains
> many indexes.

Agreed.

> 
> I've also been thinking about ways to implement the relcache-based
> caching of index information that I mentioned before.  That doesn't
> address the scanning problem in general but it should improve
> performance for this part of the planner quite a bit.  The trick is to
> ensure that other backends update their cached info whenever an index
> is added or deleted.  I thought of one way to do that: force an update
> of the owning relation's pg_class tuple during CREATE or DROP INDEX,
> even when we don't have any actual change to make in its contents ---
> that'd force a relcache invalidate cycle at other backends.  (Maybe
> we don't even need to change the pg_class tuple, but just send out a
> shared-cache-invalidate message as if we had.)

Oh, good idea.  Just invalidate the relation so we reload.

BTW, Hiroshi is the one who gave me the recursion-prevention fix for the
system index additions for 7.0.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 19 May 2000 20:32:18 -0400

I wrote:
>>>>> Nonono, the 1000 read() calls are triggered by a simple INSERT or UPDATE
>>>>> call. They actually scan the pg_index table of the benchmark database.
>
> Ohh ... pg_index is the culprit!  OK, I know exactly where that's coming
> from: the planner is looking around to see what indexes might be
> interesting for planning the query.  Several comments here:
>
> 1. Probably we ought to try to bypass most of the planning process for
> a simple INSERT ... VALUES.  (I thought I had fixed that, but apparently
> it's not getting short-circuited soon enough, if the search for indexes
> is still happening.)


It never pays to assume you know what's happening without having looked
:-(.  It turns out the planner is not the only culprit: the executor's
ExecOpenIndices() routine *also* does a sequential scan of pg_index.
I did shortcircuit the planner's search in the INSERT ... VALUES case,
but of course the executor still must find out whether the table has
indexes.

In UPDATE, DELETE, or INSERT ... SELECT, pg_index is scanned *twice*,
once in the planner and once in the executor.  (In fact it's worse
than that: the planner scans pg_index separately for each table named
in the query.  At least the executor only does it once since it only
has to worry about one target relation.)

Definitely need to cache the indexing information...

			regards, tom lane
===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 19 May 2000 21:46:43 -0400 (EDT)


> I wrote:
> > We should just need a switching layer in front of heap_beginscan/
> > index_beginscan and heap_getnext/index_getnext...)
> 
> After refreshing my memory of how these are used, it seems that
> we'd have to change the API of either the heap or index scan routines
> in order to unify them like that.  Might be worth doing to maintain
> code cleanliness, though.  The places Hiroshi has fixed to support
> both index and seq scan look really ugly to my eyes ...

Agreed, and I think there are a few places that have them.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Thomas Lockhart <lockhart@alumni.caltech.edu>
Date: Sat, 20 May 2000 01:53:28 +0000

> The MySQL people probably didn't dig deeper into PostgreSQL's innards.
> They don't seem to think it's their job to find out exactly why their
> benchmark runs so slow on some other databases, and I don't particularly
> fault them for that attitude.

Hmm. And then whose job is it to take someone else's work and make it
accurate? If the shoe were on the other foot: if I generated a
benchmark suite and features list, and it contained major and numerous
inaccuracies, who would you expect to be responsible (or at least feel
responsible) for correcting/updating/improving it? 'Twould be me imho.

We've tried, and failed (to date) to contribute information to the
"crashme" travesty. My recollection was a ~30% error rate on
information for Postgres, and I didn't look into the stats for other
databases. Check the archives for details.

> The PostgreSQL community has an attitude too, after all.

Yup ;)

> One of these might be to answer "you must have had fsync turned on"
> whenever somebody reports a way-too-slow benchmark.  In this case,
> that's definitely not true.

I'm sorry that has been your experience. imho, that initial response
might be considered "helpful advice", not "attitude". And I'll submit
that most postings I've seen (I'm mostly on the -hackers list) are
followed up to the bitter end if the poster can state the problem
succinctly and can follow up with specific information. But I'm a
developer, so don't have the right outlook.

> Anyway, I fully expect to have a more reasonable benchmark result by
> tomorrow, and the MySQL guys will get a documentation update. Which they
> _will_ put in the next update's documentation file. Trust me.  ;-)

Fantastic! We've been down this road before, and have had little luck
in getting more than a token update of inaccuracies. Any little bit
helps.

And while you're at it, can you update their docs and web site to make
clear that transactions and atomicity are not anywhere near the
feature list of MySQL yet? TIA

                     - Thomas

===

Subject: RE: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Sat, 20 May 2000 14:25:23 +0900

-----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> I wrote:
> > We should just need a switching layer in front of heap_beginscan/
> > index_beginscan and heap_getnext/index_getnext...)
> 
> After refreshing my memory of how these are used, it seems that
> we'd have to change the API of either the heap or index scan routines
> in order to unify them like that.  Might be worth doing to maintain
> code cleanliness, though.  The places Hiroshi has fixed to support
> both index and seq scan look really ugly to my eyes ...
>

Yes, it's ugly, unfortunately.  So I had hesitated to commit it for a
pretty long time. There's a trial of unification in my trial implementation
of ALTER TABLE DROP COLUMN in command.c.
 
===

Subject: RE: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Sat, 20 May 2000 14:25:26 +0900

-----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Unfortunately there is no index on pg_index's indrelid column in 7.0,
> >> so this is not fixable without an initdb.  TODO item for 7.1, I guess.
> 
> I've also been thinking about ways to implement the relcache-based
> caching of index information that I mentioned before.  That doesn't
> address the scanning problem in general but it should improve
> performance for this part of the planner quite a bit.

Sounds reasonable to me.

> The trick is to
> ensure that other backends update their cached info whenever an index
> is added or deleted.  I thought of one way to do that: force an update
> of the owning relation's pg_class tuple during CREATE or DROP INDEX,
> even when we don't have any actual change to make in its contents ---
> that'd force a relcache invalidate cycle at other backends.  (Maybe
> we don't even need to change the pg_class tuple, but just send out a
> shared-cache-invalidate message as if we had.)
>

Seems CREATE INDEX already sends a shared_cache_invalidate
message. DROP INDEX doesn't?? I'm not sure.
 
> > I know another case. pg_attrdef has no index on (adrelid,attnum)
> > though it has an index on (adrelid).
> 
> Doesn't look to me like we need an index on (adrelid,attnum), at
> least not in any paths that are common enough to justify maintaining
> another index.  The (adrelid) index supports loading attrdef data
> into the relcache, which is the only path I'm particularly concerned
> about performance of...
>

It seems to me that an index on (adrelid,adnum) should
exist instead of the current index. It identifies pg_attrdef.
I say *Oops* about it in my trial implementation of ALTER
TABLE DROP COLUMN.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 20 May 2000 01:30:08 -0400

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>>>> I know another case. pg_attrdef has no index on (adrelid,attnum)
>>>> though it has an index on (adrelid).
>> 
>> Doesn't look to me like we need an index on (adrelid,attnum), at
>> least not in any paths that are common enough to justify maintaining
>> another index.  The (adrelid) index supports loading attrdef data
>> into the relcache, which is the only path I'm particularly concerned
>> about performance of...

> It seems to me that an index on (adrelid,adnum) should
> exist instead of the current index. It identifies pg_attrdef.
> I say *Oops* about it in my trial implementation of ALTER
> TABLE DROP COLUMN.

Right, I saw that.  But it seems to be the only place where such an
index would be useful.  The relcache-loading routines, which seem to
be the only performance-critical access to pg_attrdef, prefer an index
on adrelid only.  Is it worth maintaining a 2-column index (which is
bulkier and slower than a 1-column one) just to speed up ALTER TABLE
DROP COLUMN?

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 10:09:00 +0200

Hi,

Tom Lane:
> It never pays to assume you know what's happening without having looked
> :-(.  It turns out the planner is not the only culprit: the executor's
> ExecOpenIndices() routine *also* does a sequential scan of pg_index.

That meshes with my observation that updates seem to do twice as many
read() calls on pg_index as inserts do.

For this test, anyway.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql 
From: Mike Mascari <mascarm@mascari.com>
Date: Sat, 20 May 2000 07:02:42 -0400

Matthias Urlichs wrote:
> Attached is the current crashme output. "crash_me_safe" is off only
> because of the fact that some tests go beyond available memory.
> There's no sense in testing how far you can push a 
> "SELECT a from b where c = 'xxx(several megabytes worth of Xes)'"
> query when the size of a TEXT field is limited to 32k.
> 
> Limits with '+' in front of the number say that this is the max value
> tested, without implying whether higher values are OK or not.
> 
> If you have any remarks, especially about the '=no' results (i.e. you
> think PostgreSQL can do that, therefore the crashme test must be wrong
> somehow), tell me. Otherwise I'll forward the results to the MySQL
> people next week.

How about:

1. alter_rename_table = no

The syntax in PostgreSQL is ALTER TABLE x RENAME TO y;

2. atomic_updates = no

Huh? Besides being paranoid about fsync()'ing transactions how is
a transaction based MVCC not atomic with respect to updates?

3. automatic_rowid = no

The description simply says Automatic rowid. Does this apply to
query result sets or to the underlying relation? If the latter,
PostgreSQL has, of course, an OID for every tuple in the
database.

4. binary_items = no

Read up on large objects...

5. connections = 32

This should, of course, be +32, since PostgreSQL can easily
handle hundreds of simultaneous connections.

6. create_table_select = no

Again. PostgreSQL supports CREATE TABLE AS SELECT (i.e. Oracle),
and SELECT INTO syntax.

7. except = no

PostgreSQL has had both INTERSECT and EXCEPT since 6.5.0 (albeit
they're slow).

I'm starting to get very tired of this. I don't see why
PostgreSQL users are obligated to get MySQL tests correct. And
I'm only 15% through the list...

Bottom line...either the test writers are ignorant or deceptive.
Either way I won't trust my data with them...

===

Subject: RE: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Sun, 21 May 2000 01:28:31 +0900

-----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >>>> I know another case. pg_attrdef has no index on (adrelid,attnum)
> >>>> though it has an index on (adrelid).
> >> 
> >> Doesn't look to me like we need an index on (adrelid,attnum), at
> >> least not in any paths that are common enough to justify maintaining
> >> another index.  The (adrelid) index supports loading attrdef data
> >> into the relcache, which is the only path I'm particularly concerned
> >> about performance of...
> 
> > It seems to me that an index on (adrelid,adnum) should
> > exist instead of the current index. It identifies pg_attrdef.
> > I say *Oops* about it in my trial implementation of ALTER
> > TABLE DROP COLUMN.
> 
> Right, I saw that.  But it seems to be the only place where such an
> index would be useful.  The relcache-loading routines, which seem to
> be the only performance-critical access to pg_attrdef, prefer an index
> on adrelid only.  Is it worth maintaining a 2-column index (which is
> bulkier and slower than a 1-column one) just to speed up ALTER TABLE
> DROP COLUMN?
>

I don't mind so much about the performance in this case.
The difference would be small.

Isn't it a fundamental principle to define a primary (unique
identification) constraint for each table?
I had never thought that the only index of pg_attrdef
isn't a unique identification index until I came across the
unexpected result of my DROP COLUMN test case.
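
Concretely, the unique identification index I have in mind would be
something like this (the index name here is only illustrative):

	CREATE UNIQUE INDEX pg_attrdef_adrelid_adnum_index
	    ON pg_attrdef (adrelid, adnum);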

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 20:17:10 +0200

Hi,

Mike Mascari:
> 
> 1. alter_rename_table = no
> 
> The syntax in PostgreSQL is ALTER TABLE x RENAME TO y;
> 
They say "alter table crash_q rename crash_q1".

What does the official standard say (assuming any exists) -- is the "to"
optional or not?
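
For reference, the two spellings at issue (using crash-me's table names):

	alter table crash_q rename crash_q1;       -- what crash-me sends
	ALTER TABLE crash_q RENAME TO crash_q1;    -- what PostgreSQL accepts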

> 2. atomic_updates = no
> 
> Huh? Besides being paranoid about fsync()'ing transactions how is
> a transaction based MVCC not atomic with respect to updates?
> 
That's a misnomer. They actually mean this:

	create table crash_q (a integer not null);
	create unique index crf on crash_q(a);

	insert into crash_q values (2);
	insert into crash_q values (3);
	insert into crash_q values (1);
	update crash_q set a=a+1;

> 3. automatic_rowid = no
> 
> The description simply says Automatic rowid. Does this apply to
> query result sets or to the underlying relation? If the latter,
> PostgreSQL has, of course, an OID for every tuple in the
> database.
> 
I'll have them fix that. MySQL calls them "_rowid" and apparently tests
only for these.

> 4. binary_items = no
> 
> Read up on large objects...
> 
... with an ... erm ... let's call it "nonstandard" ... interface.

> 5. connections = 32
> 
> This, should, of course be +32, since PostgreSQL can easily
> handle hundreds of simultaneous connections.
> 
The testing code (Perl) looks like this, and it bombs after the 32nd
connection.

  for ($i=1; $i < $max_connections ; $i++)
  {
    if (!($dbh=DBI->connect($server->{'data_source'},$opt_user,$opt_password,
              { PrintError => 0})))
    {
      print "Last connect error: $DBI::errstr\n" if ($opt_debug);
      last;
    }
    $dbh->{LongReadLen}= $longreadlen; # Set retrieval buffer
    print "." if ($opt_debug);
    push(@connect,$dbh);
  }
  print "$i\n";

I do not know where that limit comes from.
It might be the DBI interface to PostgreSQL, or a runtime limit.

Anyway, $max_connections is set to 1000.

> 6. create_table_select = no
> 
> Again. PostgreSQL supports CREATE TABLE AS SELECT (i.e. Oracle),
> and SELECT INTO syntax.

Test code:
	create table crash_q SELECT * from crash_me;

Again, is the "AS" optional or not?

> 7. except = no
> 
> PostgreSQL has had both INTERSECT and EXCEPT since 6.5.0 (albeit
> they're slow).
> 
Looking at the test, we see it doing this:

	create table crash_me (a integer not null,b char(10) not null);
	insert into crash_me (a,b) values (1,'a');
	create table crash_me2 (a integer not null,b char(10) not null, c integer);
	insert into crash_me2 (a,b,c) values (1,'b',1);
	select * from crash_me except select * from crash_me2;

For what it's worth, there is at least one database which doesn't
have this restriction (i.e., that the number of columns must be
identical) (namely SOLID).

So this test needs to be split into two. I'll do that.
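
The EXCEPT-only half would presumably use compatible select lists,
something like:

	select a,b from crash_me except select a,b from crash_me2;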

> I'm starting to get very tired of this. I don't see why
> PostgreSQL users are obligated to get MySQL tests correct. And
> I'm only 15% through the list...
> 
_Somebody_ has to get these things right. I'm not suggesting that it's
any obligation of yours specifically, but somebody's gotta do it, and
(IMHO) it can only be done by somebody who already knows _something_
about the database to be tested.

> Bottom line...either the test writers are ignorant or deceptive.

Or the tests are just badly written. Or they're too old and suffer from
severe bit rot.


For what it's worth, I do NOT think the people who wrote these tests
are either ignorant or deceptive. Most, if not all, of these tests
are OK when checked against at least one SQLish database.

===

Subject: Re: [HACKERS] More Performance
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 20:54:20 +0200

Hi,

I've found another one of these performance problems in the benchmark,
related to another ignored index.

The whole thing works perfectly after a VACUUM ANALYZE on the
table.

IMHO this is somewhat non-optimal. In the absence of information
to the contrary, PostgreSQL should default to using an index if
it might be appropriate, not ignore it.

I am thus throwing away yet another benchmark run -- the query now runs
300 times faster. *Sigh* 

test=# vacuum bench1;
VACUUM
test=# \d bench1
         Table "bench1"
 Attribute |   Type   | Modifier 
-----------+----------+----------
 id        | integer  | not null
 id2       | integer  | not null
 id3       | integer  | not null
 dummy1    | char(30) | 
Indices: bench1_index_,
         bench1_index_1

test=# \d bench1_index_


Index "bench1_index_"
 Attribute |  Type   
-----------+---------
 id        | integer
 id2       | integer
unique btree

test=# 
test=# 
test=# \d bench1_index_1
Index "bench1_index_1"
 Attribute |  Type   
-----------+---------
 id3       | integer
btree

test=# explain update bench1 set dummy1='updated' where id=150;
NOTICE:  QUERY PLAN:

Seq Scan on bench1  (cost=0.00..6843.00 rows=3000 width=18)

EXPLAIN
test=# vacuum bench1;
VACUUM
test=# explain update bench1 set dummy1='updated' where id=150;
NOTICE:  QUERY PLAN:

Seq Scan on bench1  (cost=0.00..6843.00 rows=3000 width=18)

EXPLAIN
test=# select count(*) from bench1;
 count  
--------
 300000
(1 row)

test=# select count(*) from bench1 where id = 150;
 count 
-------
     1
(1 row)

test=# explain select count(*) from bench1 where id = 150;
NOTICE:  QUERY PLAN:

Aggregate  (cost=6850.50..6850.50 rows=1 width=4)
  ->  Seq Scan on bench1  (cost=0.00..6843.00 rows=3000 width=4)

EXPLAIN


***************************************************************

Related to this:

test=# explain select id from bench1 order by id;
NOTICE:  QUERY PLAN:

Sort  (cost=38259.21..38259.21 rows=300000 width=4)
  ->  Seq Scan on bench1  (cost=0.00..6093.00 rows=300000 width=4)

EXPLAIN

The basic idea to speed this one up (a lot...) would be to walk the index.

This is _after_ ANALYZE, of course.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 14:56:36 -0400 (EDT)


I know I am going to regret believing that I will actually make any
difference, but I am going to shoot myself anyway.

I am writing this more for the new PostgreSQL members who were not
around last time than in any belief it will make a difference on the
MySQL end.



> Hi,
> 
> Mike Mascari:
> > 
> > 1. alter_rename_table = no
> > 
> > The syntax in PostgreSQL is ALTER TABLE x RENAME TO y;
> > 
> They say "alter table crash_q rename crash_q1".
> 
> What does the official standard say (assuming any exists) -- is the "to"
> optional or not?

I don't see any RENAME in the SQL92 spec.  Now, how hard is it to do a
'man alter_table' and see what it says at the top of the screen?

> 
> > 2. atomic_updates = no
> > 
> > Huh? Besides being paranoid about fsync()'ing transactions how is
> > a transaction based MVCC not atomic with respect to updates?
> > 
> That's a misnomer. They actually mean this:
> 
> 	create table crash_q (a integer not null);
> 	create unique index crf on crash_q(a);
> 
> 	insert into crash_q values (2);
> 	insert into crash_q values (3);
> 	insert into crash_q values (1);
> 	update crash_q set a=a+1;

Poorly named, huh?  How do you think it got such a name?  This item was
on the crashme tests before TRANSACTION was on there.  Can you explain
how such an exotic issue got on there year(s) before transactions?
Transactions got on there only because I complained.

> 
> > 3. automatic_rowid = no
> > 
> > The description simply says Automatic rowid. Does this apply to
> > query result sets or to the underlying relation? If the latter,
> > PostgreSQL has, of course, an OID for every tuple in the
> > database.
> > 
> I'll have them fix that. MySQL calls them "_rowid" and apparently tests
> only for these.

Well, I don't see _rowid in the SQL spec either, so we are both
non-standard here, though I believe our OID is SQL3.


> 
> > 4. binary_items = no
> > 
> > Read up on large objects...
> > 
> ... with an ... erm ... let's call it "nonstandard" ... interface.

Yes.

> 
> > 5. connections = 32
> > 
> > This, should, of course be +32, since PostgreSQL can easily
> > handle hundreds of simultaneous connections.
> > 
> The testing code (Perl) looks like this, and it bombs after the 32nd
> connection.
> 
>   for ($i=1; $i < $max_connections ; $i++)
>   {
>     if (!($dbh=DBI->connect($server->{'data_source'},$opt_user,$opt_password,
>               { PrintError => 0})))
>     {
>       print "Last connect error: $DBI::errstr\n" if ($opt_debug);
>       last;
>     }
>     $dbh->{LongReadLen}= $longreadlen; # Set retrieval buffer
>     print "." if ($opt_debug);
>     push(@connect,$dbh);
>   }
>   print "$i\n";
> 
> I do not know where that limit comes from.
> It might be the DBI interface to PostgreSQL, or a runtime limit.
> 
> Anyway, $max_connections is set to 1000.

You have to recompile the backend to increase it.  Not on the client
end.  See FAQ.

> 
> > 6. create_table_select = no
> > 
> > Again. PostgreSQL supports CREATE TABLE AS SELECT (i.e. Oracle),
> > and SELECT INTO syntax.
> 
> Test code:
> 	create table crash_q SELECT * from crash_me;
> 
> Again, is the "AS" optional or not?

man create_table.  That is all it takes.  There is no standard for
this.  It is from Oracle.  Is their AS optional?  Does it really matter?

> 
> > 7. except = no
> > 
> > PostgreSQL has had both INTERSECT and EXCEPT since 6.5.0 (albeit
> > they're slow).
> > 
> Looking at the test, we see it doing this:
> 
> 	create table crash_me (a integer not null,b char(10) not null);
> 	insert into crash_me (a,b) values (1,'a');
> 	create table crash_me2 (a integer not null,b char(10) not null, c integer);
> 	insert into crash_me2 (a,b,c) values (1,'b',1);
> 	select * from crash_me except select * from crash_me2;
> 
> For what it's worth, there is at least one database which doesn't
> have this restriction (i.e., that the number of columns must be
> identical) (namely SOLID).
> 
> So this test needs to be split into two. I'll do that.

So you test EXCEPT by having a different number of columns.  I can see
it now, "Hey we don't have EXCEPT.  PostgreSQL does it, but they can't
handle a different number of columns.  Let's do only that test so we
look equal."

> 
> > I'm starting to get very tired of this. I don't see why
> > PostgreSQL users are obligated to get MySQL tests correct. And
> > I'm only 15% through the list...
> > 
> _Somebody_ has to get these things right. I'm not suggesting that it's
> any obligation of yours specifically, but somebody's gotta do it, and
> (IMHO) it can only be done by somebody who already knows _something_
> about the database to be tested.
> 
> > Bottom line...either the test writers are ignorant or deceptive.
> 
> Or the tests are just badly written. Or they're too old and suffer from
> severe bit rot.
> 
> 
> For what it's worth, I do NOT think the people who wrote these tests
> are either ignorant or deceptive. Most, if not all, of these tests
> are OK when checked against at least one SQLish database.

In looking at each of these items, it is impossible for me to believe
that the tests were not written by either very ignorant people ("I can't
run 'man'") or very deceptive people ("Let's make ourselves look good.").

If you view this from outside the MySQL crowd, can you see how we would
feel this way?  This is just a small example of the volumes of reasons
we have in believing this.

If you are going to publish things about other databases on your web
site, you had better do a reasonable job to see that it is accurate and
fair.  If it can't be done, take it off.  Don't leave it up and have it
be wrong, and ignore people in the past who tell you it is wrong.

It never has been fair, and I suspect never will be, because this is
hashed around every year with little change or acknowledgement.

So, yea, we have an attitude.  We are usually nice folks, so if the
majority of us have a bad attitude, there must be some basis for that
feeling, and I can tell you, the majority of us do have a bad attitude
on the topic.

I know the MySQL folks don't have a bad attitude about us, and you know,
they don't because we never did anything like this Crashme to them.  But
actually, we are tired of being pushed by an ignorant/deceptive crashme
test, and we are starting to push back.   But, you can be sure we will
never stoop to the level of the crashme test.


===

Subject: Re: [HACKERS] More Performance
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 15:17:56 -0400 (EDT)


> Hi,
> 
> I've found another one of these performance problems in the benchmark,
> related to another ignored index.
> 
> The whole thing works perfectly after a VACUUM ANALYZE on the
> table.
> 
> IMHO this is somewhat non-optimal. In the absence of information
> to the contrary, PostgreSQL should default to using an index if
> it might be appropriate, not ignore it.

This is an interesting idea.  So you are saying that if a column has no
vacuum analyze statistics, assume it is unique?  Or are you talking
about a table that has never been vacuumed?  Then we assume it is a
large table.  Interesting.  It would help some queries, but hurt others.
We have gone around and around on what the default stats should be.
Tom Lane can comment on this better than I can.

> 
> Related to this:
> 
> test=# explain select id from bench1 order by id;
> NOTICE:  QUERY PLAN:
> 
> Sort  (cost=38259.21..38259.21 rows=300000 width=4)
>   ->  Seq Scan on bench1  (cost=0.00..6093.00 rows=300000 width=4)
> 
> EXPLAIN
> 
> The basic idea to speed this one up (a lot...) would be to walk the index.
> 
> This is _after_ ANALYZE, of course.

But you are grabbing the whole table.  Our indexes are separate files. 
The heap is unordered, meaning a sequential scan and order by is usually
faster than an index walk unless there is a restrictive WHERE clause.

Thanks for the tip about needing an index on pg_index.  That will be in
7.1.  I remember previous crashme rounds did bring up some good info for
us, like the fact older releases couldn't handle trailing comments from
perl.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 22:26:40 +0200


Hi,

[ Sorry if this reply is much too long. I know that...]

Bruce Momjian:
> I know I am going to regret believing that I will actually make any
> difference, but I am going to shoot myself anyway.
> 
I sincerely hope/believe you're wrong.

> > What does the official standard say (assuming any exists) -- is the "to"
> > optional or not?
> 
> I don't see any RENAME in the SQL92 spec.  Now, how hard is it to do a
> 'man alter_table' and see what it says at the top of the screen?
> 
It's not a question of your manpage vs. their manpage. I can read your
manpage just fine. It's a question of whether there is something that
can be regarded as a standard on it or not. "Official" is a poor wording
in this case -- sorry.

If yes, then the test will be changed to do it the standard way.
If no, then I might have to test for both syntaxes, which is a PITA.


While I'm at it, I note that the last sentence of that manpage says

	The clauses to rename columns and tables are Postgres extensions
	from SQL92.

Correct me if I'm wrong, but is that really _your_ extension, or
did some other database vendor (who?) come up with it?

> > Anyway, $max_connections has the value to 1000.
> 
> You have to recompile the backend to increase it.  Not on the client
> end.  See FAQ.

I was compiling and running the backend with default options(*). That
means that the tests will show the default limits. It does this for all
the other databases in the crash-me test result suite. (Oracle:40,
mysql:100, solid:248, empress:10, interbase:10)

Anyway, the max value for PostgreSQL, without recompiling the backend,
is 1024 according to the FAQ; but there's no way an automated test can
find out _that_.

I'll add a comment ("installation default") to that test column.

(*) except for fsync, of course, in the interest of fairness.

> man create_table.  That is all it takes.  There is not standard for
> this.  It is from Oracle.  Is their AS optional?  Does it really matter?
> 
No.

What matters is that your opinion is that they are responsible for making
the test 100% accurate. Their reply to that is that many database
vendors actually provided fixes for this test instead of bitching
about how inaccurate it is, thus they feel the obligation is on your
side.

Now I am on neither side.  I am thus, IMHO, in a position to ask you
about your opinion of these inaccuracies.  I am going to change 
the crashme test to be a whole lot more accurate WRT PostgreSQL,
I will feed these changes back to the MySQL people, and they'll
incorporate these changes into their next release. (Their head honcho
(Monty) has said so on their mailing list. I _am_ going to take him up
on it, and I can be quite obnoxious if somebody reneges on a promise.
*EVIL*GRIN* )

If your opinion is that you have a right to be annoyed about all of this
because you went through the whole thing last year, and the year before
that, and ..., ... well, I can understand your point of view.

But I honestly think that the problem is not one of either malice or
stupidity. "Different sets of priorities" and "different project
structure" are equally-valid assumptions. At least for me. Until I'm
proven wrong (IF I am).

> So you test EXCEPT by having a different number of columns.  I can see
> it now, "Hey we don't have EXCEPT.  PostgreSQL does it, but they can't
> handle a different number of columns.  Let's do only that test so we
> look equal."
> 
They might as well have written that test while checking their crash-me
script against SOLID and noting a few features MySQL doesn't have yet.
Or they may have gotten it from them in the first place.


I might add that their test lists 52 features of PostgreSQL which
MySQL doesn't have (13 functions). It also lists 122 features of MySQL
which PostgreSQL doesn't have; 78 of those are extra functions (40 of
these, just for M$-ODBC compatibility).

So it seems that overall, that crash-me test result is reasonably
balanced (39 vs. 44 non-function differences -- let's face it, adding
another function for compatibility with SQL variant FOO is one of the
easier exercises here, whatever the current value of FOO is).

The result is going to be even more balanced when I'm through with it,
but I cannot do that on my own, as I do not have enough experience with
either PostgreSQL or the various SQL standards. Thus, I'm asking.

Is that really a problem?

> If you view this from outside the MySQL crowd, can you see how we would
> feel this way?  This is just a small example of the volumes of reasons
> we have in believing this.
> 
I would like not to view this from any outside, inside, or whatever
viewpoint. My goal is to get at least some part of the petty arguments
out of the way because, in MY book at least, the _real_ "battle", such
as there is, isn't PostgreSQL against MySQL! It's more-or-less-open-source
databases on one side and closed-source products, some of which are the
equivalent of MS Word in the database world (you know who I'm talking
about ;-) on the other side.

> It never has been fair, and I suspect never will be, because this is
> hashed around every year with little change or acknowledgement.
> 
It is about as fair as a certain comparison chart on your site has been.
It's gone now, thus as far as I'm concerned it's water under the bridge.
Besides, I'm not interested. Some of the members of this list seem
to be pretty much burned out on the whole issue -- I can live with that;
but I'm trying to do something about the problem. Don't shoot the
messenger. ;-)


===

Subject: MySQL's "crashme" (was Re: [HACKERS] Performance)
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 20 May 2000 16:26:03 -0400


"Matthias Urlichs" <smurf@noris.net> writes:
>> Hmm. And then who's job is it to take someone else's work and make it
>> accurate? If the shoe were on the other foot: if I generated a
>> benchmark suite and features list, and it contained major and numerous
>> inaccuracies, who would you expect to be responsible (or at least feel
>> responsible) for correcting/updating/improving it? 'Twould be me imho.
>> 
> Umm, there's still a difference between saying (a) "it's broken, fix
> it", (b) "here's my analysis as to what exactly is broken, can you fix
> it", and (c) "here's a patch that fixes it".

Good luck.  Close analysis of the crashme test leaves an extremely bad
taste in the mouth: there are just too many cases where it's clearly
designed as a pro-MySQL advertising tool and not an honest attempt to
describe reality.  Shall we consider details?

> Attached is the current crashme output. "crash_me_safe" is off only
> because of the fact that some tests go beyond available memory.
> There's no sense in testing how far you can push a "SELECT a from b where
> c = 'xxx(several megabytes worth of Xes)'" query when the size of a TEXT
> field is limited to 32k.

I would not like to see us labeled "crashme unsafe" merely because
someone is too impatient to let the test run to conclusion.  But there's
a more interesting problem here: using stock crashme and Postgres 7.0,
on my system it's crashme that crashes and not Postgres!  The crashme
Perl script is a huge memory hog and runs into the kernel's process-size
limit long before the connected backend does.  To get it to run to
completion, I have to reduce the thing's limit on the longest query it
will try:

*** crash-me~	Sat May 20 12:28:11 2000
 --- crash-me	Sat May 20 13:21:11 2000
***************
*** 104,110 ****
  #
  
  $max_connections="+1000";       # Number of simultaneous connections
! $max_buffer_size="+16000000";   # size of communication buffer.
  $max_string_size="+8000000";    # Enough for this test
  $max_name_length="+512";        # Actually 256, but ...
  $max_keys="+64";                # Probably too big.
 --- 104,110 ----
  #
  
  $max_connections="+1000";       # Number of simultaneous connections
! $max_buffer_size="+1000000";    # size of communication buffer.
  $max_string_size="+8000000";    # Enough for this test
  $max_name_length="+512";        # Actually 256, but ...
  $max_keys="+64";                # Probably too big.

A few months ago I was able to use max_buffer_size = +2000000, but
crashme 1.43 seems to be an even worse memory hog than its predecessors.
At this setting, the Perl process tops out at about 114Mb while the
connected backend grows to 66Mb.  (I run with a process limit of 128Mb.)
To be fair, this could be Perl's fault more than crashme's.  I'm using
Perl 5.005_03 ... anyone know if more recent versions use less memory?


Now, on to some specific complaints:

> alter_drop_col=no			# Alter table drop column

While our ALTER TABLE support is certainly pretty weak, it should be
noted that this test will continue to fail even when we have ALTER TABLE
DROP COLUMN, because crashme is testing for a non-SQL-compliant syntax.

> alter_rename_table=no			# Alter table rename table

We have ALTER TABLE RENAME ... but not under the syntax crashme is
testing.  Since SQL92 doesn't specify a syntax for RENAME, there's no
absolute authority for this --- but a quick check of the comparative
crashme results at http://www.mysql.com/crash-me-choose.htmy shows that
*none* of the major commercial DBMSs "pass" this test.  Rather curious
that crashme uses a MySQL-only syntax for this test, no?

> atomic_updates=no			# atomic updates

What's actually being tested here is whether the DBMS will let you do
"update crash_q set a=a+1" in a table with a unique index on "a" and
consecutive pre-existing values.  In other words, is the uniqueness
constraint checked on a per-tuple-update basis, or deferred to end of
transaction?  It's fair to blame Postgres for not supporting a deferred
uniqueness check, but this test is extremely misleadingly labeled.
A person who hadn't examined the guts of crashme would probably think
it tests whether concurrent transactions see each others' results
atomically.
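
For what it's worth, the SQL92 route to the deferred behavior would be a
deferrable constraint; Postgres 7.0 can't defer unique checks, so this is
only a sketch, and the constraint name is made up:

	CREATE TABLE crash_q (a integer NOT NULL,
	    CONSTRAINT crash_q_a_unique UNIQUE (a) DEFERRABLE INITIALLY IMMEDIATE);
	INSERT INTO crash_q VALUES (2);
	INSERT INTO crash_q VALUES (3);
	INSERT INTO crash_q VALUES (1);
	BEGIN;
	SET CONSTRAINTS crash_q_a_unique DEFERRED;
	UPDATE crash_q SET a = a + 1;  -- uniqueness checked at COMMIT, not per tuple
	COMMIT;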

> automatic_rowid=no			# Automatic rowid

Test is actually looking for a system column named "_rowid".  Our OIDs
serve the same purpose, and I believe there are equivalent features in
many other DBMSes.  Again, MySQL is the only "passer" of this test,
which says more about their level of standardization than other
people's.

> binary_items=no				# binary items (0x41)

We have binary literals (per the test name) and hex literals (what
it actually appears to be testing).  Unfortunately for us, ours are
SQL92-compliant syntax, and what crashme is looking for isn't.
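
For reference, roughly the distinction (I believe these are the SQL92
spellings):

	select 0x41;          -- MySQL-style hex literal the test looks for (not SQL92)
	SELECT X'41';         -- SQL92 hexadecimal string literal
	SELECT B'01000001';   -- SQL92 binary string literal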

> comment_#=no				# # as comment
> comment_--=yes			# -- as comment
> comment_/**/=yes			# /* */ as comment
> comment_//=no				# // as comment

It'd be helpful to the reader if they indicated which two of these
conventions are SQL-compliant ... of course, that might expose the
fact that MySQL isn't ...

> connections=32				# Simultaneous connections

Should probably be noted that this is just the default limit (chosen to
avoid creating problems on small systems) and can easily be raised at
postmaster start time.

> crash_me_safe=no			# crash me safe

I get "yes", and I'd *really* appreciate it if you not submit this
misleading statement.

> create_table_select=no			# create table from select

This is looking for "create table crash_q SELECT * from crash_me",
which again appears to be a MySQL-only syntax.  We have the same feature
but we want "AS" in front of the "SELECT".  Dunno how other DBMSs do it.

> date_zero=no				# Supports 0000-00-00 dates

Note this is not checking to see if the date format yyyy-mm-dd is
accepted, it's checking to see if the specific value '0000-00-00'
is accepted.  Haven't these people heard of NULL?  Another test that
only MySQL "passes".
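
Schematically, with a hypothetical date column d:

	insert into crash_q (d) values ('0000-00-00');  -- the specific value being tested
	insert into crash_q (d) values (NULL);          -- the sensible way to say "no date"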

> except=no				# except

This test is checking:
create table crash_me (a integer not null,b char(10) not null);
create table crash_me2 (a integer not null,b char(10) not null, c integer);
select * from crash_me except select * from crash_me2;
Postgres rejects it with
ERROR:  Each UNION | EXCEPT | INTERSECT query must have the same number of columns.
Unsurprisingly, hardly anyone else accepts it either.

> except_all=no				# except all

While we do not have "except all", when we do this test will still fail
for the same reason as above.

> func_extra_not=no			# Function NOT in SELECT

What they are looking for here is "SELECT NOT 0", which Postgres rejects
as a type violation.  SQL-compliant "NOT FALSE" would work.
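
That is:

	select not 0;        -- what crashme sends; rejected by Postgres as a type violation
	SELECT NOT FALSE;    -- the SQL-compliant spelling, which works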

BTW, while I haven't got the patience to go through the function list in
detail, quite a few functions that we actually have are shown as "not
there" because of type resolution issues.  For example they test exp()
with "select exp(1)" which fails because of ambiguity about whether
exp(float8) or exp(numeric) is wanted.  This will get cleaned up soon,
but it's not really a big problem in practice...

> having_with_alias=no			# Having on alias

Again, how curious that MySQL is the only DBMS shown as passing this
test.  Couldn't be because it violates SQL92, could it?

> insert_select=no			# insert INTO ... SELECT ...

We would pass this test if the crashme script weren't buggy: it fails
to clean up after a prior test that creates a crash_q table with
different column names.  The prior test is testing "drop table if
exists", which means the only way to be shown as having this
SQL-standard feature is to implement the not-standard "if exists".

> intersect=no				# intersect
> intersect_all=no			# intersect all

See above comments for EXCEPT.

> logical_value=1			# Value of logical operation (1=1)

A rather odd result, considering that what Postgres actually returns for
"SELECT (1=1)" is 't'.  But showing the correct answer isn't one of
crashme's highest priorities...

> minus_neg=no				# Calculate 1--1

Another case where "passing" the test means accepting MySQL's version of
reality instead of SQL92's.  All the SQL-compliant DBMSs think -- is a
comment introducer, so "select a--1 from crash_me" produces an error ...
but not in MySQL ...
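
Spelled out:

	select a--1 from crash_me;    -- "--" starts a comment, so this is just "select a"
	select a - -1 from crash_me;  -- written with spaces, it parses as intended everywhere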

> quote_ident_with_"=no			# " as identifier quote (ANSI SQL)
> quote_ident_with_[=no			# [] as identifier quote
> quote_ident_with_`=no			# ` as identifier quote

Here at least they admit which variant is ANSI ;-).  Postgres doesn't
pass because we think 'select "A" from crash_me' should look for a
column named upper-case-A, but the column is actually named
lower-case-a.  We are not conforming to the letter of the SQL standard
here --- SQL says an unquoted name should be mapped to all upper case,
not all lower case as we do it, which is how the column got to be named
that way.  We're closer than MySQL though...

> select_string_size=+16208		# constant string size in SELECT

I got 1048567 here, roughly corresponding to where I set max_buffer_size.
Not sure why you get a smaller answer.

> select_table_update=no			# Update with sub select

We certainly have update with sub select.  What they're looking for is
the non-SQL-compliant syntax
	update crash_q set crash_q.b=
		(select b from crash_me where crash_q.a = crash_me.a);
It works in Postgres if you remove the illegal table specification:
	update crash_q set b=
		(select b from crash_me where crash_q.a = crash_me.a);

> type_sql_bit=yes			# Type bit
> type_sql_bit(1_arg)=yes			# Type bit(1 arg)
> type_sql_bit_varying(1_arg)=yes		# Type bit varying(1 arg)

It should probably be noted that we only have syntax-level support for
BIT types in 7.0; they don't actually work.  The test is not deep enough
to notice that, however.


General comments:

It appears that they've cleaned up their act a little bit.  The last
time I examined crashme in any detail, there was an even longer list
of tests that checked for standard features but were careful to use a
nonstandard variant so they could claim that other people failed to
have the feature at all.

More generally, it's difficult to take seriously a test method and
presentation method that puts more weight on how many variant spellings
of "log()" you accept than on whether you have subselects.  (I count
five entries versus two.)

One could also complain about the very large number of tests that are
checking features that are non-SQL if not downright SQL-contradictory,
but are listed simply as bullet points with no pro or con.  A naive
reader would think that green stars are always good; they are not,
but how are you to tell without a copy of the SQL spec in hand?

Finally, the test coverage seems to have been designed with an eye
towards giving MySQL as many green stars as possible, not towards
exercising the most important features of SQL.  It would be interesting
to see considerably more coverage of subselects, for example, and I
expect that'd turn up shortcomings in a number of products including
Postgres.  But it won't happen as long as crashme is a tool of, by, and
for MySQL partisans (at least not till MySQL has subselects, whereupon
the test coverage will no doubt change).


Just FYI, I attach a diff between what you presented and what I get from
running the current crashme.  I don't understand exactly what's causing
the small differences in the values of some of the size limits.
Perhaps it is a side effect of using a different max_buffer_size, but
it seems really weird.

			regards, tom lane


[Attachment: Pg.cfg.diff]

37c37
< crash_me_safe=no			# crash me safe
 ---
> crash_me_safe=yes			# crash me safe
309c309
< max_char_size=8104			# max char() size
 ---
> max_char_size=8088			# max char() size
315c315
< max_index_length=2704			# index length
 ---
> max_index_length=2700			# index length
317c317
< max_index_part_length=2704		# max index part length
 ---
> max_index_part_length=2700		# max index part length
319,321c319,321
< max_index_varchar_part_length=2704	# index varchar part length
< max_row_length=7949			# max table row length (without blobs)
< max_row_length_with_null=7949		# table row length with nulls (without blobs)
 ---
> max_index_varchar_part_length=2700	# index varchar part length
> max_row_length=7937			# max table row length (without blobs)
> max_row_length_with_null=7937		# table row length with nulls (without blobs)
326c326
< max_text_size=8104			# max text or blob size
 ---
> max_text_size=8092			# max text or blob size
328c328
< max_varchar_size=8104			# max varchar() size
 ---
> max_varchar_size=8088			# max varchar() size
344c344
< operating_system=Linux 2.3.99s-noris-pre9-2 i686	# crash-me tested on
 ---
> operating_system=HP-UX B.10.20 9000/780	# crash-me tested on
355c355
< query_size=16777216			# query size
 ---
> query_size=1048576			# query size
369c369
< select_string_size=+16208		# constant string size in SELECT
 ---
> select_string_size=1048567		# constant string size in SELECT
490c490
< where_string_size=+16208		# constant string size in where
 ---
> where_string_size=1048542		# constant string size in where


===

Subject: Re: [HACKERS] More Performance
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 22:30:21 +0200


Hi,

Bruce Momjian:
> > 
> > test=# explain select id from bench1 order by id;
> > Sort  (cost=38259.21..38259.21 rows=300000 width=4)
> >   ->  Seq Scan on bench1  (cost=0.00..6093.00 rows=300000 width=4)
> > 
> The heap is unordered, meaning a sequential scan and order by is usually
> faster than an index walk unless there is a restrictive WHERE clause.
> 
What heap? The index is a b-tree in this case. Thus you should be able
to walk it and get the sorted result without ever touching the data
file.

Whether that makes sense with the current structure of the PostgreSQL
backend is a different question, of course. Certain other databases
(no, not just MySQL ;-) are capable of doing that optimization, however.

===

Subject: Re: [HACKERS] More Performance
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 22:40:43 +0200

Hi,

Bruce Momjian:
> > IMHO this is somewhat non-optimal. In the absence of information
> > to the contrary, PostgreSQL should default to using an index if
> > it might be appropriate, not ignore it.
> 
> This is an interesting idea.  So you are saying that if a column has no
> vacuum analyze statistics, assume it is unique?

Nope. But why should vacuum analyze be the one and only part of
PostgreSQL where statistics are ever updated?

When you have no statistics and a "column_name=CONSTANT" query for an
indexed column yields exactly one result (actually, "significantly fewer
results than there are 8-kbyte records in the table" would do), you
might want to record the fact that using the index might, in hindsight,
have been a good idea after all.

Then, when the next query like that comes in, you use the index.

Maybe I'm too naive ;-)  but I fail to see how this approach could
be either hard to implement or detrimental to performance.

===

Subject: Re: [HACKERS] More Performance
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 16:43:57 -0400 (EDT)


> Hi,
> 
> Bruce Momjian:
> > > 
> > > test=# explain select id from bench1 order by id;
> > > Sort  (cost=38259.21..38259.21 rows=300000 width=4)
> > >   ->  Seq Scan on bench1  (cost=0.00..6093.00 rows=300000 width=4)
> > > 
> > The heap is unordered, meaning a sequential scan and order by is usually
> > faster than an index walk unless there is a restrictive WHERE clause.
> > 
> What heap? The index is a b-tree in this case. Thus you should be able
> to walk it and get the sorted result without ever touching the data
> file.
> 
> Whether that makes sense with the current structure of the PostgreSQL
> backend is a different question, of course. Certain other databases
> (no, not just MySQL ;-) are capable of doing that optimization, however.

We can't read data from the index.  It would be nice if we could, but we
can't.  I think we believe that there are very few cases where this
would be win.  Usually you need non-indexed data too.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 20 May 2000 17:44:45 -0400

Matthias Urlichs" <smurf@noris.net> writes:
>> So you test EXCEPT by having a different number of columns.  I can see
>> it now, "Hey we don't have EXCEPT.  PostgreSQL does it, but they can't
>> handle a different number of columns.  Let's do only that test so we
>> look equal."
>> 
> They might as well have written that test while checking their crash-me
> script against SOLID and noting a few features MySQL doesn't have yet.
> Or they may have gotten it from them in the first place.

Our gripe is not that they're testing an extension we haven't got.
It's that the test result is misleadingly labeled.  It doesn't say
"EXCEPT with incompatible select lists", it says "EXCEPT", full stop.
That's deceptive.  And no, we do not think it's an honest mistake.
It's part of a consistent pattern of misstatements that's been going on
for a long time.  Sure, any one might be an honest mistake, but when you
see the same sort of thing over and over again, your credulity drops to
a low level.  crashme is designed to make MySQL look good and everyone
else (not just Postgres) look bad.

I'm glad to hear your optimism about cleaning this up.  Perhaps you
can actually accomplish something, but most of us decided long ago
that crashme is not meant as a fair comparison.  We have other things
to do than ride herd on crashme and try to keep them to the straight
and narrow, when they clearly have no desire to make it an unbiased
test and will not do so without constant prodding.

			regards, tom lane
===

Subject: Re: [HACKERS] More Performance 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 20 May 2000 21:13:49 -0400

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> What heap? The index is a b-tree in this case. Thus you should be able
>> to walk it and get the sorted result without ever touching the data
>> file.

> We can't read data from the index.  It would be nice if we could, but we
> can't.

The reason we can't is that we don't store tuple validity data in
indexes.  The index entry has the key value and a pointer to the tuple
in the main heap file, but we have to visit the tuple to find out
whether it's committed or dead.  If we did otherwise, then committing or
killing tuples would be lots slower than it is, because we'd have to
find and mark all the index entries pointing at the tuple, not just the
tuple itself.  It's a tradeoff... but we think it's a good one.

> I think we believe that there are very few cases where this
> would be win.  Usually you need non-indexed data too.

Right, non-toy examples usually read additional data columns anyway.

			regards, tom lane
===

Subject: Re: [HACKERS] More Performance
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 21:59:04 -0400 (EDT)


> > We can't read data from the index.  It would be nice if we could, but we
> > can't.  I think we believe that there are very few cases where this
> > would be win.  Usually you need non-indexed data too.
> 
> I have used other databases where this _is_ possible in the past, and
> the win is big when the programmer codes for it.  Sure, most cases don't
> just use indexed data, but if the programmer knows that the database
> supports index-only scans then sometimes an extreme performance
> requirement can be met.
> 

Yes, totally true.  It is an extreme optimization.  In Ingres, you could
actually SELECT on the index and use that when needed.


===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 21:57:50 -0400 (EDT)


> Our gripe is not that they're testing an extension we haven't got.
> It's that the test result is misleadingly labeled.  It doesn't say
> "EXCEPT with incompatible select lists", it says "EXCEPT", full stop.
> That's deceptive.  And no, we do not think it's an honest mistake.
> It's part of a consistent pattern of misstatements that's been going on
> for a long time.  Sure, any one might be an honest mistake, but when you
> see the same sort of thing over and over again, your credulity drops to
> a low level.  crashme is designed to make MySQL look good and everyone
> else (not just Postgres) look bad.
> 
> I'm glad to hear your optimism about cleaning this up.  Perhaps you
> can actually accomplish something, but most of us decided long ago
> that crashme is not meant as a fair comparison.  We have other things
> to do than ride herd on crashme and try to keep them to the straight
> and narrow, when they clearly have no desire to make it an unbiased
> test and will not do so without constant prodding.

The basic issue is that you can tell us that this big crashme mess
happened by mistake, and that there was no deceptive intent.

However, really, we are not stupid enough to believe it.

Why don't you find out who wrote this thing, and ask them what they were
thinking when they wrote it?  I bet you will find out our perception is
correct. 

===

Subject: [HACKERS] MySQL crashme test and PostgreSQL
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sat, 20 May 2000 23:13:46 -0400 (EDT)


[CC: to general list.]

> > > What does the official standard say (assuming any exists) -- is the "to"
> > > optional or not?
> > 
> > I don't see any RENAME in the SQL92 spec.  Now, how hard is it to do a
> > 'man alter_table' and see what it says at the top of the screen?
> > 
> It's not a question of your manpage vs. their manpage. I can read your
> manpage just fine. It's a question of whether there is something that
> can be regarded as a standard on it or not. "Official" is a poor wording
> in this case -- sorry.
> 
> If yes, then the test will be changed to do it the standard way.
> If no, then I might have to test for both syntaxes, which is a PITA.
> 

You know, you are asking what syntax is SQL standard.  It is actually
not our job to report it to you.  If you are responsible for the test,
you should know what the standard says, and test against that.  If you
are not responsible for the test, then it shows that the person who is
responsible for the test doesn't care enough to test for SQL standard
syntax, only for MySQL syntax.

You know, there is a saying, "Do it right, or don't do it at all."  That
is pretty much the PostgreSQL style.  And if you are going to criticize
someone, you better be sure you are right.

We didn't write the crashme test, we don't host it on our web site, we
didn't ask to be in it.  Someone has to be responsible for the test, and
knowing standard SQL syntax, and that must be whoever put it on the
MySQL site.  We really don't want to hear that it dropped from the sky
and landed on the MySQL site, and no one there is responsible for it.

If we put something on our site, we are responsible for it.  If we don't
like something or can't take ownership of it, we remove it.

Now, I am not picking on you.  You may have the best of intentions.  But
basically someone has decided to put it on the MySQL site, and has not
considered it worth their while to learn the SQL standard.  They would
rather make other people tell them about the SQL standard, and maybe,
just maybe, we will fix the test someday.  Well, I will tell you, we
have better things to do than fix the MySQL crashme test.

> What matters is that your opinion is that they are responsible for making
> the test 100% accurate. Their reply to that is that many database
> vendors actually provided fixes for this test instead of bitching
> about how inaccurate it is, thus they feel the obligation is on your
> side.

BINGO!  You know, if other database vendors are stupid enough to do
MySQL's work for them and read the SQL standard for them, well...

You can't just point fingers and say no one at MySQL is responsible.
The MySQL bias is written all through that test.

> Now I am of neither side. I am, IMHO, thus in a position to ask you
> about your opinion of these inaccuracies, I am going to change 
> the crashme test to be a whole lot more accurate WRT PostgreSQL,
> I will feed these changes back to the MySQL people, and they'll
> incorporate these changes into their next release. (Their head honcho
> (Monty) has said so on their mailing list. I _am_ going to take him up
> on it, and I can be quite obnoxious if somebody reneges on a promise.
> *EVIL*GRIN* )

You know, how do we know he is not just saying that, hoping no one will
actually take him up on it?

You know, Monty was on this list last year, and he asked why we had a
bad attitude about MySQL, and we told him about the crashme test, and
you know, nothing happened.   So I don't think it is very important to
Monty to be fair, or more accurately, he would rather keep a test that
makes MySQL look good, than to spend time making the test fair.  He made
his choice.  I can tell you our reaction would be totally different.

> I might add that their test lists 52 features of PostgreSQL which
> MySQL doesn't have (13 functions). It also lists 122 features of MySQL
> which PostgreSQL doesn't have; 78 of those are extra functions (40 of
> these, just for M$-ODBC compatibility).


> 
> So it seems that overall, that crash-me test result is reasonably
> balanced (39 vs. 44 non-function differences -- let's face it, adding
> another function for compatibility with SQL variant FOO is one of the
> easier exercises here, whatever the current value of FOO is).

You have to make the test deceptive to get MySQL to be on par with
PostgreSQL.  Period.  Doesn't MySQL admit they have fewer features than
PostgreSQL.  How did MySQL get an equal score on features?  Answer me
that one.

We have given enough of our time to this, and have pointed out many
problems.  Why don't you go and get those fixed, to show that the MySQL
group is working in good faith on this, and then, go and get a copy of
the standard, or a book about standard SQL, and start actually doing
something about the test.  

And if it is not worth your time, and it is not worth any one else's
time at MySQL, then you folks have to admit you want to criticize
PostgreSQL without spending time to be fair about it.

I am going to suggest that no one else in the PostgreSQL group send any
more problem reports about the crashme tests until some changes appear
on the MySQL end.  Tom Lane has already done a great job of illustrating
the issues involved.  Pointing to actual SQL items is not the real
problem.  The MySQL attitude about crashme is the problem.

Also, I have heard about the hit squads attacking MySQL.  I never
condone inaccuracy or attacks, but I can understand why it is happening.

For years, I believe the deceptiveness of the MySQL crashme test has
hampered acceptance of PostgreSQL.  And our response was to just reply
with our opinion when asked about it.  We didn't create a web page to
attack MySQL and make them look bad.  We believed that in the end, truth
always wins.  So we kept going, and you know, in the end, truth does
win.  We have a $25 million company forming around PostgreSQL,
with maybe more to come.  We are on our way up, even though the MySQL
crashme test delayed us.

And there is a saying "If you are not nice to people on your way up,
they will not be nice to you on the way down."  I bet the hit squads are
frustrated people who have seen unfair things said about PostgreSQL for
years, with nothing they could do about it.  Now they can do something,
and they are talking.  But instead of one web page with deceptive
results, you have 100 people all over the net slamming MySQL.  There is
a certain poetic justice in that.  The saying goes, "Oh what a tangled
web we weave, When first we practice to deceive".

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 05:06:10 +0200

Hi,

Bruce Momjian:
> > 
> > > 2. atomic_updates = no
> > That's a misnomer. They actually mean this:
> > 
> > 	create table crash_q (a integer not null);
> > 	create unique index crf on crash_q(a);
> > 
> > 	insert into crash_q values (2);
> > 	insert into crash_q values (3);
> > 	insert into crash_q values (1);
> > 	update crash_q set a=a+1;
> 
> Poorly named, huh?  How do you think it got such a name?  This item was
> on the crashme tests before TRANSACTION was on there?

It probably got that name because nobody thought about people
associating atomicity with transactions.

Anyway, the issue isn't all that exotic. ms-sql, mimer, db2, solid and
sybase are listed as supporting this kind of update.


If you can think of an understandable five-word-or-so description for
it, I'll happily rename the test. I've been thinking about it for the
last ten minutes or so, but couldn't come up with one. :-/


A different question is whether the database bungles the update when the
first few rows can be updated and THEN you run into a conflict.

PostgreSQL handles this case correctly, MySQL doesn't => I'll add a
test for it.
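
Schematically (the values are only illustrative, and assume the rows are
visited in insertion order):

	create table crash_q (a integer not null);
	create unique index crf on crash_q(a);
	insert into crash_q values (1);
	insert into crash_q values (2);
	insert into crash_q values (4);
	update crash_q set a=a+2;
	-- 1 -> 3 succeeds, then 2 -> 4 collides with the not-yet-updated 4.
	-- The question is whether the row already changed gets rolled back
	-- (PostgreSQL) or is left modified.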

===

Subject: [HACKERS] Re: MySQL crashme test and PostgreSQL
From: Mike Mascari <mascarm@mascari.com>
Date: Sat, 20 May 2000 23:37:17 -0400

Bruce Momjian wrote:
> 
> [CC: to general list.]
> 
> > I might add that their test lists 52 features of PostgreSQL which
> > MySQL doesn't have (13 functions). It also lists 122 features of 
> > MySQL which PostgreSQL doesn't have; 78 of those are extra 
> > functions (40 of these, just for M$-ODBC compatibility).
> 
> >
> > So it seems that overall, that crash-me test result is reasonably
> > balanced (39 vs. 44 non-function differences -- let's face it,
> > adding another function for compatibility with SQL variant FOO is
> > one of the easier exercises here, whatever the current value of 
> > FOO is).
> 
> You have to make the test deceptive to get MySQL to be on par with
> PostgreSQL.  Period.  Doesn't MySQL admit they have fewer features
> than PostgreSQL.  How did MySQL get an equal score on features?
> Answer me that one.

That's easy:

MySQL has type mediumint 
PostgreSQL has transactions

MySQL allows 'and' as string markers
PostgreSQL has views

MySQL has case insensitive compare
PostgreSQL has referential integrity

MySQL has support for 0000-00-00 dates
PostgreSQL has subqueries

MySQL has 'drop table if exists'
PostgreSQL has multiversion concurrency control

etc.

See? Equal. I hope my sarcasm is not too overstated.

===

Subject: Re: [HACKERS] Re: Heaps of read() syscalls by the postmaster
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 06:14:15 +0200

Hi,

Tom Lane:
> entry construction (worst possible time).  If you care to try the
> repaired code, see our CVS server, or grab a nightly snapshot dated
> later than this message.
> 
Thank you, I'll do that.

===

Subject: [HACKERS] Re: MySQL crashme test and PostgreSQL
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 06:26:46 +0200

Hi,

Bruce Momjian:
> Also, I have heard about the hit squads attacking MySQL.  I never
> condone inaccuracy or attacks, but I can understand why it is happening.
> 
You _are_ doing your side of the story a disservice, you know that?

> For years, I believe the deceptiveness of the MySQL crashme test has
> hampered acceptance of PostgreSQL.  And our response was to just reply
> with our opinion when asked about it.

Yeah, I can see that.

Let me tell you up front that your opinion is not at all helpful to
either the cause of PostgreSQL or to the problems between you and the
MySQL people, especially when stated like this.


This is the Internet. The right thing to do if somebody spreads bad
information (a biased, inaccurate, wrong, deceptive, what-have-you
crash-me test) would be to write your own test which either prefers
PostgreSQL, or is reasonably neutral.


I'll shut up now, until the first of my patches is in the crash-me
suite. Perhaps that will have _some_ impact here.

===

Subject: [HACKERS] Re: MySQL crashme test and PostgreSQL
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 06:34:13 +0200


Hi,

Mike Mascari:
> MySQL has type mediumint 
> PostgreSQL has transactions
> 
> MySQL allows 'and' as string markers
> PostgreSQL has views

Look, we all know that transaction support is more important than type
mediumint or the function ODBC LENGTH or cosine or whatever.

But what should I, or anybody else, do about it, in your opinion? Take
the "unimportant" tests out? Invent a couple of inconsequential things
PostgreSQL can do to balance the books? Repeat the "transactions=no"
entry in the crash-me results file ten times?

> See? Equal. I hope my sarcasm is not too overstated.

Sarcasm hasn't helped in the past, and it's unlikely to help in the future.

===

Subject: Re: [HACKERS] More Performance 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 21 May 2000 00:45:58 -0400

"Matthias Urlichs" <smurf@noris.net> writes:
> I've found another one of these performance problems in the benchmark,
> related to another ignored index.
> The whole thing works perfectly after a VACUUM ANALYZE on the
> table.
> IMHO this is somewhat non-optimal. In the absence of information
> to the contrary, PostgreSQL should default to using an index if
> it might be appropriate, not ignore it.

Just FYI: Postgres absolutely does not "ignore" an index in the absence
of VACUUM ANALYZE stats.  However, the default assumptions about
selectivity stats create cost estimates that are not too far different
for index and sequential scans.  On a never-vacuumed table you will
get an indexscan for "WHERE col = foo".  If the table has been vacuumed
but never vacuum analyzed, it turns out that you get varying results
depending on the size of the table and the average tuple size (since the
planner now has non-default info about the table size, but still nothing
about the actual selectivity of the WHERE condition).
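For anyone who wants to see this for themselves, the planner's choice can
be inspected directly with EXPLAIN (hypothetical table and value):

	create table t (col integer);
	create index t_col_idx on t (col);
	explain select * from t where col = 42;	-- plan based on default assumptions
	vacuum analyze t;			-- gather real size/selectivity stats
	explain select * from t where col = 42;	-- estimates (and maybe the plan) change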

The cost estimation code is under active development, and if you check
the pgsql list archives you will find lively discussions about its
deficiencies ;-).  But I'm having a hard time mustering much concern
about whether it behaves optimally in the vacuum-but-no-vacuum-analyze
case.


===

Subject: Re: MySQL's "crashme" (was Re: [HACKERS] Performance)
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 08:25:32 +0200


Hi,

Tom Lane:
> I would not like to see us labeled "crashme unsafe" merely because
> someone is too impatient to let the test run to conclusion.

There's not much anybody can do about a backend which dies because of a
"hard" out-of-memory error which the OS only notices when all it can do
is segfault the client.

Anyway, I'll go through your list of problems and create an appropriate
patch for the beast.


Not to forget: THANK YOU for taking the time to go through this.


> > alter_drop_col=no			# Alter table drop column
> 
> While our ALTER TABLE support is certainly pretty weak, it should be
> noted that this test will continue to fail even when we have ALTER TABLE
> DROP COLUMN, because crashme is testing for a non-SQL-compliant syntax.
> 
You mean because the COLUMN keyword is missing? Added.

> > alter_rename_table=no			# Alter table rename table
> 
> We have ALTER TABLE RENAME ... but not under the syntax crashme is
> testing. 

TO keyword added.
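For reference, the spelling PostgreSQL accepts (table names made up):

	alter table crash_q rename to crash_q1;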

> > atomic_updates=no			# atomic updates
> 
Test clarified and two new tests added. The result now is:

atomic_updates=no			# update constraint check are deferred
atomic_updates_ok=yes			# failed atomic update

MySQL has "no" and "no".

> > automatic_rowid=no			# Automatic rowid
> 
> Test is actually looking for a system column named "_rowid".  Our OIDs
> serve the same purpose

I'll add a test for OIDs.
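A quick sketch of how the OID system column plays the _rowid role (table
name made up):

	create table crash_q (a integer);
	insert into crash_q values (1);
	select oid, a from crash_q;	-- every row carries a system-assigned oid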

> > binary_items=no				# binary items (0x41)
> 
> We have binary literals (per the test name) and hex literals (what
> it actually appears to be testing).  Unfortunately for us, ours are
> SQL92-compliant syntax, and what crashme is looking for isn't.
> 
I'll tell them to fix that.
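Roughly, the spellings in question (the first two are the SQL92 bit-string
literal forms PostgreSQL takes; the last is the MySQL extension the test wants):

	select b'01000001';	-- SQL92 binary string literal
	select x'41';		-- SQL92 hex string literal
	select 0x41;		-- MySQL-style hex literal, not SQL92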

> > comment_#=no				# # as comment
> 
> It'd be helpful to the reader if they indicated which two of these
> conventions are SQL-compliant ... of course, that might expose the
> fact that MySQL isn't ...
> 
Are there any problems with using '#' as a comment character, given that
MySQL doesn't support user-defined operators?

> > connections=32				# Simultaneous connections
> 
> Should probably be noted that this is just the default limit

Noted.


> > crash_me_safe=no			# crash me safe
> 
> I get "yes", and I'd *really* appreciate it if you not submit this
> misleading statement.
> 
I won't submit test results (not until the thing is cleaned up to
everybody's satisfaction), but I'll change the documentation of the
thing to state that 

>>> Some of the tests you are about to execute require a lot of memory.
>>> Your tests _will_ adversely affect system performance. Either this
>>> crash-me test program, or the actual database back-end, _will_ die with
>>> an out-of-memory error. So might any other program on your system if it
>>> requests more memory at the wrong time.

Good enough?


> > date_zero=no				# Supports 0000-00-00 dates
> Another test that only MySQL "passes".
... and SOLID.

> > except=no				# except
> Unsurprisingly, hardly anyone else accepts it either.
SOLID again.

I'll mark the features that are necessary for SQL compliancy (as well as
those that actually are detrimental to it).  _After_ the actual test
results are cleaned up.

> What they are looking for here is "SELECT NOT 0", which Postgres rejects
> as a type violation.  SQL-compliant "NOT FALSE" would work.
> 
... not with MySQL, which doesn't have FALSE in the first place.  :-(

I added another test for TRUE/FALSE, and fixed the NOT-0 thing.
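The contrast, spelled out:

	select not 0;		-- what crash-me sends; Postgres rejects it as a type violation
	select not false;	-- the SQL-compliant form, which Postgres accepts
				-- (and which MySQL can't run, lacking a FALSE literal)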

> > having_with_alias=no			# Having on alias
> 
> Again, how curious that MySQL is the only DBMS shown as passing this
> test.  Couldn't be because it violates SQL92, could it?
> 
No, but it's an extremely nice feature to have, IMHO.

I will not do anything about tests for extensions that won't hurt one
way or another. Classifying them, as noted above, should be sufficient.

> > insert_select=no			# insert INTO ... SELECT ...
> 
> We would pass this test if the crashme script weren't buggy: it fails
> to clean up after a prior test that creates a crash_q table with
> different column names.

Fixed.
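In other words, once the leftover table is dropped the straightforward form
works (column names assumed to match):

	drop table crash_q;
	create table crash_q (a integer);
	insert into crash_q (a) select a from crash_me;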

> > logical_value=1			# Value of logical operation (1=1)
> 
> A rather odd result, considering that what Postgres actually returns for
> "SELECT (1=1)" is 't'.  But showing the correct answer isn't one of
> crashme's highest priorities...
> 
> > minus_neg=no				# Calculate 1--1
> 
> Another case where "passing" the test means accepting MySQL's version of
> reality instead of SQL92's.  All the SQL-compliant DBMSs think -- is a
> comment introducer

So does MySQL -- when the '--' is followed by a space.

They do explain that in their documentation. I have to agree with them
 -- you can turn a perfectly legal SQL statement into dangerous nonsense
with this kind of comment.

>>> $dbh->do("update foo set bar = baz-$adjust where something-or-other").

That kind of mistake can be rather expensive.
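Spelling out the failure mode (table and variable are hypothetical):

	-- if $adjust interpolates as -1, the server receives
	update foo set bar = baz--1 where some_condition;
	-- under strict SQL92 comment handling everything after "--" is a
	-- comment, so the WHERE clause vanishes and every row gets updated.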

> > select_string_size=+16208		# constant string size in SELECT
> 
> I got 1048567 here, roughly corresponding to where I set max_buffer_size.
> Not sure why you get a smaller answer.
> 
Note the '+'. I have changed the test to 2*max_row_size since anything
bigger will return an empty answer anyway.

> > select_table_update=no			# Update with sub select
> 
> We certainly have update with sub select.  What they're looking for is
> the non-SQL-compliant syntax
> 	update crash_q set crash_q.b=
> 		(select b from crash_me where crash_q.a = crash_me.a);

Gah. Thanks; fixed. 
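For the record, the spec-conformant spelling drops the table qualification
on the SET target:

	update crash_q set b =
		(select b from crash_me where crash_me.a = crash_q.a);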

> One could also complain about the very large number of tests that are
> checking features that are non-SQL if not downright SQL-contradictory,
> but are listed simply as bullet points with no pro or con.  A naive
> reader would think that green stars are always good; they are not,
> but how are you to tell without a copy of the SQL spec in hand?
> 
I'll adapt the comments, but that is quite time consuming (and the
changes are extensive) and thus will have to wait until after the first
round is in one of their next alpha-test releases.

===

Subject: Re: MySQL's "crashme" (was Re: [HACKERS] Performance)
From: Kaare Rasmussen <kar@webline.dk>
Date: Sun, 21 May 2000 11:34:00 +0200

minus_neg=no				# Calculate 1--1

Minus_neg expressed as select 1- -1; works. 

===

Subject: Re: [HACKERS] Re: MySQL crashme test and PostgreSQL
From: Benjamin Adida <ben@mit.edu>
Date: Sun, 21 May 2000 10:31:45 -0400

on 5/21/00 12:34 AM, Matthias Urlichs at smurf@noris.net wrote:

> But what should I, or anybody else, do about it, in your opinion? Take
> the "unimportant" tests out? Invent a couple of inconsequential things
> PostgreSQL can do to balance the books? Repeat the "transactions=no"
> entry in the crash-me results file ten times?

Take the unimportant tests out. Absolutely. Explain why the important tests
are important. The MySQL team is responsible for teaching a generation of
hackers that "transactions aren't important, they're just for lazy coders."

The solution here looks extremely simple. The only risk, of course, is that
it makes MySQL look bad, which I understand could be an unwanted outcome on
your end.

-Ben

===

Subject: Re: MySQL's "crashme" (was Re: [HACKERS] Performance) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 21 May 2000 13:10:27 -0400

"Matthias Urlichs" <smurf@noris.net> writes:
> Tom Lane:
>> I would not like to see us labeled "crashme unsafe" merely because
>> someone is too impatient to let the test run to conclusion.

> There's not much anybody can do about a backend which dies because of a
> "hard" out-of-memory error which the OS only notices when all it can do
> is segfault the client.

I'm just saying that it's unfair to downrate us when the problem is
demonstrably in crashme itself and not in Postgres.

>>>> alter_drop_col=no			# Alter table drop column
>> 
>> While our ALTER TABLE support is certainly pretty weak, it should be
>> noted that this test will continue to fail even when we have ALTER TABLE
>> DROP COLUMN, because crashme is testing for a non-SQL-compliant syntax.
>> 
> You mean because the COLUMN keyword is missing? Added.

No, the COLUMN keyword is optional according to the SQL92 specification:

         <alter table statement> ::=
              ALTER TABLE <table name> <alter table action>

         <alter table action> ::=
                <add column definition>
              | <alter column definition>
              | <drop column definition>
              | <add table constraint definition>
              | <drop table constraint definition>

         <drop column definition> ::=
              DROP [ COLUMN ] <column name> <drop behavior>

         <drop behavior> ::= CASCADE | RESTRICT

What is *not* optional is a <drop behavior> keyword.  Although we don't
yet implement DROP COLUMN, our parser already has this statement in it
  --- and it follows the SQL92 grammar.
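So the conformant statements would look like this (COLUMN optional, the
drop behavior not):

	alter table crash_q drop column b cascade;
	alter table crash_q drop b restrict;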

>>>> comment_#=no				# # as comment
>> 
>> It'd be helpful to the reader if they indicated which two of these
>> conventions are SQL-compliant ... of course, that might expose the
>> fact that MySQL isn't ...
>> 
> Are there any problems with using '#' as a comment character, given that
> MySQL doesn't support user-defined operators?

Only in that your scripts don't port to spec-compliant DBMSes ...

>>>> Some of the tests you are about to execute require a lot of memory.
>>>> Your tests _will_ adversely affect system performance. Either this
>>>> crash-me test program, or the actual database back-end, _will_ die with
>>>> an out-of-memory error. So might any other program on your system if it
>>>> requests more memory at the wrong time.

> Good enough?

No, pretty misleading I'd say.  Since the crashme script does have a
limit on max_buffer_size, it *will* run to completion if run on a
machine with a sufficiently large per-process memory limit (and enough
swap of course).  I may just be old-fashioned in running with a
not-so-large memory limit.  It'd probably be more helpful if you
document the behavior seen when crashme runs out of memory (for me,
the script just stops cold with no notice) and what to do about it
(reduce max_buffer_size until it runs to completion).

>>>> date_zero=no				# Supports 0000-00-00 dates
>> Another test that only MySQL "passes".
> ... and SOLID.

Still doesn't mean it's a good idea ;-)

>>>> except=no				# except
>> Unsurprisingly, hardly anyone else accepts it either.
> SOLID again.

It'd be appropriate to divide this into two tests, or at least relabel
it.

>>>> minus_neg=no				# Calculate 1--1
>> 
>> Another case where "passing" the test means accepting MySQL's version of
>> reality instead of SQL92's.  All the SQL-compliant DBMSs think -- is a
>> comment introducer

> So does MySQL -- when the '--' is followed by a space.

Considering how much we got ragged on for not being perfectly compliant
with SQL-spec handling of comments (up till 7.0 our parser didn't
recognize "--" as a comment if it was embedded in a multicharacter
operator --- but we knew that was a bug), I don't have a lot of sympathy
for MySQL unilaterally redefining the spec here.  And I have even less
for them devising a test that can only be "passed" by non-spec-compliant
parsers, and then deliberately mislabeling it to give the impression
that the spec-compliant systems are seriously broken.  How about
labeling the results "Fails to recognize -- comment introducer unless
surrounded by whitespace"?


Anyway, I am pleased to see you trying to clean up the mess.
Good luck!

===

Subject: Re: [HACKERS] Re: MySQL crashme test and PostgreSQL
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sun, 21 May 2000 13:49:45 -0400 (EDT)


> Hi,
> 
> Bruce Momjian:
> > Also, I have heard about the hit squads attacking MySQL.  I never
> > condone inaccuracy or attacks, but I can understand why it is happening.
> > 
> You _are_ doing your side of the story a disservice, you know that?

Hey, I am not saying I like it happening.  All I am saying is that I can
understand why it is happening.  Certainly MSSQL and Oracle are the real
products we need to compete against.

> 
> > For years, I believe the deceptiveness of the MySQL crashme test has
> > hampered acceptance of PostgreSQL.  And our response was to just reply
> > with our opinion when asked about it.
> 
> Yeah, I can see that.
> 
> Let me tell you up front that your opinion is not at all helpful to
> either the cause of PostgreSQL or to the problems between you and the
> MySQL people, especially when stated like this.
> 
> 
> This is the Internet. The right thing to do if somebody spreads bad
> information (a biased, inaccurate, wrong, deceptive, what-have-you)
> crash-me test would be to write your own test which either prefers
> PostgreSQL, or is reasonably neutral.

We have better things to do than compete against deceptive tests.  We
just work to make our product better and better.  Making another
crashme test is not going to make PostgreSQL a better piece of software.


===

Subject: [HACKERS] Re: [GENERAL] Re: MySQL crashme test and PostgreSQL
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Sun, 21 May 2000 13:44:48 -0400 (EDT)

> That's easy:
> 
> MySQL has type mediumint 
> PostgreSQL has transactions
> 
> MySQL allows 'and' as string markers
> PostgreSQL has views
> 
> MySQL has case insensitive compare
> PostgreSQL has referential integrity
> 
> MySQL has support for 0000-00-00 dates
> PostgreSQL has subqueries
> 
> MySQL has 'drop table if exists'
> PostgreSQL has multiversion concurrency control
> 
> etc.
> 
> See? Equal. I hope my sarcasm is not too overstated.

It took me a minute to figure this out.  Wow, that was funny.  I am
still laughing.

===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server)) 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 21 May 2000 14:45:22 -0400

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Isn't it a fundamental principle to define primary(unique
> identification) constraint for each table ?
> I had never thought that the only one index of pg_attrdef 
> isn't an unique identification index until I came across the
> unexpected result of my DROP COLUMN test case.

Good point --- I was only thinking about the performance aspect, but
if we're going to have unique indexes to prevent errors in other
system tables then pg_attrdef deserves one too.

Actually, I have a more radical proposal: why not get rid of pg_attrdef
entirely, and add its two useful columns (adsrc, adbin) to pg_attribute?
If we allow them to be NULL for attributes with no default, then there's
no space overhead where they're not being used, and we eliminate any
need for the second table.

			regards, tom lane
===

Subject: Re: [HACKERS] Performance (was: The New Slashdot Setup (includes MySql server))
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sat, 20 May 2000 12:14:38 +0200


Hi,

Thomas Lockhart:
> 
> Hmm. And then who's job is it to take someone else's work and make it
> accurate? If the shoe were on the other foot: if I generated a
> benchmark suite and features list, and it contained major and numerous
> inaccuracies, who would you expect to be responsible (or at least feel
> responsible) for correcting/updating/improving it? 'Twould be me imho.
> 
Umm, there's still a difference between saying (a) "it's broken, fix
it", (b) "here's my analysis as to what exactly is broken, can you fix
it", and (c) "here's a patch that fixes it".

I get the distinct impression that most of the communication between the
PostgreSQL and MySQL people has been looking more like (a) in the
past... if I can help both projects by doing some "translation" towards
(b) and (c), if at all possible, then so much the better.

> We've tried, and failed (to date) to contribute information to the
> "crashme" travesty. My recollection was a ~30% error rate on
> information for Postgres, and I didn't look into the stats for other
> databases. Check the archives for details.
> 
Attached is the current crashme output. "crash_me_safe" is off only
because some tests go beyond available memory.
There's no sense in testing how far you can push a "SELECT a from b where
c = 'xxx(several megabytes worth of Xes)'" query when the size of a TEXT
field is limited to 32k.

Limits with '+' in front of the number say that this is the max value
tested, without implying whether higher values are OK or not.

If you have any remarks, especially about the '=no' results (i.e. you
think PostgreSQL can do that, therefore the crashme test must be wrong
somehow), tell me. Otherwise I'll forward the results to the MySQL
people next week.


The crash-me test script, BTW, is included in MySQL's sql-bench
subdirectory.


[Attachment: pgtest]

#This file is automaticly generated by crash-me 1.43

NEG=yes					# update of column= -column
Need_cast_for_null=no			# Need to cast NULL for arithmetic
alter_add_col=yes			# Alter table add column
alter_add_constraint=no			# Alter table add constraint
alter_add_foreign_key=yes		# Alter table add foreign key
alter_add_multi_col=no			# Alter table add many columns
alter_add_primary_key=no		# Alter table add primary key
alter_add_unique=no			# Alter table add unique
alter_alter_col=yes			# Alter table alter column default
alter_change_col=no			# Alter table change column
alter_drop_col=no			# Alter table drop column
alter_drop_constraint=no		# Alter table drop constraint
alter_drop_foreign_key=no		# Alter table drop foreign key
alter_drop_primary_key=no		# Alter table drop primary key
alter_drop_unique=no			# Alter table drop unique
alter_modify_col=no			# Alter table modify column
alter_rename_table=no			# Alter table rename table
atomic_updates=no			# atomic updates
automatic_rowid=no			# Automatic rowid
binary_items=no				# binary items (0x41)
case_insensitive_strings=no		# case insensitive compare
char_is_space_filled=yes		# char are space filled
column_alias=yes			# Column alias
columns_in_group_by=+64			# number of columns in group by
columns_in_order_by=+64			# number of columns in order by
comment_#=no				# # as comment
comment_--=yes				# -- as comment
comment_/**/=yes			# /* */ as comment
comment_//=no				# // as comment
compute=no				# Compute
connections=32				# Simultaneous connections
constraint_check=yes			# Column constraints
constraint_check_table=yes		# Table constraints
constraint_null=yes			# NULL constraint (SyBase style)
crash_me_safe=no			# crash me safe
crash_me_version=1.43			# crash me version
create_default=yes			# default value for column
create_default_func=no			# default value function for column
create_if_not_exists=no			# create table if not exists
create_index=yes			# create index
create_schema=no			# Create SCHEMA
create_table_select=no			# create table from select
cross_join=yes				# cross join (same as from a,b)
date_last=yes				# Supports 9999-12-31 dates
date_one=yes				# Supports 0001-01-01 dates
date_with_YY=yes			# Supports YY-MM-DD 2000 compilant dates
date_zero=no				# Supports 0000-00-00 dates
domains=no				# Domains (ANSI SQL)
double_quotes=yes			# Double '' as ' in strings
drop_if_exists=no			# drop table if exists
drop_index=yes				# drop index
end_colon=yes				# allows end ';'
except=no				# except
except_all=no				# except all
float_int_expr=yes			# mixing of integer and float in expression
foreign_key=yes				# foreign keys
foreign_key_syntax=yes			# foreign key syntax
full_outer_join=no			# full outer join
func_extra_!=no				# Function NOT as '!' in SELECT
func_extra_%=yes			# Function MOD as %
func_extra_&=no				# Function & (bitwise and)
func_extra_&&=no			# Function AND as '&&'
func_extra_<>=yes			# Function <> in SELECT
func_extra_==yes			# Function =
func_extra_add_months=no		# Function ADD_MONTHS
func_extra_and_or=yes			# Function AND and OR in SELECT
func_extra_ascii_char=no		# Function ASCII_CHAR
func_extra_ascii_code=no		# Function ASCII_CODE
func_extra_atn2=no			# Function ATN2
func_extra_auto_num2string=no		# Function automatic num->string convert
func_extra_auto_string2num=yes		# Function automatic string->num convert
func_extra_between=yes			# Function BETWEEN in SELECT
func_extra_binary_shifts=no		# Function << and >> (bitwise shifts)
func_extra_bit_count=no			# Function BIT_COUNT
func_extra_ceil=yes			# Function CEIL
func_extra_charindex=no			# Function CHARINDEX
func_extra_chr=no			# Function CHR
func_extra_concat_as_+=no		# Function concatenation with +
func_extra_concat_list=no		# Function CONCAT(list)
func_extra_convert=no			# Function CONVERT
func_extra_cosh=no			# Function COSH
func_extra_date_format=no		# Function DATE_FORMAT
func_extra_dateadd=no			# Function DATEADD
func_extra_datediff=no			# Function DATEDIFF
func_extra_datename=no			# Function DATENAME
func_extra_datepart=no			# Function DATEPART
func_extra_elt=no			# Function ELT
func_extra_encrypt=no			# Function ENCRYPT
func_extra_field=no			# Function FIELD
func_extra_format=no			# Function FORMAT
func_extra_from_days=no			# Function FROM_DAYS
func_extra_from_unixtime=no		# Function FROM_UNIXTIME
func_extra_getdate=no			# Function GETDATE
func_extra_greatest=no			# Function GREATEST
func_extra_if=no			# Function IF
func_extra_in_num=yes			# Function IN on numbers in SELECT
func_extra_in_str=no			# Function IN on strings in SELECT
func_extra_initcap=yes			# Function INITCAP
func_extra_instr=no			# Function LOCATE as INSTR
func_extra_instr_oracle=no		# Function INSTR (Oracle syntax)
func_extra_instrb=no			# Function INSTRB
func_extra_interval=no			# Function INTERVAL
func_extra_last_day=no			# Function LAST_DAY
func_extra_last_insert_id=no		# Function LAST_INSERT_ID
func_extra_least=no			# Function LEAST
func_extra_lengthb=no			# Function LENGTHB
func_extra_like=yes			# Function LIKE in SELECT
func_extra_like_escape=no		# Function LIKE ESCAPE in SELECT
func_extra_ln=no			# Function LN
func_extra_log(m_n)=yes			# Function LOG(m,n)
func_extra_logn=no			# Function LOGN
func_extra_lpad=yes			# Function LPAD
func_extra_mdy=no			# Function MDY
func_extra_mid=no			# Function SUBSTRING as MID
func_extra_months_between=no		# Function MONTHS_BETWEEN
func_extra_not=no			# Function NOT in SELECT
func_extra_not_between=yes		# Function NOT BETWEEN in SELECT
func_extra_not_like=yes			# Function NOT LIKE in SELECT
func_extra_odbc_convert=no		# Function ODBC CONVERT
func_extra_password=no			# Function PASSWORD
func_extra_paste=no			# Function PASTE
func_extra_patindex=no			# Function PATINDEX
func_extra_period_add=no		# Function PERIOD_ADD
func_extra_period_diff=no		# Function PERIOD_DIFF
func_extra_pow=no			# Function POW
func_extra_range=no			# Function RANGE
func_extra_regexp=no			# Function REGEXP in SELECT
func_extra_replicate=no			# Function REPLICATE
func_extra_reverse=no			# Function REVERSE
func_extra_root=no			# Function ROOT
func_extra_round1=yes			# Function ROUND(1 arg)
func_extra_rpad=yes			# Function RPAD
func_extra_sec_to_time=no		# Function SEC_TO_TIME
func_extra_sinh=no			# Function SINH
func_extra_str=no			# Function STR
func_extra_strcmp=no			# Function STRCMP
func_extra_stuff=no			# Function STUFF
func_extra_substrb=no			# Function SUBSTRB
func_extra_substring_index=no		# Function SUBSTRING_INDEX
func_extra_sysdate=no			# Function SYSDATE
func_extra_tail=no			# Function TAIL
func_extra_tanh=no			# Function TANH
func_extra_time_to_sec=no		# Function TIME_TO_SEC
func_extra_to_days=no			# Function TO_DAYS
func_extra_translate=yes		# Function TRANSLATE
func_extra_trim_many_char=yes		# Function TRIM; Many char extension
func_extra_trim_substring=error		# Function TRIM; Substring extension
func_extra_trunc=yes			# Function TRUNC
func_extra_uid=no			# Function UID
func_extra_unix_timestamp=no		# Function UNIX_TIMESTAMP
func_extra_userenv=no			# Function USERENV
func_extra_version=yes			# Function VERSION
func_extra_weekday=no			# Function WEEKDAY
func_extra_|=no				# Function | (bitwise or)
func_extra_||=no			# Function OR as '||'
func_odbc_abs=yes			# Function ABS
func_odbc_acos=yes			# Function ACOS
func_odbc_ascii=yes			# Function ASCII
func_odbc_asin=yes			# Function ASIN
func_odbc_atan=yes			# Function ATAN
func_odbc_atan2=yes			# Function ATAN2
func_odbc_ceiling=no			# Function CEILING
func_odbc_char=no			# Function CHAR
func_odbc_concat=no			# Function CONCAT(2 arg)
func_odbc_cos=yes			# Function COS
func_odbc_cot=yes			# Function COT
func_odbc_curdate=no			# Function CURDATE
func_odbc_curtime=no			# Function CURTIME
func_odbc_database=no			# Function DATABASE
func_odbc_dayname=no			# Function DAYNAME
func_odbc_dayofmonth=no			# Function DAYOFMONTH
func_odbc_dayofweek=no			# Function DAYOFWEEK
func_odbc_dayofyear=no			# Function DAYOFYEAR
func_odbc_degrees=yes			# Function DEGREES
func_odbc_difference=no			# Function DIFFERENCE()
func_odbc_exp=no			# Function EXP
func_odbc_floor=yes			# Function FLOOR
func_odbc_fn_left=no			# Function ODBC syntax LEFT & RIGHT
func_odbc_hour=no			# Function HOUR
func_odbc_hour_time=no			# Function ANSI HOUR
func_odbc_ifnull=no			# Function IFNULL
func_odbc_insert=no			# Function INSERT
func_odbc_lcase=no			# Function LCASE
func_odbc_left=no			# Function LEFT
func_odbc_length=no			# Function REAL LENGTH
func_odbc_length_without_space=no	# Function ODBC LENGTH
func_odbc_locate_2=no			# Function LOCATE(2 arg)
func_odbc_locate_3=no			# Function LOCATE(3 arg)
func_odbc_log=no			# Function LOG
func_odbc_log10=no			# Function LOG10
func_odbc_ltrim=yes			# Function LTRIM
func_odbc_minute=no			# Function MINUTE
func_odbc_mod=yes			# Function MOD
func_odbc_month=no			# Function MONTH
func_odbc_monthname=no			# Function MONTHNAME
func_odbc_now=yes			# Function NOW
func_odbc_pi=yes			# Function PI
func_odbc_power=no			# Function POWER
func_odbc_quarter=no			# Function QUARTER
func_odbc_radians=yes			# Function RADIANS
func_odbc_rand=no			# Function RAND
func_odbc_repeat=yes			# Function REPEAT
func_odbc_replace=no			# Function REPLACE
func_odbc_right=no			# Function RIGHT
func_odbc_round=yes			# Function ROUND(2 arg)
func_odbc_rtrim=yes			# Function RTRIM
func_odbc_second=no			# Function SECOND
func_odbc_sign=yes			# Function SIGN
func_odbc_sin=yes			# Function SIN
func_odbc_soundex=no			# Function SOUNDEX
func_odbc_space=no			# Function SPACE
func_odbc_sqrt=no			# Function SQRT
func_odbc_substring=no			# Function ODBC SUBSTRING
func_odbc_tan=yes			# Function TAN
func_odbc_timestampadd=no		# Function TIMESTAMPADD
func_odbc_timestampdiff=no		# Function TIMESTAMPDIFF
func_odbc_truncate=no			# Function TRUNCATE
func_odbc_ucase=no			# Function UCASE
func_odbc_user()=no			# Function USER()
func_odbc_week=no			# Function WEEK
func_odbc_year=no			# Function YEAR
func_sql_+=yes				# Function +, -, * and /
func_sql_bit_length=no			# Function BIT_LENGTH
func_sql_cast=yes			# Function CAST
func_sql_char_length=yes		# Function CHAR_LENGTH
func_sql_char_length(constant)=yes	# Function CHAR_LENGTH(constant)
func_sql_character_length=yes		# Function CHARACTER_LENGTH
func_sql_coalesce=yes			# Function COALESCE
func_sql_concat_as_||=yes		# Function concatenation with ||
func_sql_current_date=yes		# Function CURRENT_DATE
func_sql_current_time=yes		# Function CURRENT_TIME
func_sql_current_timestamp=yes		# Function CURRENT_TIMESTAMP
func_sql_current_user=yes		# Function CURRENT_USER
func_sql_extract_sql=yes		# Function EXTRACT
func_sql_localtime=no			# Function LOCALTIME
func_sql_localtimestamp=no		# Function LOCALTIMESTAMP
func_sql_lower=yes			# Function LOWER
func_sql_nullif=no			# Function NULLIF
func_sql_octet_length=no		# Function OCTET_LENGTH
func_sql_position=yes			# Function POSITION
func_sql_searched_case=yes		# Function searched CASE
func_sql_session_user=yes		# Function SESSION_USER
func_sql_simple_case=yes		# Function simple CASE
func_sql_substring=yes			# Function ANSI SQL SUBSTRING
func_sql_system_user=no			# Function SYSTEM_USER
func_sql_trim=yes			# Function TRIM
func_sql_upper=yes			# Function UPPER
func_sql_user=yes			# Function USER
func_where_between=yes			# Function BETWEEN
func_where_eq_all=yes			# Function = ALL
func_where_eq_any=yes			# Function = ANY
func_where_eq_some=yes			# Function = SOME
func_where_exists=yes			# Function EXISTS
func_where_in_num=yes			# Function IN on numbers
func_where_like=yes			# Function LIKE
func_where_like_escape=no		# Function LIKE ESCAPE
func_where_match=no			# Function MATCH
func_where_match_unique=no		# Function MATCH UNIQUE
func_where_matches=no			# Function MATCHES
func_where_not_between=yes		# Function NOT BETWEEN
func_where_not_exists=yes		# Function NOT EXISTS
func_where_not_like=yes			# Function NOT LIKE
func_where_not_unique=no		# Function NOT UNIQUE
func_where_unique=no			# Function UNIQUE
functions=yes				# Functions
group_by=yes				# Group by
group_by_alias=yes			# Group by alias
group_by_null=yes			# group on column with null values
group_by_position=yes			# Group by position
group_distinct_functions=yes		# Group functions with distinct
group_func_extra_bit_and=no		# Group function BIT_AND
group_func_extra_bit_or=no		# Group function BIT_OR
group_func_extra_count_distinct_list=no	# Group function COUNT(DISTINCT expr,expr,...)
group_func_extra_std=no			# Group function STD
group_func_extra_stddev=no		# Group function STDDEV
group_func_extra_variance=no		# Group function VARIANCE
group_func_sql_avg=yes			# Group function AVG
group_func_sql_count_*=yes		# Group function COUNT (*)
group_func_sql_count_column=yes		# Group function COUNT column name
group_func_sql_count_distinct=yes	# Group function COUNT(DISTINCT expr)
group_func_sql_max=yes			# Group function MAX on numbers
group_func_sql_max_str=yes		# Group function MAX on strings
group_func_sql_min=yes			# Group function MIN on numbers
group_func_sql_min_str=yes		# Group function MIN on strings
group_func_sql_sum=yes			# Group function SUM
group_functions=yes			# Group functions
having=yes				# Having
having_with_alias=no			# Having on alias
having_with_group=yes			# Having with group function
ignore_end_space=yes			# ignore end space in compare
index_in_create=no			# index in create table
index_namespace=no			# different namespace for index
index_parts=no				# index on column part (extension)
insert_empty_string=yes			# insert empty string
insert_select=no			# insert INTO ... SELECT ...
insert_with_set=no			# INSERT with set syntax
intersect=no				# intersect
intersect_all=no			# intersect all
join_tables=+64				# tables in join
left_outer_join=no			# left outer join
left_outer_join_using=no		# left outer join using
like_with_column=yes			# column LIKE column
like_with_number=yes			# LIKE on numbers
lock_tables=yes				# lock table
logical_value=1				# Value of logical operation (1=1)
max_big_expressions=10			# big expressions
max_char_size=8104			# max char() size
max_column_name=+512			# column name length
max_columns=1600			# Columns in table
max_conditions=19994			# OR and AND in WHERE
max_expressions=9999			# simple expressions
max_index=+64				# max index
max_index_length=2704			# index length
max_index_name=+512			# index name length
max_index_part_length=2704		# max index part length
max_index_parts=16			# index parts
max_index_varchar_part_length=2704	# index varchar part length
max_row_length=7949			# max table row length (without blobs)
max_row_length_with_null=7949		# table row length with nulls (without blobs)
max_select_alias_name=+512		# select alias name length
max_stack_expression=+2000		# stacked expressions
max_table_alias_name=+512		# table alias name length
max_table_name=+512			# table name length
max_text_size=8104			# max text or blob size
max_unique_index=+64			# unique indexes
max_varchar_size=8104			# max varchar() size
minus=no				# minus
minus_neg=no				# Calculate 1--1
multi_drop=yes				# many tables to drop table
multi_strings=yes			# Multiple line strings
multi_table_delete=no			# DELETE FROM table1,table2...
multi_table_update=no			# Update with many tables
multi_value_insert=no			# Value lists in INSERT
natural_join=yes			# natural join
natural_left_outer_join=no		# natural left outer join
no_primary_key=yes			# Tables without primary key
null_concat_expr=yes			# Is 'a' || NULL = NULL
null_in_index=yes			# null in index
null_in_unique=yes			# null in unique
null_num_expr=yes			# Is 1+NULL = NULL
odbc_left_outer_join=no			# left outer join odbc style
operating_system=Linux 2.3.99s-noris-pre9-2 i686	# crash-me tested on
order_by=yes				# Order by
order_by_alias=yes			# Order by alias
order_by_function=yes			# Order by function
order_by_position=yes			# Order by position
order_by_remember_desc=no		# Order by DESC is remembered
primary_key_in_create=yes		# primary key in create table
psm_functions=no			# PSM functions (ANSI SQL)
psm_modules=no				# PSM modules (ANSI SQL)
psm_procedures=no			# PSM procedures (ANSI SQL)
psm_trigger=no				# Triggers (ANSI SQL)
query_size=16777216			# query size
quote_ident_with_"=no			# " as identifier quote (ANSI SQL)
quote_ident_with_[=no			# [] as identifier quote
quote_ident_with_`=no			# ` as identifier quote
quote_with_"=no				# Allows ' and " as string markers
recursive_subqueries=+64		# recursive subqueries
remember_end_space=no			# Remembers end space in char()
remember_end_space_varchar=yes		# Remembers end space in varchar()
repeat_string_size=+8000000		# return string size from function
right_outer_join=no			# right outer join
rowid=oid				# Type for row id
select_constants=yes			# Select constants
select_limit=with LIMIT			# LIMIT number of rows
select_limit2=yes			# SELECT with LIMIT #,#
select_string_size=+16208		# constant string size in SELECT
select_table_update=no			# Update with sub select
select_without_from=yes			# SELECT without FROM
server_version=PostgreSQL version ???	# server version
simple_joins=yes			# ANSI SQL simple joins
storage_of_float=round			# Storage of float values
subqueries=yes				# subqueries
table_alias=yes				# Table alias
table_name_case=yes			# case independent table names
table_wildcard=yes			# Select table_name.*
tempoary_table=yes			# temporary tables
transactions=yes			# transactions
type_extra_abstime=yes			# Type abstime
type_extra_bfile=no			# Type bfile
type_extra_blob=no			# Type blob
type_extra_bool=yes			# Type bool
type_extra_box=yes			# Type box
type_extra_byte=no			# Type byte
type_extra_char(1_arg)_binary=no	# Type char(1 arg) binary
type_extra_circle=yes			# Type circle
type_extra_clob=no			# Type clob
type_extra_datetime=yes			# Type datetime
type_extra_double=no			# Type double
type_extra_enum(1_arg)=no		# Type enum(1 arg)
type_extra_float(2_arg)=no		# Type float(2 arg)
type_extra_float4=yes			# Type float4
type_extra_float8=yes			# Type float8
type_extra_image=no			# Type image
type_extra_int(1_arg)_zerofill=no	# Type int(1 arg) zerofill
type_extra_int1=no			# Type int1
type_extra_int2=yes			# Type int2
type_extra_int3=no			# Type int3
type_extra_int4=yes			# Type int4
type_extra_int8=yes			# Type int8
type_extra_int_auto_increment=no	# Type int not null auto_increment
type_extra_int_identity=no		# Type int not null identity
type_extra_int_unsigned=no		# Type int unsigned
type_extra_interval=yes			# Type interval
type_extra_line=yes			# Type line
type_extra_long=no			# Type long
type_extra_long_raw=no			# Type long raw
type_extra_long_varbinary=no		# Type long varbinary
type_extra_long_varchar(1_arg)=no	# Type long varchar(1 arg)
type_extra_lseg=yes			# Type lseg
type_extra_mediumint=no			# Type mediumint
type_extra_mediumtext=no		# Type mediumtext
type_extra_middleint=no			# Type middleint
type_extra_mlslabel=no			# Type mlslabel
type_extra_money=yes			# Type money
type_extra_nclob=no			# Type nclob
type_extra_number=no			# Type number
type_extra_number(1_arg)=no		# Type number(1 arg)
type_extra_number(2_arg)=no		# Type number(2 arg)
type_extra_nvarchar2(1_arg)=no		# Type nvarchar2(1 arg)
type_extra_path=yes			# Type path
type_extra_point=yes			# Type point
type_extra_polygon=yes			# Type polygon
type_extra_raw(1_arg)=no		# Type raw(1 arg)
type_extra_reltime=yes			# Type reltime
type_extra_rowid=no			# Type rowid
type_extra_serial=yes			# Type serial
type_extra_set(1_arg)=no		# Type set(1 arg)
type_extra_smalldatetime=no		# Type smalldatetime
type_extra_smallfloat=no		# Type smallfloat
type_extra_smallmoney=no		# Type smallmoney
type_extra_text=yes			# Type text
type_extra_text(1_arg)=no		# Type text(1 arg)
type_extra_timespan=yes			# Type timespan
type_extra_uint=no			# Type uint
type_extra_varchar2(1_arg)=no		# Type varchar2(1 arg)
type_extra_year=no			# Type year
type_odbc_bigint=no			# Type bigint
type_odbc_binary(1_arg)=no		# Type binary(1 arg)
type_odbc_datetime=yes			# Type datetime
type_odbc_tinyint=no			# Type tinyint
type_odbc_varbinary(1_arg)=no		# Type varbinary(1 arg)
type_sql_bit=yes			# Type bit
type_sql_bit(1_arg)=yes			# Type bit(1 arg)
type_sql_bit_varying(1_arg)=yes		# Type bit varying(1 arg)
type_sql_char(1_arg)=yes		# Type char(1 arg)
type_sql_char_varying(1_arg)=yes	# Type char varying(1 arg)
type_sql_character(1_arg)=yes		# Type character(1 arg)
type_sql_character_varying(1_arg)=yes	# Type character varying(1 arg)
type_sql_date=yes			# Type date
type_sql_dec(2_arg)=yes			# Type dec(2 arg)
type_sql_decimal(2_arg)=yes		# Type decimal(2 arg)
type_sql_double_precision=yes		# Type double precision
type_sql_float=yes			# Type float
type_sql_float(1_arg)=yes		# Type float(1 arg)
type_sql_int=yes			# Type int
type_sql_integer=yes			# Type integer
type_sql_interval_day=yes		# Type interval day
type_sql_interval_day_to_hour=yes	# Type interval day to hour
type_sql_interval_day_to_minute=yes	# Type interval day to minute
type_sql_interval_day_to_second=yes	# Type interval day to second
type_sql_interval_hour=yes		# Type interval hour
type_sql_interval_hour_to_minute=yes	# Type interval hour to minute
type_sql_interval_hour_to_second=yes	# Type interval hour to second
type_sql_interval_minute=yes		# Type interval minute
type_sql_interval_minute_to_second=yes	# Type interval minute to second
type_sql_interval_month=yes		# Type interval month
type_sql_interval_second=yes		# Type interval second
type_sql_interval_year=yes		# Type interval year
type_sql_interval_year_to_month=yes	# Type interval year to month
type_sql_national_char_varying(1_arg)=yes	# Type national char varying(1 arg)
type_sql_national_character(1_arg)=yes	# Type national character(1 arg)
type_sql_national_character_varying(1_arg)=yes	# Type national character varying(1 arg)
type_sql_nchar(1_arg)=yes		# Type nchar(1 arg)
type_sql_nchar_varying(1_arg)=yes	# Type nchar varying(1 arg)
type_sql_numeric(2_arg)=yes		# Type numeric(2 arg)
type_sql_real=yes			# Type real
type_sql_smallint=yes			# Type smallint
type_sql_time=yes			# Type time
type_sql_timestamp=yes			# Type timestamp
type_sql_timestamp_with_time_zone=yes	# Type timestamp with time zone
type_sql_varchar(1_arg)=yes		# Type varchar(1 arg)
union=yes				# union
union_all=yes				# union all
unique_in_create=yes			# unique in create table
unique_null_in_create=yes		# unique null in create
views=yes				# views
where_string_size=+16208		# constant string size in where


===

Subject: Re: MySQL's "crashme" (was Re: [HACKERS] Performance)
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Sun, 21 May 2000 22:37:21 +0200

Hi,

Tom Lane:
> I'm just saying that it's unfair to downrate us when the problem is
> demonstrably in crashme itself and not in Postgres.
> 
Right.

>          <drop behavior> ::= CASCADE | RESTRICT
> 
> What is *not* optional is a <drop behavior> keyword.  Although we don't
> yet implement DROP COLUMN, our parser already has this statement in it
> --- and it follows the SQL92 grammar.
> 
Ah, sorry, I apparently misparsed the BNF. (It was kind of late at
night...)

> No, pretty misleading I'd say.  Since the crashme script does have a
> limit on max_buffer_size, it *will* run to completion if run on a
> machine with a sufficiently large per-process memory limit (and enough
> swap of course).

Hmm, I could add an explicit option to limit memory usage instead.
(Right now it's hardcoded in the test script.)

> >>>> date_zero=no				# Supports 0000-00-00 dates
> >> Another test that only MySQL "passes".
> > ... and SOLID.
> 
> Still doesn't mean it's a good idea ;-)
> 
No argument from me...

> >>>> except=no				# except
> It'd be appropriate to divide this into two tests, or at least relabel
> it.
> 
Already done.

> Considering how much we got ragged on for not being perfectly compliant
> with SQL-spec handling of comments (up till 7.0 our parser didn't
> recognize "--" as a comment if it was embedded in a multicharacter
> operator --- but we knew that was a bug), I don't have a lot of sympathy
> for MySQL unilaterally redefining the spec here.

They do note this noncompliance with the SQL spec in their documentation,
along with a few others.

I'll clean this one up (adding a note about the noncompliance) a bit
more, after they incorporate my patch into the next version.

> Anyway, I am pleased to see you trying to clean up the mess.
> Good luck!
> 
Thanks.


===

Subject: Re: AW: [HACKERS] Performance (was: The New Slashdot Setup (includes MySqlserver))
From: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>
Date: Fri, 26 May 2000 08:30:41 +0200

On Fri, 26 May 2000, Peter Eisentraut wrote:
> Zeugswetter Andreas SB writes:
> 
> > Which is imho not what the test is for. I think they mean ctid,
> > which again I think we should have a rowid alias for (as in Informix,
> > Oracle).
> 
> Let's step back and ask: How is the behaviour of rowid (or whatever)
> defined in various existing DBMS. Then we can see if we have anything that
> matches.

This has been discussed. The outcome is that you are only safe using rowid
if nobody else changes the row in between you reading it and accessing it by rowid.

This is essentially the same in all DBs; only the risk of the rowid changing is lower
in other DBs, since they do in-place updates, but the risk is there nevertheless.
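In PostgreSQL terms the pattern looks roughly like this, with ctid playing
the rowid role (table and tid value are made up; the second statement is
only safe if the row hasn't been updated in between, since an update gives
the new row version a new ctid):

	select ctid, balance from accounts where name = 'fred';
	-- ... later, reusing the ctid value read above, e.g. '(0,1)' ...
	update accounts set balance = balance + 10 where ctid = '(0,1)';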

Andreas

===

Subject: Re: [HACKERS] The New Slashdot Setup (includes MySql server)
From: Jeff MacDonald <jeff@pgsql.com>
Date: Mon, 29 May 2000 11:32:33 -0300 (ADT)

agreed, a while back i actually contacted rob malda and offered
to convert slashdot to postgres.. he asked why i would want to do this,
i said postgres's features yada yada.. his reply:

.. that's dandy but we don't need those features.

sad to say but mysql has a niche and slashdot fills it.

===

Subject: [HACKERS] MySQL crashme
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Mon, 22 May 2000 21:56:01 -0400 (EDT)


Matthias, let me add that I wish you luck in updating the MySQL crashme
test.  You certainly seem to be on top of the issues, and I hope they
can be resolved.

I know a lot of people on this side are hoping you can make it happen. 


===

Subject: Re: [HACKERS] A test to add to the crashme test
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Tue, 23 May 2000 08:40:54 +0200


Hi,

Ross J. Reedstrom:
> People keep claiming that applications that are essentially "read-only"
> don't need transactions. I'll agree in the limit, that truly read only
> databases don't, but I think a lot of people might be surprised at how
> little writing you need before you get into trouble. 
>          [ Mozilla bugtracking example ]
> How many writers does it take for this to happen? One. I'd call that
> an "essentially read-only" system. A note, this is not a made up,
> theoretical example. We're talking real world here.
> 
Right. But that's not about transactions; that's about concurrent read
and write access to a table.

People using MySQL in real-world situations usually solve this with one
read/write database for "normal" work, and another one for the
long-running multi-record "let's list every bug in the system" queries.

The update from one to the other is set to low-priority so that it won't
lock out any queries (with a timeout).


Mind you: I'm not saying this is ideal. A system with concurrent
read/write access would be better. But it has the benefit of giving
you a replicated database which you can fall back to, if the primary
system is down for whatever reason.

Besides, the MySQL people are currently busy integrating Berkeley DB
into their code. Voila, instant read/write concurrency, and instant
transactions.  Well, almost.  ;-)

===

Subject: Re: [HACKERS] A test to add to the crashme test
From: Hannu Krosing <hannu@tm.ee>
Date: Tue, 23 May 2000 09:51:02 +0300


Matthias Urlichs wrote:
> 
> Besides, the MySQL people are currently busy integrating Berkeley DB
> into their code. 

Then MySQL may become a RDBMS after all ;)

> Voila, instant read/write concurrency, and instant transactions.

But won't it slow them down ?

===

Subject: Re: [HACKERS] A test to add to the crashme test
From: "Matthias Urlichs" <smurf@noris.net>
Date: 	Tue, 23 May 2000 09:58:29 +0200


Hi,

Hannu Krosing:
> > Voila, instant read/write concurrency, and instant transactions.
> But won't it slow them down ?
> 
Of course it will. That's why they make the Berkeley tables optional.

Their idea is that you use the Berkeley stuff for the tables which really
require transactions, HEAP tables for in-memory cache/temp/whatever,
and the standard MyISAM tables otherwise.

Real-world example: Your customers' account balance really should be
transaction safe, and all that. But not their address, or their
clicktrail through your online shop system.
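Assuming MySQL 3.23's per-table TYPE= option, that split would look
something like:

	create table account_balance (cust_id int, balance decimal(10,2)) type=bdb;
	create table session_cache (cust_id int, data varchar(255)) type=heap;
	create table clicktrail (cust_id int, url varchar(255)) type=myisam;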

===

Subject: [HACKERS] MySQL now supports transactions ... 
From: The Hermit Hacker <scrappy@hub.org>
Date: Tue, 23 May 2000 19:55:23 -0300 (ADT)

http://www.mysql.com/download_3.23.html

===

Subject: Re: [HACKERS] MySQL now supports transactions ...
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Tue, 23 May 2000 20:26:41 -0400 (EDT)

Apparently by way of some Berkeley DB code....
> 
> http://web.mysql.com/php/manual.php3?section=BDB
> 
> 
> 
> The Hermit Hacker wrote:
> 
> > http://www.mysql.com/download_3.23.html
> >
> > Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
> > Systems Administrator @ hub.org
> > primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> 
> 

Yes, I see that too.  It makes much more sense for them because they
have ordered heaps anyway, with secondary indexes, rather than our
unordered heap and indexes.

===

Subject: Re: [HACKERS] MySQL now supports transactions ...
From: "Michael A. Olson" <mao@sleepycat.com>
Date: Tue, 23 May 2000 17:25:37 -0700

At 08:02 PM 5/23/00 -0400, you wrote:

> Apparently by way of some Berkeley DB code....
> 
> http://web.mysql.com/php/manual.php3?section=BDB

Yeah, that's correct.  As there was no existing transaction layer in
place, it was pretty straightforward to add Berkeley DB.  They had
to abstract the ISAM layer that they've used to date, but there were
no serious technical issues.

You can choose which kind of tables you want (myisam or bdb) at
table creation time.  BDB tables have the standard ACID properties
that Berkeley DB provides generally, via the standard mechanisms
(two-phase locking, write-ahead logging, and so forth).

The 3.23.16 release is decidedly alpha, but is in good enough
shape to distribute.  My bet is that we'll hammer out a few dumb
bugs in the next weeks, and they'll cut something more stable
soon.

You need to download the 3.1.5 distribution of Berkeley DB from
MySQL.com.  We're not distributing that version from Sleepycat.
We're in the middle of the release cycle for our 3.1 release, and
expect to cut a stable one in the next week or so.  MySQL relies
on a couple of features we added to 3.1 for them, so they can't
run with the 3.0 release that's up on our site now.

It's been pretty quiet since my message on Sunday, about the
difficulties in integrating Berkeley DB with the PostgreSQL backend.
Vadim (and others), what is your opinion?  My impression is that
the project is too much trouble, but I'd be glad to hear from you
folks on the topic.

===

Subject: Re: [HACKERS] MySQL now supports transactions ...
From: The Hermit Hacker <scrappy@hub.org>
Date: Tue, 23 May 2000 21:39:28 -0300 (ADT)

On Tue, 23 May 2000, Bruce Momjian wrote:

> > Apparently by way of some Berkeley DB code....
> > 
> > http://web.mysql.com/php/manual.php3?section=BDB
> > 
> > 
> > 
> > The Hermit Hacker wrote:
> > 
> > > http://www.mysql.com/download_3.23.html
> > >
> > > Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
> > > Systems Administrator @ hub.org
> > > primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> > 
> > 
> 
> Yes, I see that too.  It makes much more sense for them because they
> have ordered heaps anyway, with secondary indexes, rather than our
> unordered heap and indexes.

Just figured a heads up was in order for those that have been using
ACID/transactions in their arguments :)


===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Fri, 26 May 2000 11:32:20 -0700

> We might have part of the story in the recently noticed fact that
> each insert/update query begins by doing a seqscan of pg_index.
> 
> I have done profiles of INSERT in the past and not found any really
> spectacular bottlenecks (but I was looking at a test table with no
> indexes, so I failed to see the pg_index problem :-().  Last time
> I did it, I had these top profile entries for inserting 100,000 rows
> of 30 columns apiece:

Well, I've dropped index but INSERTs still take 70 sec and 
COPY just 1sec -:(((

===

Subject: Re: [HACKERS] Berkeley DB...
From: Mike Mascari <mascarm@mascari.com>
Date: Fri, 26 May 2000 14:48:22 -0400

Tom Lane wrote:
> 
> Hannu Krosing <hannu@tm.ee> writes:
> >> As Vadim points out in his comparison
> >> of COPY vs. INSERT, something is *wrong* with the time it takes
> >> for PostgreSQL to parse, plan, rewrite, and optimize.
> 
> We might have part of the story in the recently noticed fact that
> each insert/update query begins by doing a seqscan of pg_index.
> 
> I have done profiles of INSERT in the past and not found any really
> spectacular bottlenecks (but I was looking at a test table with no
> indexes, so I failed to see the pg_index problem :-().  Last time
> I did it, I had these top profile entries for inserting 100,000 rows
> of 30 columns apiece:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  30.08    290.79   290.79                             _mcount
>   6.48    353.46    62.67 30702766     0.00     0.00  AllocSetAlloc
>   5.27    404.36    50.90   205660     0.25     0.25  write
>   3.06    433.97    29.61 30702765     0.00     0.00  MemoryContextAlloc
>   2.74    460.45    26.48   100001     0.26     0.74  yyparse
>   2.63    485.86    25.41 24300077     0.00     0.00  newNode
>   2.22    507.33    21.47  3900054     0.01     0.01  yylex
>   1.63    523.04    15.71 30500751     0.00     0.00  PortalHeapMemoryAlloc
>   1.31    535.68    12.64  5419526     0.00     0.00  hash_search
>   1.18    547.11    11.43  9900000     0.00     0.00  expression_tree_walker
>   1.01    556.90     9.79  3526752     0.00     0.00  SpinRelease
> 
> While the time spent in memory allocation is annoying, that's only about
> ten mallocs per parsed data expression, so it's unlikely that we will be
> able to improve on it very much.  (We could maybe avoid having *three*
> levels of subroutine call to do an alloc, though ;-).)  Unless you are
> smarter than the flex and bison guys you are not going to be able to
> improve on the lex/parse times either.  The planner isn't even showing
> up for a simple INSERT.  Not much left, unless you can figure out how
> to write and commit a tuple with less than two disk writes.
> 
> But, as I said, this was a case with no indexes to update.
> 
> I intend to do something about caching pg_index info ASAP in the 7.1
> cycle, and then we can see how much of a difference that makes...
> 
>                         regards, tom lane

It will be interesting to see the speed differences between the
100,000 inserts above and those which have been PREPARE'd using
Karel Zak's PREPARE patch. Perhaps a generic query cache could be
used to skip the parsing/planning/optimizing stage when multiple
exact queries are submitted to the database? I suppose the cached
plans could then be discarded whenever a DDL statement or a
VACUUM ANALYZE is executed? The old Berkeley Postgres docs spoke
about cached query plans *and* results (as well as 64-bit oids,
amongst other things). I'm looking forward to when the 7.1 branch
occurs... :-)

===

Subject: [HACKERS] OID question
From: Jacques Huard <jacques@intuineer.com>
Date: Thu, 01 Jun 2000 22:29:16 GMT

I have used MySQL to do several database related things in the past
using PHP.  In the past I have used a field "row_id" as a unique number
(within that specific table) as a reference to a specific row.  This
"row_id" field  was automatically placed in a table when it was created
in MySQL and the next unique number was placed in the field when
automatically during every new insert.  This made things easy for
writing applications in PHP.  When I switch to Postgres I noticed that
there was a OID.  I believe that this "object identifier" is similar to
the "row_id" in MySQL but I am unable to access it for an given row.
PHP has a function which can get the last OID for the last "Insert"
issued, however, this won't help me accomplish the same things I was
able to accomplish using "row_id" in MySQL.  I have read the
documetation and have not found a real good description of OID but have
found commands that can add a unique sequence column which could
accomplish what I need.  However,  I need (or want) the unique sequence
column to maintain itself, without calling a function to fill in that
field during an insert.  I am sure there is an easy way to accomplish
this and I am overlooking the solution.  Could someone suggest what they
have done?  Thanks for any response.
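For what it's worth, the self-maintaining column being asked about is
usually spelled with PostgreSQL's serial pseudo-type (table and column
names here are invented):

	create table items (
		row_id	serial,		-- integer column backed by a sequence default
		name	text
	);
	insert into items (name) values ('widget');	-- row_id fills itself in
	select row_id, name from items;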

===

Subject: [HACKERS] Berkeley DB...
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Sat, 20 May 2000 18:43:37 -0700

Well, I've read SDB code/doc for a few hours...

1. Using RECNO db for heap.
For a relational DB, an over-writing smgr means the ability to re-use space
after DELETE/UPDATE operations (without vacuum -:)). RECNO (a btree by
nature, with the record number as key) will not give us this ability. To
insert a record into a RECNO db one has either to provide the "put" method
with a record number (where to store the new record) or specify DB_APPEND
in the flags, to add the new record to the end of the db (without space
re-use). So the problem (one of the two base problems of an over-writing
smgr for us) of "where to store the new tuple" (ie - where in the data file
there is free space for the new tuple) is not resolved.
=> we can't use the SDB smgr: it lacks the required features - space re-use
and MVCC support.

2. SDB's btrees support only one key, but we have multi-key btrees...

3. How can we implement gist, rtree AND (multi-key) BTREE access methods
using btree and hash access methods provided by SDB?!

1,2,3 => we have to preserve our access methods (and the ability to add new ones!).

Now, about WAL. What is WAL? WAL is *mostly* a set of functions to
write/read the log (90% implemented) + *access method specific* redo/undo
functions... which have to be implemented anyway, because of the conclusion
above.

Comments?

===

Subject: Re: [HACKERS] Berkeley DB...
From: "Michael A. Olson" <mao@sleepycat.com>
Date: Sun, 21 May 2000 11:36:59 -0700

At 06:43 PM 5/20/00 -0700, Vadim Mikheev wrote:

> 1. Using RECNO db for heap.
> For a relational DB, an over-writing smgr means the ability to re-use space
> after DELETE/UPDATE operations (without vacuum -:)). RECNO (a btree by
> nature, with the record number as key) will not give us this ability. To
> insert a record into a RECNO db one has either to provide the "put" method
> with a record number (where to store the new record) or specify DB_APPEND
> in the flags, to add the new record to the end of the db (without space
> re-use). So the problem (one of the two base problems of an over-writing
> smgr for us) of "where to store the new tuple" (ie - where in the data file
> there is free space for the new tuple) is not resolved.
> => we can't use the SDB smgr: it lacks the required features - space re-use
> and MVCC support.

All of the Berkeley DB access methods reuse space.  We return free space
to a pool and allocate from the pool in the ordinary course of operation.
We have no notion of vacuum.

Empty pages get appended to a free list, and will be reused on next page
allocation.  Empty space on pages (from deleted tuples) where the rest
of the page isn't empty will get reused the next time the page is
visited.  So you do get space reuse.  We don't return blocks to the
file system automatically (requires reorg, which is hard).  "Appending"
means appending in key space; that may or may not be physically at the
end of the file.

We do, however, do reverse splits of underfull nodes, so we're aggressive
at getting empty pages back on the free list.

In short, I think the space reuse story of Berkeley DB is better than
the current space reuse story in PostgreSQL, even for heaps.  This is
because the current heap AM doesn't opportunistically coalesce pages
to make free pages available for reuse by new inserts.

We don't have multi-version concurrency control.  It's a feature we'd like
to see added, but it certainly represents a substantial new engineering
effort.  As I've said before, we'd be glad to support you in that project
if you decide to undertake it.

> 2. SDB's btrees support only one key, but we have multi-key btrees...

This is a misunderstanding.  Berkeley DB allows you to use arbitrary
data structures as keys.  You define your own comparison function, which
understands your key structure and is capable of doing comparisons between
keys.  It's precisely equivalent to the support you've got in PostgreSQL
now, since your comparator has to understand key schema (including the
presence or absence of nulls).

You'd define your own comparator and your own key type.  You'd hand
(key, value) pairs to Berkeley DB, and we'd call your comparator to
compare keys during tree descent.  The key you hand us is an arbitrarily
complex data structure, but we don't care.

You get another benefit from Berkeley DB -- we eliminate the 8K limit
on tuple size.  For large records, we break them into page-sized
chunks for you, and we reassemble them on demand.  Neither PostgreSQL
nor the user needs to worry about this; it's a service that just works.

A single record or a single key may be up to 4GB in size.

> 3. How can we implement gist, rtree AND (multi-key) BTREE access methods
> using btree and hash access methods provided by SDB?!

You'd build gist and rtree on top of the current buffer manager, much
as rtree is currently implemented on top of the lower-level page manager
in PostgreSQL.  Multi-key btree support is there already, as is multi-
key extended linear hashing.  In exchange for having to build a new
rtree AM, you'd get high-performance persistent queues for free.

I'd argue that queues are more generally useful than rtrees.  I understand
that you have users who need rtrees.  I wrote that access method in
Postgres, and used it extensively for geospatial indexing during the
Sequoia 2000 project.  I'm a big fan.  Nevertheless, there are more
database customers looking for fast queues than are looking for spatial
indices.

> 1,2,3 => we have to preserve our access methods (and the ability to add new ones!).

Again, you can add new access methods in Berkeley DB in the same way
that you do for PostgreSQL now.


> Now, about WAL. What is WAL? WAL is *mostly* a set of functions to
> write/read the log (90% implemented) + *access method specific* redo/undo
> functions... which have to be implemented anyway, because of the conclusion
> above.

You wouldn't need to rewrite the current access-method undo and redo
functions in Berkeley DB; they're there, and they work.  You'd need to
do that work for the new access methods you want to define, but as you
note, that work is required whether you roll your own or use Berkeley
DB.

I encourage you to think hard about the amount of work that's really
required to produce a commercial-grade recovery and transaction system.
This stuff is extremely hard to get right -- you need to design, code,
and test for very high-concurrency, complex workloads.  The log is a
new source of contention, and will be a gate to performance.  The log
is also a new way to consume space endlessly, so you'll want to think
about backup and checkpoint support.  With Berkeley DB, you get both
today.  Our backup support permits you to do on-line backups.  Backups
don't acquire locks and don't force a shutdown.

Testing this stuff is tricky.  For example, you need to prove that you're
able to survive a crash that interrupts the three internal page writes
that you do in the btree access method on a page split.  Postgres (when
I wrote the Btree AM) carefully ordered those writes to guarantee no
loss of data, but it was possible to crash with the children written and
the parent lost.  The result is an inefficiency in the tree structure
that you'll never recover, but that you can deal with at read time.  This
is an example of a case that Berkeley DB gets right.

The advantage of Berkeley DB is that we've got five years of commercial
deployment behind us.  The code is in production use, and is known to
support terabyte-sized databases, hundreds of concurrent threads with
arbitrary read/write mixes, and system and application crashes.  By
putting the log and the data on separate spindles, we are able to survive
loss of either device without losing committed data.  Big companies have
made significant bets on the software by deploying it in mission-critical
applications.  It works.

Plus, we're continuing to work on the code, and we're paid real money to
do that.  We're able to deliver significant new features and performance
improvements about three times a year.

All of that said, I'd boil Vadim's message down to this:

	+  With Berkeley DB, you'd need to reimplement multi-version
	   concurrency control, and that's an opportunity to introduce
	   new bugs.

	+  With PostgreSQL, you'll need to implement logging and recovery,
	   and that's an opportunity to introduce new bugs.

I don't think that either alternative presents insurmountable difficulties.
Which you choose depends on the technical issues and on your willingness
to integrate code from outside the project into PostgreSQL's internals, to
a degree that you've never done before.

Regards,
					mike

===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Fri, 26 May 2000 14:04:49 -0700

> > Well, I've dropped index but INSERTs still take 70 sec and 
> > COPY just 1sec -:(((
> 
> Well, for those that have fsync turned off we could actually 
> avoid most of the writes, couldn't we? Just leave the page
> marked dirty. We would only need to write each new page once.
> The problem as I see it is that we don't have a good place 
> where the writes would actually be done. Now they are obviously
> done after each insert.

I've run the test without fsync and with all inserts in a *single*
transaction - there should be no write after each insert...

Vadim

===

Subject: Re: [HACKERS] Berkeley DB...
From: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 26 May 2000 19:41:44 -0400 (EDT)

> > > Well, I've dropped index but INSERTs still take 70 sec and 
> > > COPY just 1sec -:(((
> > 
> > Well, for those that have fsync turned off we could actually 
> > avoid most of the writes, couldn't we? Just leave the page
> > marked dirty. We would only need to write each new page once.
> > The problem as I see it is that we don't have a good place 
> > where the writes would actually be done. Now they are obviously
> > done after each insert.
> 
> I've run the test without fsync and with all inserts in a *single*
> transaction - there should be no write after each insert...

Watch out.  I think Vadim has settled into San Francisco and is getting
fired up again...  :-)

===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Sat, 27 May 2000 12:11:58 +0900

-----Original Message-----
> From: pgsql-hackers-owner@hub.org 
> [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Mikheev, Vadim
> 
> > We might have part of the story in the recently noticed fact that
> > each insert/update query begins by doing a seqscan of pg_index.
> > 
> > I have done profiles of INSERT in the past and not found any really
> > spectacular bottlenecks (but I was looking at a test table with no
> > indexes, so I failed to see the pg_index problem :-().  Last time
> > I did it, I had these top profile entries for inserting 100,000 rows
> > of 30 columns apiece:
> 
> Well, I've dropped index but INSERTs still take 70 sec and 
> COPY just 1sec -:(((
>

Did you run vacuum after dropping the indexes?
Because DROP INDEX doesn't update relhasindex in pg_class,
the planner/executor may still look up pg_index.
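
For example, something like this shows whether the planner still thinks
the table has indexes (the table name is only a placeholder):

    SELECT relname, relhasindex FROM pg_class WHERE relname = 'mytable';
    VACUUM mytable;    -- vacuum updates relhasindex
    SELECT relname, relhasindex FROM pg_class WHERE relname = 'mytable';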

===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Sat, 27 May 2000 21:44:52 -0700

> > Well, I've dropped index but INSERTs still take 70 sec and 
> > COPY just 1sec -:(((
> >
> 
> Did you run vacuum after dropping the indexes?
> Because DROP INDEX doesn't update relhasindex in pg_class,
> the planner/executor may still look up pg_index.

Actually, I dropped and re-created the table without indices...

Vadim
===

Subject: Re: [HACKERS] Berkeley DB... 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 May 2000 01:28:04 -0400

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>>>> Well, I've dropped index but INSERTs still take 70 sec and 
>>>> COPY just 1sec -:(((

Mebbe so, but you can't blame it all on parse/plan overhead.

I did some experimentation on this with current sources, using a test
case of inserting 100,000 rows of 16 columns (the regression "tenk1"
table's contents repeated 10 times).  Each test was started with a
freshly created empty table.  The initial runs were done with all
postmaster options except -F defaulted.  All numbers are wall-clock time
in seconds; the "+" column is the time increase from the previous case:

load via COPY, fsync off:
0 indexes	24.45s
1 index		48.88s		+ 24.43
2 indexes	62.65s		+ 13.77
3 indexes	96.84s		+ 34.19
4 indexes	134.09s		+ 37.25

load via INSERTs, fsync off, one xact (begin/end around all inserts):
0 indexes	194.95s
1 index		247.21s		+ 52.26
2 indexes	269.69s		+ 22.48
3 indexes	307.33s		+ 37.64
4 indexes	352.72s		+ 45.39

load via INSERTs, fsync off, separate transaction for each insert:
0 indexes	236.53s
1 index		295.96s		+ 59.43
2 indexes	323.40s		+ 27.44
[ got bored before doing 3/4 index cases ... ]

load via INSERTs, fsync on, separate transactions:
0 indexes	5189.99s
[ don't want to know how long it will take with indexes :-( ]

So while the parse/plan overhead looks kinda bad next to a bare COPY,
it's not anything like a 70:1 penalty.  But an fsync per insert is
that bad and worse.
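
For concreteness, with a hypothetical two-column table t, the load styles
compared above differ only in where the transaction boundaries and the
COPY fall:

    -- separate transaction per insert: every row pays its own commit
    -- (and, with fsync on, its own synchronous disk writes)
    INSERT INTO t VALUES (1, 'a');
    INSERT INTO t VALUES (2, 'b');

    -- one transaction around all the inserts: a single commit at the end
    BEGIN;
    INSERT INTO t VALUES (1, 'a');
    INSERT INTO t VALUES (2, 'b');
    COMMIT;

    -- bulk load: one utility statement, no per-row parse/plan at all;
    -- the file path is just an example
    COPY t FROM '/tmp/t.data';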

I then recompiled with -pg to learn more about where the time was going.
One of the useful places to look at is calls to FileSeek, since mdread,
mdwrite, and mdextend all call it.  To calibrate these numbers, the
table being created occupies 2326 pages and the first index is 343
pages.

Inserts (all in 1 xact), no indexes:
                0.00    0.00       1/109528      init_irels [648]
                0.00    0.00      85/109528      mdread [592]
                0.01    0.00    2327/109528      mdextend [474]
                0.01    0.00    2343/109528      mdwrite [517]
                0.23    0.00  104772/109528      _mdnblocks [251]
[250]    0.0    0.24    0.00  109528         FileSeek [250]
Inserts (1 xact), 1 index:
                0.00    0.00       1/321663      init_irels [649]
                0.00    0.00    2667/321663      mdextend [514]
                0.10    0.00   55478/321663      mdread [277]
                0.11    0.00   58096/321663      mdwrite [258]
                0.38    0.00  205421/321663      _mdnblocks [229]
[213]    0.1    0.60    0.00  321663         FileSeek [213]
COPY, no indexes:
                0.00    0.00       1/109527      init_irels [431]
                0.00    0.00      84/109527      mdread [404]
                0.00    0.00    2327/109527      mdextend [145]
                0.00    0.00    2343/109527      mdwrite [178]
                0.07    0.00  104772/109527      _mdnblocks [77]
[83]     0.0    0.07    0.00  109527         FileSeek [83]
COPY, 1 index:
                0.00    0.00       1/218549      init_irels [382]
                0.00    0.00    2667/218549      mdextend [220]
                0.07    0.00   53917/218549      mdread [106]
                0.08    0.00   56542/218549      mdwrite [99]
                0.14    0.00  105422/218549      _mdnblocks [120]
[90]     0.0    0.30    0.00  218549         FileSeek [90]

The extra _mdnblocks() calls for the inserts/1index case seem to be from
the pg_index scans in ExecOpenIndices (which is called 100000 times in
the inserts case but just once in the COPY case).  We know how to fix
that.  Otherwise the COPY and INSERT paths seem to be pretty similar as
far as actual I/O calls go.  The thing that jumps out here, however, is
that it takes upwards of 50000 page reads and writes to prepare a
343-page index.  Most of the write calls turn out to be from
BufferReplace, which is pretty conclusive evidence that the default
setting of -B 64 is not enough for this example; we need more buffers.

At -B 128, inserts/0index seems about the same, inserts/1index traffic is
                0.00    0.00       1/270331      init_irels [637]
                0.01    0.00    2667/270331      mdextend [510]
                0.06    0.00   29798/270331      mdread [354]
                0.06    0.00   32444/270331      mdwrite [277]
                0.40    0.00  205421/270331      _mdnblocks [229]
[223]    0.1    0.52    0.00  270331         FileSeek [223]
At -B 256, inserts/1index traffic is
                0.00    0.00       1/221849      init_irels [650]
                0.00    0.00    2667/221849      mdextend [480]
                0.01    0.00    5556/221849      mdread [513]
                0.01    0.00    8204/221849      mdwrite [460]
                0.37    0.00  205421/221849      _mdnblocks [233]
[240]    0.0    0.40    0.00  221849         FileSeek [240]
At -B 512, inserts/1index traffic is
                0.00    0.00       1/210788      init_irels [650]
                0.00    0.00      25/210788      mdread [676]
                0.00    0.00    2667/210788      mdextend [555]
                0.00    0.00    2674/210788      mdwrite [564]
                0.27    0.00  205421/210788      _mdnblocks [248]
[271]    0.0    0.28    0.00  210788         FileSeek [271]

So as long as the -B setting is large enough to avoid thrashing, there
shouldn't be much penalty to making an index.  I didn't have time to run
the COPY cases but I expect they'd be about the same.

Bottom line is that where I/O costs are concerned, the parse/plan
overhead for INSERTs is insignificant except for the known problem
of wanting to rescan pg_index for each INSERT.  The CPU overhead is
significant, at least if you're comparing no-fsync performance ...
but as I commented before, I doubt we can do a whole lot better in
that area for simple INSERTs.

			regards, tom lane
===

Subject: Re: [HACKERS] Berkeley DB...
From: "Zeugswetter Andreas" <andreas.zeugswetter@telecom.at>
Date: Fri, 26 May 2000 21:43:20 +0200

> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls  ms/call  ms/call  name
> >  30.08    290.79   290.79                             _mcount
> >   6.48    353.46    62.67 30702766     0.00     0.00  AllocSetAlloc
> >   5.27    404.36    50.90   205660     0.25     0.25  write
> >   3.06    433.97    29.61 30702765     0.00     0.00  MemoryContextAlloc
> >   2.74    460.45    26.48   100001     0.26     0.74  yyparse
> >   2.63    485.86    25.41 24300077     0.00     0.00  newNode
> >   2.22    507.33    21.47  3900054     0.01     0.01  yylex
> >   1.63    523.04    15.71 30500751     0.00     0.00  PortalHeapMemoryAlloc
> >   1.31    535.68    12.64  5419526     0.00     0.00  hash_search
> >   1.18    547.11    11.43  9900000     0.00     0.00  expression_tree_walker
> >   1.01    556.90     9.79  3526752     0.00     0.00  SpinRelease
> > 
> > While the time spent in memory allocation is annoying, that's only about
> > ten mallocs per parsed data expression, so it's unlikely that we will be
> > able to improve on it very much.  (We could maybe avoid having *three*
> > levels of subroutine call to do an alloc, though ;-).)  Unless you are
> > smarter than the flex and bison guys you are not going to be able to
> > improve on the lex/parse times either.  The planner isn't even showing
> > up for a simple INSERT.  Not much left, unless you can figure out how
> > to write and commit a tuple with less than two disk writes.

> It will be interesting to see the speed differences between the
> 100,000 inserts above and those which have been PREPARE'd using
> Karel Zak's PREPARE patch

If we believe the above output, the win won't be very noticeable. It is the 
writes (and the Allocs) we have to get rid of.
The above is much faster if you do:
begin work;
100000 inserts ....;
commit work;

Andreas

===

Subject: Re: [HACKERS] Berkeley DB... 
From: Peter Eisentraut <peter_e@gmx.net>
Date:   Mon, 29 May 2000 00:18:01 +0200 (CEST)

Tom Lane writes:

> I have done profiles of INSERT in the past and not found any really
> spectacular bottlenecks

I am still at a loss on how to make profiles. The latest thing that
happened to me is that the postmaster gave me a `Profiling timer expired'
message and never started up. Any idea?


===

Subject: Re: [HACKERS] Berkeley DB... 
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 May 2000 22:15:29 -0400

Peter Eisentraut <peter_e@gmx.net> writes:
> I am still at a loss on how to make profiles. The latest thing that
> happened to me is that the postmaster gave me a `Profiling timer expired'
> message and never started up. Any idea?

Dunno ... PROFILE=-pg works for me ...

Normally there's a special startup file that the compiler is supposed to
know to link instead of the usual crt0.o, when you link with -pg.
Possibly there's something wrong with yours?

===

Subject: Re: [HACKERS] Berkeley DB...
From: "Matthias Urlichs" <smurf@noris.de>
Date: 	Fri, 26 May 2000 10:16:31 +0200

Hi,

Mike Mascari:
> DML statements vs. MySQL. As Vadim points out in his comparison
> of COPY vs. INSERT, something is *wrong* with the time it takes
> for PostgreSQL to parse, plan, rewrite, and optimize.

In my thoroughly unscientific opinion, the problem may well be the fact
that PostgreSQL recurses the whole process, i.e. it is looking up
attributes of one table in a bunch of other tables.

MySQL, by contrast, has three files per table -- one with the data,
one with _all_ the indices, and one .frm file with all the other
metadata you would ever want to know about a table.

That metadata file is mapped into shared memory space by the first task
that opens a table, and it stays there. The data and index files also
stay open until they're flushed.

Since MySQL is multithreaded, opening a new connection is extremely
cheap. By contrast, PostgreSQL does more than 30 open() calls when I
connect to it.(*) It's still lots faster than some other databases I might
mention, though...

Access control is done by a bunch of tables in the "mysql" database,
but these are 100% cached.

One nice side effect of this is that it's very easy to access tables
from another database. Just say "select * from foo.bar".


(*) The list:
/data//pg_options
/etc/passwd
/etc/group
/data//PG_VERSION
/data//pg_database
/data//base/test/PG_VERSION
/data//base/test/pg_internal.init
/data//pg_log
/data//pg_variable
/data//base/test/pg_class
/data//base/test/pg_class_relname_index
/data//base/test/pg_attribute
/data//base/test/pg_attribute_relid_attnum_index
/data//base/test/pg_trigger
/data//base/test/pg_am
/data//base/test/pg_index
/data//base/test/pg_amproc
/data//base/test/pg_amop
/data//base/test/pg_operator
/data//base/test/pg_index_indexrelid_index
/data//base/test/pg_operator_oid_index
/data//base/test/pg_index_indexrelid_index
/data//base/test/pg_trigger_tgrelid_index
/data//pg_shadow
/data//pg_database
/data//base/test/pg_proc
/data//base/test/pg_proc_proname_narg_type_index
/data//base/test/pg_type
/data//base/test/pg_type_oid_index
/data//base/test/pg_proc_oid_index
/data//base/test/pg_rewrite
/data//base/test/pg_user
/data//base/test/pg_attribute_relid_attnam_index
/data//base/test/pg_operator_oprname_l_r_k_index
/data//base/test/pg_class_oid_index
/data//base/test/pg_statistic
/data//base/test/pg_statistic_relid_att_index

===

Subject: RE: [HACKERS] Berkeley DB...
From: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>
Date: Sun, 28 May 2000 09:30:38 +0200

On Fri, 26 May 2000, Mikheev, Vadim wrote:
> > > Well, I've dropped index but INSERTs still take 70 sec and 
> > > COPY just 1sec -:(((
> > 
> > Well, for those that have fsync turned off we could actually 
> > avoid most of the writes, couldn't we? Just leave the page
> > marked dirty. We would only need to write each new page once.
> > The problem as I see it is that we don't have a good place 
> > where the writes would actually be done. Now they are obviously
> > done after each insert.
> 
> I've run the test without fsync and with all inserts in a *single*
> transaction - there should be no write after each insert...

Yes, but if you don't do the inserts in one big transaction
and don't issue transaction statements (no begin or commit),
then you get the behavior I described.

===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Date: Mon, 29 May 2000 13:51:42 +0900

-----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
> 
> > > Well, I've dropped index but INSERTs still take 70 sec and 
> > > COPY just 1sec -:(((
> > >
> > 
> > Did you run vacuum after dropping the indexes?
> > Because DROP INDEX doesn't update relhasindex in pg_class,
> > the planner/executor may still look up pg_index.
> 
> Actually, I dropped and re-created the table without indices...
>

Oops, aren't you testing in 6.5.3?
ExecOpenIndices() always refers to pg_index in 6.5.x.
Currently it doesn't refer to pg_index if relhasindex is
false. 

===

Subject: Re: [HACKERS] Berkeley DB...
From: Karel Zak <zakkr@zf.jcu.cz>
Date: Mon, 29 May 2000 16:57:04 +0200 (CEST)

> It will be interesting to see the speed differences between the
> 100,000 inserts above and those which have been PREPARE'd using
> Karel Zak's PREPARE patch. Perhaps a generic query cache could be

My test:

	postmaster:	-F -B 2000	
	rows:		100,000 
	table:		create table (data text);
	data:		37B for each line
	
	--- all is in one transaction

	native insert:		66.522s
	prepared insert:	59.431s	    - 11% faster	

 
IMHO parsing/optimizing is relatively easy for a simple INSERT.
The query (plan) cache will probably save time for complicated SELECTs 
with functions, etc. (like a query that needs to look at system
tables during parsing). For example:

	insert into tab values ('some data' || 'somedata' || 'some data');

	native insert:		91.787s
	prepared insert:	45.077s     - 50% faster

	(Note: This second test was faster because I stopped the X server
	and postgres had more memory :-)


 The best way to insert large amounts of simple data is (as always) COPY;
there is no faster way.

 pg's path(s) of query:
 
 native insert:		parser -> planner -> executor -> storage
 prepared insert:	parser (for execute stmt) -> executor -> storage
 copy:			utils (copy) -> storage

> amongst other things). I'm looking forward to when the 7.1 branch
> occurs... :-)

 Me too.

							Karel

===

Subject: RE: [HACKERS] Berkeley DB... 
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Mon, 29 May 2000 15:08:51 -0700

> So while the parse/plan overhead looks kinda bad next to a bare COPY,
> it's not anything like a 70:1 penalty.  But an fsync per insert is

Isn't it because your table has 16 columns and my table has only 2?

> that bad and worse.

Of course -:)

2Hiroshi - yes, I've used 6.5.3...

===





the rest of The Pile (a partial mailing list archive)

doom@kzsu.stanford.edu