pgsql-hackers-very_large_scale_postgresql_support

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Date: Sat, 7 Feb 2004 13:29:13 -0500
From: "Alex J. Avriette" <alex@posixnap.net>
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] RFC: Very large scale postgres support

Recently I was tasked with creating a "distribution system" for
postgres nodes here at work. This would allow us to simply bring up a
new box, push postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can
do with postgres on one machine. Our platform presently is the HP
DL380. It is a reasonably fast machine, but in order to eke more
performance out of postgres, we are going to have to upgrade the
hardware substantially.

So the subject came up: wouldn't it be nice if, with replication and
proxies, we could create postgres clusters? When we need more
throughput, we would just put a new box in the cluster, dist a postgres
instance to it, and tell the proxy about it. This is a very attractive
idea for us, from a scalability standpoint. It means that we don't have
to buy $300,000 servers when we max out our 2- or 4- cpu machines (in
the past, I would have suggested a Sun V880 for this database, but we
are using Linux on x86).

We are left with one last option, and that is re-engineering our
application to distribute load across several instances of postgres
which are operating without any real knowledge of each other. I worry,
though, that as our needs increase further, these application redesigns
will yield diminishing returns.

I find myself wondering what other people are doing with postgres, since
this doesn't seem to have come up. When one searches for postgres
clustering on google, one finds lots of HA products. However,
nobody seems to be attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this
subject was done. In the future, people will hopefully begin using
postgres for more intense applications. We are looking at perhaps many
tens of billions of transactions per day within the next year or two.
To simply buy a "bigger box" each time we outgrow the one we're on is
neither effective nor efficient. I simply don't believe we're the only ones
pushing postgres this hard.

I understand there are many applications out there trying to achieve
replication. Some of them seem fairly promising. However, it seems to
me that if we want to see a true clustered database environment, there
would have to be actual native support in the postmaster (inter
postmaster communication if you will) for replication and
cross-instance locking.

This is obviously a complicated problem, and probably not very many of
us are doing anything near as large-scale as this. However, I am sure
most of us can see the benefit of being able to provide support for
these sorts of applications.

I've just submitted this RFC in the hopes that we can discuss both the
best way to support very large scale databases, as well as how to
handle them presently.

===
Date: Sun, 08 Feb 2004 20:07:14 +0100
From: Andreas Pflug <pgadmin@pse-consulting.de>
To: "Alex J. Avriette" <alex@posixnap.net>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Alex J. Avriette wrote:


>I feel that it would be a very good thing if some thinking on this
>subject was done. In the future, people will hopefully begin using
>postgres for more intense applications. We are looking at perhaps many
>tens of billions of transactions per day within the next year or two.

tens of billions = 10^10 per day? This is probably a typo, because this 
would mean > 100,000 requests per second? Do you want to feed a monitor 
with pixel data right from the database, using individual queries for 
each pixel? Or record the keyclicks of every irc user in the world 
concurrently online in a single database?


===

Date: Sun, 8 Feb 2004 17:41:37 -0500
From: "Alex J. Avriette" <alex@posixnap.net>
To: Andreas Pflug <pgadmin@pse-consulting.de>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Andreas Pflug wrote:

> >I feel that it would be a very good thing if some thinking on this
> >subject was done. In the future, people will hopefully begin using
> >postgres for more intense applications. We are looking at perhaps many
> >tens of billions of transactions per day within the next year or two.
> > 
> >
> 
> tens of billions = 10^10 per day? This is probably a typo, because this 
> would mean > 100,000 requests per second? Do you want to feed a monitor 

That's what I said, and what I meant. Ten billion transactions equates
to 115,740 transactions per second.
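As a quick sanity check on that arithmetic (the division spelled out, not anything from the original thread):

```python
# Ten billion transactions spread evenly over one day.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
TRANSACTIONS_PER_DAY = 10_000_000_000   # "ten billion"

tps = TRANSACTIONS_PER_DAY // SECONDS_PER_DAY
print(tps)  # 115740
```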

> with pixel data right from the database, using individual queries for 
> each pixel? Or record the keyclicks of every irc user in the world 
> concurrently online in a single database?

Just because you don't think there is a valid use for that sort of
traffic doesn't mean there isn't one. Imagine, if you will, a hundred
thousand agents making four to five requests a second.  Now, imagine
these requests are all going to the same database.

I'll leave the rest of this exercise up to you.

The fact is, there are situations in which such extreme traffic is
warranted. My concern is that I am not able to use postgres in such
situations because it cannot scale to that level. I feel that it would
be possible to reach that level with support in the postmaster for
replication. 

With software load balancing (e.g. rotors or similar) and updates between
postmasters, it would be (it seems to me) possible to drastically
increase the available capacity of a database installation through the
addition of more nodes. This has the added benefit of allowing us to
distribute network resources.
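A minimal sketch of what such a client-side "rotor" might look like. The `Rotor` class and the node strings are hypothetical, purely to illustrate the round-robin idea; nothing like this existed in the postmaster:

```python
import itertools

class Rotor:
    """Round-robin selector that spreads queries across a list of
    postgres nodes. The node strings are made-up examples."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        # Return the next node in strict rotation.
        return next(self._cycle)

rotor = Rotor(["db1:5432", "db2:5432", "db3:5432"])
picks = [rotor.pick() for _ in range(4)]
print(picks)  # ['db1:5432', 'db2:5432', 'db3:5432', 'db1:5432']
```

Adding capacity then amounts to appending another node to the list the rotor cycles over, which is the "tell the proxy about it" step described above.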


===

Subject: Re: [HACKERS] RFC: Very large scale postgres support
From: Rod Taylor <pg@rbt.ca>
To: "Alex J. Avriette" <alex@posixnap.net>
Cc: Andreas Pflug <pgadmin@pse-consulting.de>,
Date: Sun, 08 Feb 2004 20:01:38 -0500

> The fact is, there are situations in which such extreme traffic is
> warranted. My concern is that I am not able to use postgres in such
> situations because it cannot scale to that level. I feel that it would
> be possible to reach that level with support in the postmaster for
> replication. 

Replication won't help if those are all mostly write transactions. If a
small percentage, even 1% would be challenging, is INSERTS, UPDATES or
DELETES, master / slave replication might get you somewhere.

Otherwise you're going to need to partition the data up into smaller,
easily managed sizes -- that of course requires an ability to
horizontally partition the data.
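One common way to do that horizontal partitioning is to hash a client key onto one of several fully independent databases. A sketch, with made-up node names:

```python
import hashlib

# Map each client deterministically onto one of N independent databases.
# Node names are hypothetical, for illustration only.
NODES = ["pg-a", "pg-b", "pg-c", "pg-d"]

def node_for_client(client_id: str) -> str:
    """Same client always hashes to the same machine, so the
    databases never need to talk to one another."""
    digest = hashlib.sha1(client_id.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

assignment = node_for_client("client-42")
print(assignment in NODES)  # True
```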

Anyway, if you want a sane answer we need more information about the
data (is it partitionable?), schema type, queries producing the load
(simple or complex), acceptable data delays (does a new insert need to
be immediately visible?), etc.

Dealing with a hundred thousand queries/second isn't just challenging for
PostgreSQL; you will be hard pressed to find hardware that will push
that much data around even with the overhead of the database itself.


===
Date: Sun, 8 Feb 2004 21:01:10 -0500
From: "Alex J. Avriette" <alex@posixnap.net>
To: Rod Taylor <pg@rbt.ca>
Cc: Andreas Pflug <pgadmin@pse-consulting.de>,
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Rod Taylor wrote:

> Replication won't help if those are all mostly write transactions. If a
> small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> DELETES, master / slave replication might get you somewhere.

There is no way on earth we could be doing writes at that rate. I think
that's a given.

> Otherwise you're going to need to partition the data up into smaller,
> easily managed sizes -- that of course requires an ability to
> horizontally partition the data.

Obviously, this is the route we have taken.

> Anyway, if you want a sane answer we need more information about the
> data (is it partitionable?), schema type, queries producing the load
> (simple or complex), acceptable data delays (does a new insert need to
> be immediately visible?), etc.

We've considered a lot of this. Like I said, I think a lot of our need
for distributing the database can be helped along with native
replication. Am I hearing that nobody believes scalability is a
concern?  I think many of us would like to see features that would
allow large scale installations to be more practical. I also think most
of us would agree that the current "graft-on" replication methods are
sub-ideal.


===

Subject: Re: [HACKERS] RFC: Very large scale postgres support
From: Rod Taylor <pg@rbt.ca>
To: "Alex J. Avriette" <alex@posixnap.net>
Date: Sun, 08 Feb 2004 21:17:48 -0500

> replication. Am I hearing that nobody believes scalability is a
> concern?  I think many of us would like to see features that would
> allow large scale installations to be more practical. I also think most
> of us would agree that the current "graft-on" replication methods are
> sub-ideal.

You really haven't told us which approach to scaling you require. Many
are available and PostgreSQL is really only good at a few of them.

Anyway, current replication is a PITA mostly due to its inability to
easily start from an empty live database and catch up. You might want to
throw a developer at helping SLONY along if you feel master/slave is
the right direction, since presumably you will have several slaves.

===

Subject: Re: [HACKERS] RFC: Very large scale postgres support
From: Rod Taylor <pg@rbt.ca>
To: "Alex J. Avriette" <alex@posixnap.net>
Date: Sun, 08 Feb 2004 21:20:07 -0500

Alex J. Avriette wrote:
> Rod Taylor wrote:
> 
> > Replication won't help if those are all mostly write transactions. If a
> > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > DELETES, master / slave replication might get you somewhere.
> 
> There is no way on earth we could be doing writes at that rate. I think
> that's a given.

Sure you can, if you can horizontally partition the data so clients A
are on machine A, clients B are on machine B, ...

===

Date: Mon, 9 Feb 2004 01:54:09 -0500
From: "Alex J. Avriette" <alex@posixnap.net>
To: Rod Taylor <pg@rbt.ca>
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Rod Taylor wrote:
> Alex J. Avriette wrote:
> > Rod Taylor wrote:
> > 
> > > Replication won't help if those are all mostly write transactions. If a
> > > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > > DELETES, master / slave replication might get you somewhere.
> > 
> > There is no way on earth we could be doing writes at that rate. I think
> > that's a given.
> 
> Sure you can, if you can horizontally partition the data so clients A
> are on machine A, clients B are on machine B, ...

I think you were assuming inserts here. The problem actually comes from
updates. If I update a record on one node, how long before the rest of
my "cloud" of postgres nodes knows that the record has been updated?
With an insert, the transaction and propagation are fairly clear. With
an update, the overall cost is higher, and the cost per-node is higher.
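The visibility window being described can be made concrete with a toy model: each node keeps a version per row and accepts an update only if the incoming version is newer (last-write-wins). This is purely illustrative, not postgres internals:

```python
class Node:
    """Toy replica: stores (value, version) per key."""

    def __init__(self):
        self.store = {}

    def apply(self, key, value, version):
        # Accept the write only if it is newer than what we have.
        current = self.store.get(key)
        if current is None or version > current[1]:
            self.store[key] = (value, version)

    def read(self, key):
        entry = self.store.get(key)
        return entry[0] if entry else None

cloud = [Node() for _ in range(3)]
for n in cloud:
    n.apply("row1", "old", 1)

cloud[0].apply("row1", "new", 2)   # update has reached only one node
values = [n.read("row1") for n in cloud]
print(values)  # ['new', 'old', 'old'] -- stale reads until propagation completes
```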

===

Subject: Re: [HACKERS] RFC: Very large scale postgres support
From: Chris <list@1006.org>
To: pgsql-hackers@postgresql.org
Date: Mon, 09 Feb 2004 08:33:31 +0100

> That's what I said, and what I meant. Ten billion transactions equates
> to 115,740 transactions per second.

Have you tried to look at the scientific comunity? CERN has setups
that produce such large amounts of data - try searching google for

 http://www.google.com/search?q=cern+event+database+postgresql

or even

 http://www.google.com/search?q=cern+event+database

You might find some inspiration there!

===

From: Josh Berkus <josh@agliodbs.com>
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support
Date: Sun, 8 Feb 2004 23:27:29 -0800

> I find myself wondering what other people are doing with postgres, since
> this doesn't seem to have come up. When one searches for postgres
> clustering on google, one finds lots of HA products. However,
> nobody seems to be attempting to create very high throughput clusters.

Have you checked out "Clusgres" from Linux Labs?   One of my clients will be 
testing this application soon.

While we haven't run into transaction throughput limits, a couple of my 
clients seem to be running into x86's limits on very large queries, 
particularly on vendor hardware.

===

Subject: Re: [HACKERS] RFC: Very large scale postgres support
From: Rod Taylor <pg@rbt.ca>
To: "Alex J. Avriette" <alex@posixnap.net>
Date: Mon, 09 Feb 2004 08:29:15 -0500

Alex J. Avriette wrote:
> Rod Taylor wrote:
> > Alex J. Avriette wrote:
> > > Rod Taylor wrote:
> > > 
> > > > Replication won't help if those are all mostly write transactions. If a
> > > > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > > > DELETES, master / slave replication might get you somewhere.
> > > 
> > > There is no way on earth we could be doing writes at that rate. I think
> > > that's a given.
> > 
> > Sure you can, if you can horizontally partition the data so clients A
> > are on machine A, clients B are on machine B, ...
> 
> I think you were assuming inserts here. The problem actually comes from
> updates. If I update a record on one node, how long before the

No, with the above I'm assuming that you have several completely
independent systems.

Look, if you're really doing that many select-style queries, I presume
it is because you simply have too much data, or it changes too frequently
to present statically; otherwise you wouldn't be using the database
for those queries.

To me, this implies either every one of your mirrors is going to need
some kind of enterprise storage solution, OR you segregate the clients
onto different databases so there is not any cross talk or data
propagation at all.

Anyway, you've yet to tell us anything of substance to even allow
guessing at what your solution should be. Simple master/slave mirroring
is only one of many options.

===

From: "Keith Bottner" <kbottner@comcast.net>
To: "'Alex J. Avriette'" <alex@posixnap.net>, <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] RFC: Very large scale postgres support
Date: Mon, 9 Feb 2004 08:53:01 -0600

Alex,

I agree that this is something that is worth spending time on. This
resembles the Oracle RAC (Real Application Cluster). While other people may
feel that the amount of data is unreasonable I have a similar problem that
will only be solved using such a solution.

In regards to how your database is designed? Who cares? This is an RFC for a
general discussion on how to design this level of functionality into
Postgres. Ultimately any solution would work without regard to the inserts,
updates, or deletes being executed. Alex, I think as a first step we should
start coming up with a feature list of what would be necessary to support
this level of functionality. From that point we could then identify efforts
that are currently ongoing on Postgres development that we could help out on
as well as those items that would need to be handled directly.

I am very interested in going forth with this discussion and believe that I
would be able to have the company I work for put forward resources (i.e.
people or money) on developing the solution if we can come up with a
workable plan.

Josh, thanks for the heads up on Clusgres, I will take a look and see how
that fits.


===

Date: Mon, 09 Feb 2004 16:53:34 +0100
From: Andreas Pflug <pgadmin@pse-consulting.de>
To: Keith Bottner <kbottner@comcast.net>
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Keith Bottner wrote:

>Alex,
>
>I agree that this is something that is worth spending time on. This
>resembles the Oracle RAC (Real Application Cluster). While other people may
>feel that the amount of data is unreasonable I have a similar problem that
>will only be solved using such a solution.
>
>In regards to how your database is designed? Who cares? This is an RFC for a
>general discussion on how to design this level of functionality into
>Postgres.
>

IMHO a general discussion isn't too helpful; you might be discussing 
stuff that's never needed for PostgreSQL. Different database systems 
give different solutions to the same problem, as you might see from e.g. 
table partitioning discussions, which were initiated by Oracle-originating 
people.
There still might be weaknesses in pgsql, but to identify them, *real* 
issues need to be discussed. This is necessary to avoid major 
hardware/software dbms efforts that might well be replaced by 
organizational/app level tools.

===

From: "Keith Bottner" <kbottner@comcast.net>
To: "'Andreas Pflug'" <pgadmin@pse-consulting.de>
Cc: "'Alex J. Avriette'" <alex@posixnap.net>, <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] RFC: Very large scale postgres support
Date: Mon, 9 Feb 2004 10:53:10 -0600

I always enjoy how everyone wants to talk about using different solutions
prior to understanding the complete problem. I would say that a *real* issue
is any perceived issue, whether a current solution exists or not. If current
solutions are applicable and would work, then great, we have all gained;
however, if in fact it is necessary to add the additional functionality to
Postgres, then so be it. But none of this can be decided until the complete
problem, and hence the requirements, are understood. My impression of the
Postgres project has always been that of a high-end database system that is
endeavoring to become a component of critical enterprise systems. If this is
not true, or if as a group we are going to keep placing these scalability
issues aside, then this will never be achieved, and those of us who want
Postgres to play a more important role in our corporate systems will have no
choice but to go somewhere else.

I understand your position, Andreas, and respect your opinion; maybe what I
have identified as requirements is what you are specifying as *real* issues.
I hope so, because I too would like to avoid unnecessary dbms efforts. But
from what I understand of Alex's problem, and more specifically mine, adding
another layer at the organizational/app level will not provide the level of
functionality that is required.

===

Date: Mon, 09 Feb 2004 18:04:59 +0100
From: Andreas Pflug <pgadmin@pse-consulting.de>
To: Keith Bottner <kbottner@comcast.net>
Cc: "'Alex J. Avriette'" <alex@posixnap.net>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support

Keith Bottner wrote:

>I understand your position Andreas and respect your opinion; maybe what I
>have identified as requirements is what you are specifying as *real* issues.
>I hope so, because I to would like to avoid unnecessary dbms efforts.

You got me very right. I didn't mean to dismiss high volume databasing 
as a minor issue; I meant real-world requirements.

===

From: eddy.kalem@edusoft.com (ElPeddy)
Subject: Re: [HACKERS] RFC: Very large scale postgres support
Date: 26 Feb 2004 11:04:45 -0800
To: pgsql-hackers@postgresql.org

kbottner@comcast.net ("Keith Bottner") wrote in message news:<007f01c3ef1c$6a230ab0$7d00a8c0@juxtapose>...
> 
> I agree that this is something that is worth spending time on. This
> resembles the Oracle RAC (Real Application Cluster). While other people may
> feel that the amount of data is unreasonable I have a similar problem that
> will only be solved using such a solution.
> 
> In regards to how your database is designed? Who cares? This is an RFC for a
> general discussion on how to design this level of functionality into
> Postgres. Ultimately any solution would work without regard to the inserts,
> updates, or deletes being executed. Alex, I think as a first step we should
> start coming up with a feature list of what would be necessary to support
> this level of functionality. From that point we could then identify efforts
> that are currently ongoing on Postgres development that we could help out on
> as well as those items that would need to be handled directly.
> 
> I am very interested in going forth with this discussion and believe that I
> would be able to have the company I work for put forward resources (i.e.
> people or money) on developing the solution if we can come up with a
> workable plan.
> 
> Josh, thanks for the heads up on Clusgres, I will take a look and see how
> that fits.

In our quest to see if we can get better performance out of PostgreSQL
by throwing more HW at it, I would have recommended a V880 also. So I'm
curious to find out why you said "(in the past, I would have suggested
a Sun V880 for this database, but we are using Linux on x86)".

===
