pgsql-hackers_comparison_of_gzip_and_bzip

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: Martín Marqués <martin@bugs.unl.edu.ar>
From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [HACKERS] beta3 
Date: Tue, 20 Nov 2001 10:16:57 -0500

Martín Marqués <martin@bugs.unl.edu.ar> writes:
> P.S.: bzip2 is slow, but you can get a real small package with it, even
> though PostgreSQL isn't that big, if we compare it with KDE or Mozilla.

As an experiment, I zipped my current PG source tree with both.  (This
isn't an exact test of the distribution size, because I didn't bother
to get rid of the CVS control files, but it's pretty close.)

Original tar file:      37089280 bytes
gzip -9:                 8183182 bytes
bzip2:                   6762638 bytes

or about 17%, slightly less than a 20% savings, for bzip2 over gzip.
That's useful, but not exactly compelling.  A comparison of unzip
runtime also seems relevant:

$ time gunzip pgsql.tar.gz

real    0m5.48s
user    0m4.46s
sys     0m0.62s

$ time bunzip2 pgsql.tar.bz2

real    0m27.77s
user    0m26.50s
sys     0m0.92s

If I'd downloaded this thing over a decent DSL or cable modem line,
bzip2 would actually be a net loss in total download + uncompress time.
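
The trade-off can be checked with a little arithmetic. The sketch below is not part of the original message: it uses the sizes and "real" times quoted above and assumes a constant link rate with no protocol overhead. The function names and the sample link rates are made up for illustration.

```python
# Figures quoted in the message above.
GZIP_BYTES = 8183182
BZIP2_BYTES = 6762638
GUNZIP_SECONDS = 5.48      # "real" time reported for gunzip
BUNZIP2_SECONDS = 27.77    # "real" time reported for bunzip2


def total_seconds(size_bytes, unpack_seconds, link_bytes_per_second):
    """Download time at a constant link rate plus decompression time."""
    return size_bytes / link_bytes_per_second + unpack_seconds


def break_even_rate():
    """Link rate (bytes/s) at which both formats take the same total time."""
    return (GZIP_BYTES - BZIP2_BYTES) / (BUNZIP2_SECONDS - GUNZIP_SECONDS)


if __name__ == "__main__":
    saving = 1 - BZIP2_BYTES / GZIP_BYTES
    print(f"bzip2 size saving over gzip -9: {saving:.1%}")
    print(f"break-even link rate: {break_even_rate() * 8 / 1000:.0f} kbit/s")
    # Hypothetical link rates in kbit/s, chosen only for illustration.
    for kbit in (56, 256, 1500):
        rate = kbit * 1000 / 8
        gz = total_seconds(GZIP_BYTES, GUNZIP_SECONDS, rate)
        bz = total_seconds(BZIP2_BYTES, BUNZIP2_SECONDS, rate)
        print(f"{kbit:>5} kbit/s: gzip {gz:7.1f} s, bzip2 {bz:7.1f} s")
```

On these numbers the break-even point is roughly 510 kbit/s: below that, the smaller bzip2 file wins on total time; above it, the faster gunzip does. Decompression times depend on the machine, so the crossover will move on other hardware.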

<editorial>
The reason bzip is still an also-ran is that it's not enough better
than gzip to have persuaded people to switch over.  My bet is that
bzip will always be an also-ran, and that gzip will remain the de
facto standard until something comes along that's really significantly
better, like a factor of 2 better.  I've watched this sort of game
play out before, and I know you don't take over the world with a 20%
improvement over the existing standard.  At least not without other
compelling reasons, like speed (oops) or patent freedom (no win there
either).
</editorial>

===

To: Tom Lane <tgl@sss.pgh.pa.us>
From: mlw <markw@mohawksoft.com>
Subject: Re: [HACKERS] beta3
Date: Fri, 23 Nov 2001 11:10:33 -0500

Tom Lane wrote:
> <editorial>
> The reason bzip is still an also-ran is that it's not enough better
> than gzip to have persuaded people to switch over.  My bet is that
> bzip will always be an also-ran, and that gzip will remain the de
> facto standard until something comes along that's really significantly
> better, like a factor of 2 better.  I've watched this sort of game
> play out before, and I know you don't take over the world with a 20%
> improvement over the existing standard.  At least not without other
> compelling reasons, like speed (oops) or patent freedom (no win there
> either).
> </editorial>

While I agree in principle with your view on bzip2, I think there is a strong
reason why you should use it: the 20%.

That 20% is quite valuable. Just by switching to bzip2, the hosting companies
can deliver 20% more downloads with the same equipment and bandwidth cost. The
people with slow connections can get it 20% faster.
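
As a rough check on these figures, here is a small sketch that is not part of the original message. It assumes bandwidth is the only bottleneck and uses the archive sizes Tom Lane reported earlier in the thread. The function names are hypothetical.

```python
def measured_saving(gzip_bytes=8183182, bzip2_bytes=6762638):
    """Fraction by which the bzip2 archive is smaller than the gzip one."""
    return 1 - bzip2_bytes / gzip_bytes


def extra_downloads(size_saving):
    """Fractional increase in downloads served from the same bandwidth.

    If each file shrinks by size_saving, the same number of bytes
    carries 1 / (1 - size_saving) times as many files.
    """
    return 1 / (1 - size_saving) - 1


if __name__ == "__main__":
    s = measured_saving()
    print(f"size saving:          {s:.1%}")
    print(f"extra downloads:      {extra_downloads(s):.1%}")
    print(f"download time saved:  {s:.1%}")
```

With the measured sizes, a saving of about 17% in bytes works out to about 21% more downloads for the same bandwidth, while a client on a slow link saves about 17% of the transfer time. So the "20%" is a fair round number for both claims.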

Will bzip2 become the standard? Probably not in general use, but for
downloadable tarballs it is rapidly becoming the standard. Those who pay for
bandwidth (server or client) welcome any improvement possible. 

I would turn the argument around. Time how long it takes to do:

ncftpget postgresql-xxxx.tar.gz
tar xpzvf postgresql-xxxx.tar.gz
cd postgresql-xxxx
./configure --option
make
make install
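
One way to collect such timings, sketched here and not taken from the original message: run each step as a subprocess and record its wall-clock time. The helper name `time_steps` is hypothetical, and the demo uses stand-in commands so that it runs anywhere. A real run would substitute the download, unpack, configure, and build commands listed above, using the `cwd` argument in place of the `cd` step.

```python
import subprocess
import sys
import time


def time_steps(steps, cwd=None):
    """Run each command in order and return (command, seconds) pairs.

    Raises subprocess.CalledProcessError at the first failing step.
    """
    timings = []
    for argv in steps:
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, check=True)
        timings.append((" ".join(argv), time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Stand-in commands only; replace with the real steps to be timed.
    demo = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import time; time.sleep(0.1)"],
    ]
    for command, seconds in time_steps(demo):
        print(f"{seconds:8.2f} s  {command}")
```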


doom@kzsu.stanford.edu