modperl-comparison_caching_methods

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: <modperl@apache.org>
From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 15:05:33 +1100

Just thought people might be interested...

I sat down the other day and wrote a test script to try out
various caching implementations. The script is pretty basic
at the moment, I just wanted to get an idea of the
performance of different methods.

The basic scenario is the common mod_perl situation:
* Multiple processes
* You need to share perl structures between processes
* You want to index the data structure on some key
* Same key is written and read multiple times

I tried out the following systems.
* Null reference case (just store in 'in process' hash)
* Storable reference case (just store in 'in process' hash after 'freeze')
* Cache::Mmap (uses Storable)
* Cache::FileCache (uses Storable)
* DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
* DBI, use Storable, do 'select' then 'insert' or 'update'
* MLDBM::Sync::SDBM_File (uses Storable)
* Cache::SharedMemoryCache (uses Storable)

For systems like Cache::* that can automatically expire
items after a given time or cache size, I left these options
off or made the cache big enough that expiry wouldn't happen.

I've included the script for people to try out if they like
and add other test cases.
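For reference, the heart of each test is just a timed set/get loop. A
minimal sketch of the C1 (Storable freeze/thaw into an in-process hash)
case, using only core modules, might look like this -- the loop shape and
sample data are illustrative assumptions, not the actual benchmark script:

```perl
# Sketch of the C1 case: store a structure in an 'in process' hash
# after Storable 'freeze', thaw it back on read, count ops per second.
use strict;
use Storable qw(freeze thaw);
use Time::HiRes qw(time);

my %cache;
my $data = { user => 'abc', items => [ 1 .. 20 ] };

my $n  = 1000;
my $t0 = time;
for (1 .. $n) {
    $cache{somekey} = freeze($data);      # 'set'
    my $copy = thaw($cache{somekey});     # 'get'
    # basic sanity check, like the null reference case does
    die "corrupt data" unless keys(%$copy) == keys(%$data);
}
my $elapsed = (time - $t0) || 1e-9;
printf "Mixes per sec = %d\n", $n / $elapsed;
```

The real script (linked below) does separate set, get, and mixed passes
per backend.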

Now to the results:

Package C0 - In process hash
Sets per sec = 147116
Gets per sec = 81597
Mixes per sec = 124120
Package C1 - Storable freeze/thaw
Sets per sec = 2665
Gets per sec = 6653
Mixes per sec = 3880
Package C2 - Cache::Mmap
Sets per sec = 809
Gets per sec = 3235
Mixes per sec = 1261
Package C3 - Cache::FileCache
Sets per sec = 393
Gets per sec = 831
Mixes per sec = 401
Package C4 - DBI with freeze/thaw
Sets per sec = 651
Gets per sec = 1648
Mixes per sec = 816
Package C5 - DBI (use updates with dup) with freeze/thaw
Sets per sec = 657
Gets per sec = 1994
Mixes per sec = 944
Package C6 - MLDBM::Sync::SDBM_File
Sets per sec = 334
Gets per sec = 1279
Mixes per sec = 524
Package C7 - Cache::SharedMemoryCache
Sets per sec = 42
Gets per sec = 29
Mixes per sec = 32


Notes:

* System = Pentium III 866, Linux 2.4.16-0.6, Ext3 (no
  special filesystem flags), MySQL (with InnoDB tables)

* Null reference hash is slower reading because it does a
  very basic check to see that the retrieved hash has the
  same number of keys as the stored hash

* Approximate performance order (best to worst) =
  Cache::Mmap, DBI, MLDBM::Sync::SDBM_File,
  Cache::FileCache, Cache::SharedMemoryCache

* Remember what Knuth said, "Premature optimisation is the
  root of all evil." This data won't help you if something
  else in your application is the bottleneck...

* The code is available at: http://fastmail.fm/users/robm/perl/cacheperltest.pl

Have I missed something obvious?

Rob
. <- Grain of salt to be taken with this post


===

To: "Rob Mueller \(fastmail\)" <robm@fastmail.fm>,
<modperl@apache.org>
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Tue, 11 Dec 2001 23:21:53 -0500

> I sat down the other day and wrote a test script to try
> out various caching implementations.

Very interesting.  Looks like Cache::Mmap deserves more attention (and
maybe a Cache::Cache subclass).

> Have I missed something obvious?

Nothing much, but I'd like to see how these numbers vary with the size
of the data being written.  And it would be good if there were a way to
ensure that the data had not been corrupted at the end of a run.

Also, I'd like to see MLDBM + BerkeleyDB (not DB_File) with BerkeleyDB
doing automatic locking, and IPC::MM, and IPC::Shareable, and
IPC::ShareLite (though it doesn't serialize complex data by itself), and
MySQL with standard tables.  Of course I could just do them myself,
since you were kind enough to provide code.

===

To: <modperl@apache.org>
From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 17:23:39 +1100

Just wanted to add an extra thought that I forgot to include
in the previous post.

One important aspect missing from my tests is the actual
concurrency testing. In most real world programs, multiple
applications will be reading from/writing to the cache at
the same time. Depending on the cache synchronisation
scheme, you'll get varying levels of performance
degradation (1 being the worst, 3 being the best):

1. Lock entire cache for a request
   (Cache::SharedMemoryCache, MySQL normal - table locks?)

2. Lock some part of cache for a request (Cache::Mmap
   buckets, MLDBM pages?)

3. Lock only the key/value for a request (Cache::FileCache,
   MySQL InnoDB - row locks?)

Uggg, to do a complete test you really need to generate an
entire modelling system:

1. Number of concurrent processes
2. Average reads/writes to cache per second
3. Ratio of reused/new entries
etc

===

To: "Rob Mueller \(fastmail\)" <robm@fastmail.fm>,
<modperl@apache.org>
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 02:13:36 -0500

> One important aspect missing from my tests is the actual concurrency
> testing.

Oh, I guess I should have checked your code.  I thought these were
concurrent.  That makes a huge difference.

> 2. Lock some part of cache for a request
> (Cache::Mmap buckets, MLDBM pages?)

MLDBM::Sync locks the whole thing.

===

To: <modperl@apache.org>
From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 19:20:46 +1100

Some more points.

I'd like to point out that I don't think the lack of actual
concurrency testing is a real problem, at least for most
single CPU installations. If most of the time is spent doing
other stuff in a request (which is most likely the case),
then on average when a process goes to access the cache,
nothing else will be accessing it, so it won't have to block
to wait to get a lock. In that case, one process doing 10*N
accesses is the same as N processes doing 10 accesses, and
the results are still meaningful.

Perrin Harkins pointed out IPC::MM which I've added to the
test code. It's based on the MM library
(http://www.engelschall.com/sw/mm/mm-1.1.3.tar.gz) and
includes a hash and btree tied hash implementation. I've
tried the tied hash and it performs extremely well. It seems
to be limited mostly by the speed of the Storable module.

Package C0 - In process hash
Sets per sec = 181181
Gets per sec = 138242
Mixes per sec = 128501
Package C1 - Storable freeze/thaw
Sets per sec = 2856
Gets per sec = 7079
Mixes per sec = 3728
Package C2 - Cache::Mmap
Sets per sec = 810
Gets per sec = 2956
Mixes per sec = 1185
Package C3 - Cache::FileCache
Sets per sec = 392
Gets per sec = 813
Mixes per sec = 496
Package C4 - DBI with freeze/thaw
Sets per sec = 660
Gets per sec = 1923
Mixes per sec = 885
Package C5 - DBI (use updates with dup) with freeze/thaw
Sets per sec = 676
Gets per sec = 1866
Mixes per sec = 943
Package C6 - MLDBM::Sync::SDBM_File
Sets per sec = 340
Gets per sec = 1425
Mixes per sec = 510
Package C7 - Cache::SharedMemoryCache
Sets per sec = 31
Gets per sec = 21
Mixes per sec = 24
Package C8 - IPC::MM
Sets per sec = 2267
Gets per sec = 5435
Mixes per sec = 2769

===


To: "Rob Mueller (fastmail)" <robm@fastmail.fm>
From: Olivier Poitrey <Olivier@Pas-tres.net>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 14:46:28 +0100

On Wed Dec 12, 2001 at 03:05:33PM +1100, Rob Mueller (fastmail) wrote:
> I tried out the following systems.
> * Null reference case (just store in 'in process' hash)
> * Storable reference case (just store in 'in process' hash after 'freeze')
> * Cache::Mmap (uses Storable)
> * Cache::FileCache (uses Storable)
> * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
> * DBI, use Storable, do 'select' then 'insert' or 'update'
> * MLDBM::Sync::SDBM_File (uses Storable)
> * Cache::SharedMemoryCache (uses Storable)
[...] 
> Have I missed something obvious?

Hi Rob,

Maybe you can add my "under construction" Apache::Cache module to your
test suite? It is available on CPAN.

===

To: "Rob Mueller \(fastmail\)" <robm@fastmail.fm>
From: DeWitt Clinton <dewitt@unto.net>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 09:21:37 -0500

On Wed, Dec 12, 2001 at 03:05:33PM +1100, Rob Mueller (fastmail) wrote:

> I sat down the other day and wrote a test script to try out various
> caching implementations. The script is pretty basic at the moment, I
> just wanted to get an idea of the performance of different methods.

Rob, wow!  This is fantastic work!

I'm wondering if you could include a test on the Cache::FileBackend
class.  The Cache::FileCache class is not optimized for the particular
type of gets and sets that you were testing for.  In fact, each of
those requests will still need to go through the process of checking
for object expiration, which may add the overhead that you observe in
the benchmark.

Not that I'm expecting Cache::FileBackend to do significantly better
 -- it still does name hashing, directory hashing, taint checking,
cloning of objects, etc.  But I am curious to see what difference it
makes.  You could try Cache::SharedMemoryBackend as well.

In general the Cache::* modules were designed with clarity and ease of
use in mind.  For example, the modules tend to require absolutely no
set-up work on the end user's part and try to be as fail-safe as
possible.  Thus there is run-time overhead involved.  That said, I'm
certainly not against performance.  :) These benchmarks are going to
be tremendously useful in identifying bottlenecks.  However, I won't
be able to optimize for these particular benchmarks, as Cache::Cache
is designed to do something different than straight gets and sets.

Again, thank you, Rob.  This is great,

===

To: "Perrin Harkins" <perrin@elem.com>,
From: "Jeremy Howard" <jh_lists@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Thu, 13 Dec 2001 08:48:06 +1100

Perrin Harkins wrote:
> Also, I'd like to see MLDBM + BerkeleyDB (not DB_File) with BerkeleyDB
> doing automatic locking, and IPC::MM, and IPC::Shareable, and
> IPC::ShareLite (though it doesn't serialize complex data by itself), and
> MySQL with standard tables.  Of course I could just do them myself,
> since you were kind enough to provide code.

IPC::ShareLite freezes/thaws the whole data structure, rather than just the
hash element being accessed, IIRC, so is probably going to have extremely
poor scaling characteristics. Worth adding to check, of course.

MLDBM+BDB without using MLDBM::Sync is a cool idea that I hadn't thought of.
That will certainly be interesting.

Another interesting option is mapping a MySQL table data structure directly
to the data structure being stored. This can be faster because you don't
have to serialise, and also don't have to update the whole structure, just
the fields that change (which is common in session handling and similar).
Unfortunately this is hard to include in this benchmark, because the
benchmark creates random complex structures. Maybe we'll create a custom one
for this option for the sake of reference, even though it won't be
directly comparable.

I'm not sure what a 'standard table' in MySQL is any more... Berkeley,
MyISAM, ISAM... I guess we can try all these, but that's benchmarking the DB
rather than the caching scheme, and we're not about to try every DB server
we can find!


===

To: "Jeremy Howard" <jh_lists@fastmail.fm>,
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Wed, 12 Dec 2001 16:55:01 -0500

> IPC::ShareLite freezes/thaws the whole data structure, rather than just the
> hash element being accessed, IIRC, so is probably going to have extremely
> poor scaling characteristics. Worth adding to check, of course.

No, it's probably not worth it.  It would be worth adding IPC::Shareable
though, because people never believe me when I tell them to use something
else.  Having some numbers would help.

> Another interesting option is mapping a MySQL table data structure directly
> to the data structure being stored.

That could be useful as part of a comparison for storing non-complex data,
i.e. a single scalar value.

> I'm not sure what a 'standard table' in MySQL is any more... Berkeley,
> MyISAM, ISAM... I guess we can try all these, but that's benchmarking the DB
> rather than the caching scheme, and we're not about to try every DB server
> we can find!

No, of course not.  It may be that the performance characteristics of these
table types are well known already and I just don't follow the MySQL scene
well enough to know.  I thought maybe the default table type (MyISAM?)
which doesn't support transactions would have better speed for dirt simple
storage like this.

- Perrin

===

To: "DeWitt Clinton" <dewitt@unto.net>
From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Thu, 13 Dec 2001 10:30:03 +1100

> In general the Cache::* modules were designed for clarity and ease of
> use in mind.  For example, the modules tend to require absolutely no
> set-up work on the end user's part and try to be as fail-safe as
> possible.  Thus there is run-time overhead involved.  That said, I'm
> certainly not against performance.  :) These benchmarks are going to
> be tremendously useful in identifying bottlenecks.  However, I won't
> be able to optimize for these particular benchmarks, as Cache::Cache
> is designed to do something different than straight gets and sets.
>
> Again, thank you, Rob.  This is great,

That's a good point. I probably should have added the features that each one
offers, to help with decisions. Cache::Cache does have the most options with
regard to limiting time/size in the cache, so that could be a big factor in
someone's choice.

> * Cache::Mmap (uses Storable)
- Can indirectly specify the maximum cache size, though purges are uneven
depending on how well data hashes into different buckets
- Has callback ability on a read/purge so you can move any purged data to a
different data store if you want, and automatically retrieve it on the next
read when it's not in the cache

> * Cache::FileCache (uses Storable)
> * Cache::SharedMemoryCache (uses Storable)
- Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or
maximum time an object is allowed in the cache
- Follows the Cache::Cache interface system

> * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
> * DBI, use Storable, do 'select' then 'insert' or 'update'
- Can't specify any limits directly
- Could add a 'size' and 'timestamp' column to each row and use a daemon to
iterate through and clean up based on time and size

> * MLDBM::Sync::SDBM_File (uses Storable)

> * IPC::MM
- Can't specify any limits directly
- Could create a secondary tied db/mm hash with a key -> [ size, timestamp ]
mapping and use a daemon to iterate through and clean up based on time and size
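The secondary-index idea above can be sketched in a few lines. This is
illustrative only, using plain in-process hashes; a real version would tie
%cache and %meta to the shared mm or db store, and the cleanup pass would
run in a separate daemon:

```perl
# Illustrative sketch of the "key -> [ size, timestamp ] secondary
# index plus cleanup daemon" idea, with plain in-process hashes.
use strict;

my (%cache, %meta);

sub cache_set {
    my ($key, $frozen) = @_;
    $cache{$key} = $frozen;
    $meta{$key}  = [ length($frozen), time() ];   # [ size, timestamp ]
}

# One pass of the cleanup daemon: drop entries older than $max_age seconds.
sub cleanup_older_than {
    my ($max_age) = @_;
    my $now = time();
    for my $key (keys %meta) {
        next if $now - $meta{$key}[1] <= $max_age;
        delete $cache{$key};
        delete $meta{$key};
    }
}

cache_set('user:1', 'frozen-blob-here');
cleanup_older_than(300);    # a fresh entry survives a 5-minute sweep
```

A size-based sweep would work the same way, sorting %meta by the stored
size or timestamp and deleting until under the limit.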

Rob


===

To: "Rob Mueller \(fastmail\)" <robm@fastmail.fm>
From: "Rob Bloodgood" <robb@empire2.com>
Subject: RE: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 10:43:02 -0800

> > Again, thank you, Rob.  This is great,
>
> > * Cache::FileCache (uses Storable)
> > * Cache::SharedMemoryCache (uses Storable)
> - Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or
> maximum time an object is allowed in the cache
> - Follows the Cache::Cache interface system

I run / wrote ExitExchange (www.exitexchange.com).

I was using Cache::SharedMemoryCache on my system.  I figured, "Hey, it's
RAM, right?  It's gonna be WAY faster than anything disk-based." (I was
originally using IPC::Shareable and the caching model from the Eagle, but
IPC::Shareable broke after like, 0.50 or something so it makes NEW shm
segments with EVERY write.)

Then I saw your benchmarks... boy was I wrong!

My statistics system (count.exitexchange.com) is a dual PIII/1GHz, w/ 2GB of
RAM.  I'm serving approximately 2 million server-parsed, mod_perl generated
hits/day.  Until the other day, the system load average was usually between
12 and 21 (!), and response times were in the 3-10 second range. <sigh>
I've been sweating and worrying about optimizations in an entirely different
part of the system than the webserver, figuring I missed something in THAT
part...

Based on your benchmarks, I simply s/SharedMemoryCache/FileCache/ and tried
it again.

For the past two days, I've been at .25 second response times (and that's
including the overhead of loading and executing the app that makes the req)
and server load is < 1, consistently.

In short, your work has saved my ass.

THANKYOUTHANKYOUTHANKYOUTHANKYOU
and Merry Christmas!

L8r,
Rob

#!/usr/bin/perl -w
use Disclaimer qw/:standard/;


===

To: Rob Bloodgood <robb@empire2.com>
From: Paul Lindner <lindner@inuus.com>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 11:35:31 -0800

On Fri, Dec 14, 2001 at 10:43:02AM -0800, Rob Bloodgood wrote:
> > > Again, thank you, Rob.  This is great,
> >
> > > * Cache::FileCache (uses Storable)
> > > * Cache::SharedMemoryCache (uses Storable)
> > - Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or
> > maximum time an object is allowed in the cache
> > - Follows the Cache::Cache interface system
> 
> I was using Cache::SharedMemoryCache on my system.  I figured, "Hey, it's
> RAM, right?  It's gonna be WAY faster than anything disk-based." (I was
> originally using IPC::Shareable and the caching model from the Eagle, but
> IPC::Shareable broke after like, 0.50 or something so it makes NEW shm
> segments with EVERY write.)

Another powerful tool for tracking down performance problems is perl's
profiler combined with Devel::DProf and Apache::DProf.  Devel::DProf
is bundled with perl. Apache::DProf is hidden in the Apache-DB package
on CPAN.

Using that combo made me realize that the real performance problem in
my code was parsing dates..  I ended up switching to using mod_perl's
date header parser, and used the slow perl method for dates not
formatted in http format.

At the same time I added some code to track the time it takes to
process a request using Time::HiRes.  This value is set as a note via
$r->notes('REQTIME').  A CustomLog directive takes care of dumping that
value in the logs...
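The timing part of that is just Time::HiRes arithmetic. A self-contained
sketch (the $r->notes() and CustomLog plumbing are mod_perl/Apache
specifics, stubbed out here with a print):

```perl
# Measure wall-clock time for a unit of work with Time::HiRes.
use strict;
use Time::HiRes qw(time sleep);

my $start = time;
sleep(0.01);                      # stand-in for the real request work
my $reqtime = sprintf '%.4f', time - $start;
# In the handler this would be: $r->notes(REQTIME => $reqtime);
print "REQTIME=$reqtime\n";
```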

===

To: "Rob Bloodgood" <robb@empire2.com>,
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 15:20:15 -0500

> I was using Cache::SharedMemoryCache on my system.  I figured, "Hey, it's
> RAM, right?  It's gonna be WAY faster than anything disk-based."

The thing you were missing is that on an OS with an aggressively caching
filesystem (like Linux), frequently read files will end up cached in RAM
anyway.  The kernel can usually do a better job of managing an efficient
cache than your program can.

For what it's worth, DeWitt Clinton accompanied his first release of
File::Cache (the precursor to Cache::FileCache) with a benchmark showing
this same thing.  That was the reason File::Cache was created.

And ++ on Paul's comments about Devel::DProf and other profilers.

===

To: <lindner@inuus.com>
From: "Rob Bloodgood" <robb@empire2.com>
Subject: RE: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 12:41:04 -0800

> Another powerful tool for tracking down performance problems is perl's
> profiler combined with Devel::DProf and Apache::DProf.  Devel::DProf
> is bundled with perl. Apache::DProf is hidden in the Apache-DB package
> on CPAN.

Ya know the place in my original comment where I was optimizing a different
subsystem?  I just discovered Devel::DProf last week (after 5 *years* of
perl... <smacks forehead>), and was using that.  *AND* had improved a sore
spot's performance by 10% without even working hard, because of profiling.
Point taken.

> At the same time I added some code to track the time it takes to
> process a request using Time::HiRes.  This value is set as a note via
> $r->note('REQTIME').  A customlog directive takes care of dumping that
> value in the logs...

Hmm... I was already logging a status message via warn(), I did the SAME
TRICK but stored it in a local variable because I didn't need to go as far
as a customlog...

Sounds like great minds think alike! :-)


===

To: mod_perl list <modperl@apache.org>
From: Dave Rolsky <autarch@urth.org>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 14:59:53 -0600 (CST)

On Fri, 14 Dec 2001, Perrin Harkins wrote:

> The thing you were missing is that on an OS with an aggressively caching
> filesystem (like Linux), frequently read files will end up cached in RAM
> anyway.  The kernel can usually do a better job of managing an efficient
> cache than your program can.

Plus for some reason IPC overhead seems to seriously degrade as the size
of the overall shared memory segment increases.  I'm not sure why this is
but I think the docs for IPC::Shareable mention this.  Maybe for _very_
small amounts of data, shared memory might still be a win.

===


To: mod_perl list <modperl@apache.org>
From: Jay Thorne <jay@grue.userfriendly.org>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 14:55:24 -0800

On December 14, 2001 12:59 pm, Dave Rolsky wrote:
> On Fri, 14 Dec 2001, Perrin Harkins wrote:
> > The thing you were missing is that on an OS with an aggressively caching
> > filesystem (like Linux), frequently read files will end up cached in RAM
> > anyway.  The kernel can usually do a better job of managing an efficient
> > cache than your program can.
>
> Plus for some reason IPC overhead seems to seriously degrade as the size
> of the overall shared memory segment increases.  I'm not sure why this is
> but I think the docs for IPC::Shareable mention this.  Maybe for _very_
> small amounts of data, shared memory might still be a win.
>

Very interesting. We at UF were using IPC caching, in earlier versions of our 
comment system. We turfed it for several reasons.

 - Locking and storage were taking appreciable wall clock time.
 - We had a large memory footprint anyway, and a per-process hash did not 
actually add all that much.
 - IPC was persistent over server instances, but we already had a dbi 
connection, and that is persistent over system reboots as well. 

The earlier version was topping out at about 8 hits per second.

So our solution was caching in-process with just a hash, and using a 
DBI/mysql persistent store. 
In pseudo-code (tidied into valid Perl):
sub get_stuff {
	my ($whatever) = @_;
	if (!$cache{$whatever}) {
		unless ($cache{$whatever} = dbi_lookup($whatever)) {
			$cache{$whatever} = derive_data_from_original_source($whatever);
			dbi_save($whatever, $cache{$whatever});
		}
	}
	return $cache{$whatever};
}

Thanks to that and another local content cache implemented with the file 
system with time based aging, UF's latest version of the system can do over 
100 page views per second on a relatively modest machine (P3 700, 256M IDE, 
linux 2.2.x)

The filesystem based / time sensitive aging is a nice little thing we should 
put up in CPAN. We've just not done so yet.

===

To: <jay@grue.userfriendly.org>, "mod_perl list"
<modperl@apache.org>
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 18:04:41 -0500

> So our solution was caching in-process with just a hash, and using a
> DBI/mysql persistent store.
> in pseudo code
> sub get_stuff {
> if (! $cache{$whatever} ) {
> if !( $cache{whatever} = dbi_lookup()) {
> $cache{$whatever}=derive_data_from_original_source($whatever);
> dbi_save($cache_whatever);
> }
> }
> $cache{$whatever}
> }

That's actually a bit different.  That would fail to notice updates between
processes until the in-memory cache was cleared.  Still very useful for
read-only data or data that can be out of sync for some period though.

> The filesystem based / time sensitive aging is a nice little thing we should
> put up in CPAN. We've just not done so yet.

How does it differ from the other solutions like Cache::FileCache?  Is it
something you could add to an existing module?

===

To: "Perrin Harkins" <perrin@elem.com>, "mod_perl list"
<modperl@apache.org>
From: Jay Thorne <jay@grue.userfriendly.org>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 15:20:33 -0800

On December 14, 2001 03:04 pm, Perrin Harkins wrote:
> > So our solution was caching in-process with just a hash, and using a
> > DBI/mysql persistent store.
> > in pseudo code
> > sub get_stuff {
> > if (! $cache{$whatever} ) {
> > if !( $cache{whatever} = dbi_lookup()) {
> > $cache{$whatever}=derive_data_from_original_source($whatever);
> > dbi_save($cache_whatever);
> > }
> > }
> > $cache{$whatever}
> > }
>
> That's actually a bit different.  That would fail to notice updates between
> processes until the in-memory cache was cleared.  Still very useful for
> read-only data or data that can be out of sync for some period though.
>
> > The filesystem based / time sensitive aging is a nice little thing we should
> > put up in CPAN. We've just not done so yet.
>
> How does it differ from the other solutions like Cache::FileCache?  Is it
> something you could add to an existing module?
>

Performance, key structure, and semantics. It's mostly different, not
better/worse.

=pod

=head1 NAME

UFMEDIA::CacheOneFile - cache a scalar value in a file on disk

=head1 SYNOPSIS

 use UFMEDIA::CacheOneFile;

 my $cache = new UFMEDIA::CacheOneFile(
     cache_file => '/var/cache/myapp/flurge.cache',
     max_age    => 30,
     refill_sub => sub { recalculate_flurges(31337, 'blue', 42) },
 );

 my $value = $cache->get_value;

=head1 DESCRIPTION

UFMEDIA::CacheOneFile enables you to cache a single scalar value in a file
on disk.  Given a filename under a writable directory, a maximum age, and
a reference to a refill subroutine, a cache object will cache the result
of the refill subroutine in the file the first time B<get_value()> is called,
and use the cached value for subsequent calls to B<get_value()> until
the cache file is more than B<max_age> seconds old.

If multiple processes share a single cache file, the first process that
reads the cache file after it has expired will take responsibility for
replacing it with an up-to-date copy.  Other processes needing up-to-date
information will wait for this to finish and will then use the new value.

=head1 AUTHOR

Mike Lyons <lyonsm@userfriendly.org>

=head1 SEE ALSO

perl(1).

=cut


===



To: "mod_perl list" <modperl@apache.org>
From: Robert Landrum <rlandrum@capitoladvantage.com>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 18:53:47 -0500

At 6:04 PM -0500 12/14/01, Perrin Harkins wrote:
>That's actually a bit different.  That would fail to notice updates between
>processes until the in-memory cache was cleared.  Still very useful for
>read-only data or data that can be out of sync for some period though.


The primary problem with everything mentioned thus far is that they 
are almost entirely based around the concept of a single server. 
Caching schemes that are both fast and work across multiple servers 
and process instances are very hard to find.  After reading the eToys 
article, I decided that BerkeleyDB was worth a look.  We discovered 
that it was very fast and would make a great cache for some of our 
commonly used database queries.  The problem was that we are running 
on 5 different servers, load balanced by Big/IP.  Again, taking my 
cue from the eToys article, I began working on a data exchange system 
(using perl instead of C) which multicasts data packets to the 
"listening" servers.  So far, our only problems have been with the 
deadlocking (solved with db_deadlock), and a few corrupt records.

I'm considering uploading to CPAN the stuff I've written for the 
caching, but haven't had the time to make it generic enough.  I also 
haven't performed any benchmarks other than the "Wow, that's a lot 
faster" benchmark.  One limitation to the stuff I've written is that 
the daemons (listeners) are non-threaded, non-forking.  That and it's 
all based on UDP.


Rob

===

To: Robert Landrum <rlandrum@capitoladvantage.com>,
From: Jay Thorne <jay@grue.userfriendly.org>
Subject: Re: Comparison of different caching schemes
Date: Fri, 14 Dec 2001 16:37:12 -0800

On December 14, 2001 03:53 pm, Robert Landrum wrote:
> At 6:04 PM -0500 12/14/01, Perrin Harkins wrote:
> >That's actually a bit different.  That would fail to notice updates
> > between processes until the in-memory cache was cleared.  Still very
> > useful for read-only data or data that can be out of sync for some period
> > though.
>
> The primary problem with everything mentioned thus far is that they
> are almost entirely based around the concept of a single server.
> Caching schemes that are both fast and work across multiple servers
> and process instances are very hard to find.  After reading the eToys
> article, I decided that BerkeleyDB was worth a look.  We discovered
> that it was very fast and would make a great cache for some of our
> commonly used database queries.  The problem was that we are running
> on 5 different servers, load balanced by Big/IP.  Again, taking my
> cue from the eToys article, I began working on a data exchange system
> (using perl instead of C) which multicasts data packets to the
> "listening" servers.  So far, our only problems have been with the
> deadlocking (solved with db_deadlock), and a few corrupt records.
>

We did stuff based on kind of an "Acceptable differences" concept.

Using round robin DNS, with predominantly winXX clients, you tend to get the 
same IP address from one request to the next, changing, IIRC, on the DNS 
cache timeout from the SOA record.  With a BigIP or something, you can 
guarantee this user locality.

So that leaves you with *appearing* to be in sync for the one local machine.

Using a database with common connections from multiple machines or the "one 
writer db machine, many read db machines" concept, using replication or what 
have you for updates, you can present a pretty consistent database view, with 
a maximum difference on the read side of the replication time.

We found that the database concept, plus using a local cache with a timeout 
of 5 seconds not only gave us a huge performance increase, but it gave us a 
maximum of about 6 seconds of data difference between machines. And since 
your average human viewer hits pages no faster than once every 20 seconds or 
so, it was pretty much invisible to the individual user. Of course search 
engines will be faster than this, but does being consistent for search 
engines matter all that much?



===

To: "Perrin Harkins" <perrin@elem.com>, "Rob Bloodgood"
<robb@empire2.com>
From: "Rob Mueller (fastmail)" <robm@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Sat, 15 Dec 2001 11:57:23 +1100

> The thing you were missing is that on an OS with an aggressively caching
> filesystem (like Linux), frequently read files will end up cached in RAM
> anyway.  The kernel can usually do a better job of managing an efficient
> cache than your program can.
>
> For what it's worth, DeWitt Clinton accompanied his first release of
> File::Cache (the precursor to Cache::FileCache) with a benchmark showing
> this same thing.  That was the reason File::Cache was created.

While that's true, there are still some problems with a file cache. Namely,
to get reasonable performance you have to use a directory hashing structure
so that you don't end up with too many files (one for each key) in one
directory. Thus for every add to the cache you have to:
* stat each directory in the hash path and create it if it doesn't exist
* open and create the file and write to it, close the file
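A rough sketch of the directory-hashing idea in Python (the two-level layout and the choice of MD5 are illustrative, not what Cache::FileCache actually does):

```python
import hashlib
import os

def hashed_path(cache_root, key, levels=2):
    """Spread cache files across subdirectories so no single
    directory accumulates one file per key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    # split the digest into 2-char path components, e.g. <root>/ab/cd/abcd...
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return os.path.join(cache_root, *parts, digest)

def store(cache_root, key, data):
    path = hashed_path(cache_root, key)
    # the stat/mkdir walk described above, one directory level at a time
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

store("/tmp/cache-demo", "user:42", b"serialized value")
```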

A similar procedure is required for reading. All that still takes a bit of
time. This is where having some shared memory representation can be a real
help, since you don't have to traverse, open, read/write, and close a file
every time. Witness the performance of IPC::MM, which seems to be mostly
limited by the performance of the Storable module. I'm planning on doing
another test which just stores some data without the 'streaming', to see
which examples are really limited by Storable and which by their
implementation; this might be useful for some people.
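The kind of measurement described above can be sketched in a few lines; here using Python's pickle as a stand-in for Storable (the record contents and timing window are made up for illustration):

```python
import pickle
import time

record = {"user": 42, "folders": ["INBOX", "Sent"], "quota": 10_000_000}

def ops_per_sec(fn, seconds=0.2):
    """Count how many times fn() runs in a fixed wall-clock window."""
    n, deadline = 0, time.time() + seconds
    while time.time() < deadline:
        fn()
        n += 1
    return n / seconds

store = {}
# raw: store the structure directly, no serialization at all
raw = ops_per_sec(lambda: store.__setitem__("k", record))
# serialized: pay the freeze cost on every set, like the Storable-based caches
frozen = ops_per_sec(lambda: store.__setitem__("k", pickle.dumps(record)))
print(f"raw sets/sec ~ {raw:.0f}, serialized sets/sec ~ {frozen:.0f}")
```

The gap between the two numbers is the serialization overhead; whatever a cache loses beyond that is the cost of its own storage implementation.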

> And ++ on Paul's comments about Devel::DProf and other profilers.

Ditto again. I've been using Apache::DProf recently and it's been great at
tracking down exactly where time is spent in my program. If you have any
performance problems, definitely use it first before making any assumptions.

One question though, I have a call tree that looks a bit like:
main
-> calls f1
       -> calls f2
-> calls f3
       -> calls f2

The two calls to f2 may take completely different times. Using 'dprofpp -I',
I can see what percentage of overall time is spent in 'f1 and children', 'f3
and children' and 'f2 and children'. But I can't see an easy way to tell
'time in f2 and children when f2 was called from f1' (or ditto for f3). Does
that make sense?
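DProf can't split a routine's time by call site, but as a crude workaround you can wrap the shared routine and accumulate its wall-clock time per caller. A Python sketch of the idea, reusing the f1/f2/f3 names from the call tree above (the delays are invented):

```python
import functools
import sys
import time
from collections import defaultdict

time_by_caller = defaultdict(float)

def attribute_to_caller(fn):
    """Accumulate fn's inclusive wall-clock time under its caller's name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        caller = sys._getframe(1).f_code.co_name
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            time_by_caller[caller] += time.perf_counter() - start
    return wrapper

@attribute_to_caller
def f2(delay):
    time.sleep(delay)

def f1():
    f2(0.01)  # fast call site

def f3():
    f2(0.05)  # slow call site

f1(); f3()
# time_by_caller now holds separate totals under 'f1' and 'f3'
```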

Rob


====

To: "Rob Mueller (fastmail)" <robm@fastmail.fm>,
From: "Jeremy Howard" <jh_lists@fastmail.fm>
Subject: Re: Comparison of different caching schemes
Date: Sun, 16 Dec 2001 09:58:09 +1100

Rob Mueller (fastmail) wrote:
> > And ++ on Paul's comments about Devel::DProf and other profilers.
>
> Ditto again. I've been using Apache::DProf recently and it's been great at
> tracking down exactly where time is spent in my program.

One place that Rob and I still haven't found a good solution for profiling
is trying to work out whether we should be focussing on optimising our
mod_perl code, or our IMAP config, or our MySQL DB, or our SMTP setup, or
our daemons' code, or...

Can anyone suggest a way (under Linux 2.4, if it's OS dependent) to get a
log of CPU (and also IO preferably) usage by process name over some period
of time?

Sorry to move slightly off-topic here, but in optimising our mod_perl app we
have to work out which bit of the whole system to prioritise.


===

To: Jeremy Howard <jh_lists@fastmail.fm>
From: Luciano Miguel Ferreira Rocha <strange@nsk.yi.org>
Subject: Re: Comparison of different caching schemes
Date: Sun, 16 Dec 2001 00:06:04 +0000

On Sun, Dec 16, 2001 at 09:58:09AM +1100, Jeremy Howard wrote:
> Can anyone suggest a way (under Linux 2.4, if it's OS dependent) to get a
> log of CPU (and also IO preferably) usage by process name over some period
> of time?

What about BSD Process Accounting (supported in most *nix systems) and
sysstat (iostat + sar)?


===

To: "Jeremy Howard" <jh_lists@fastmail.fm>,
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: Comparison of different caching schemes
Date: Sat, 15 Dec 2001 20:57:30 -0500

> One place that Rob and I still haven't found a good solution for
> profiling is trying to work out whether we should be focussing on
> optimising our mod_perl code, or our IMAP config, or our MySQL DB, or
> our SMTP setup, or our daemons' code, or...

Assuming that the mod_perl app is the front-end for users and that
you're trying to optimize for speed of responses, you should just use
DProf to tell you which subroutines are using the most wall clock time.
(I think it's dprofpp -t or something.  Check the man page.)  If the sub
that uses IMAP is the slowest, then work on speeding up your IMAP server
or the way you access it.

CPU utilization may not be all that telling, since database stuff often
takes the longest but doesn't burn much CPU.

===

To: Perrin Harkins <perrin@elem.com>
From: Paul Lindner <lindner@inuus.com>
Subject: Re: Comparison of different caching schemes
Date: Sat, 15 Dec 2001 22:31:20 -0800

On Sat, Dec 15, 2001 at 08:57:30PM -0500, Perrin Harkins wrote:
> > One place that Rob and I still haven't found a good solution for
> > profiling is trying to work out whether we should be focussing on
> > optimising our mod_perl code, or our IMAP config, or our MySQL DB,
> > or our SMTP setup, or our daemons' code, or...
> 
> Assuming that the mod_perl app is the front-end for users and that
> you're trying to optimize for speed of responses, you should just use
> DProf to tell you which subroutines are using the most wall clock time.
> (I think it's dprofpp -t or something.  Check the man page.)  If the sub
> that uses IMAP is the slowest, then work on speeding up your IMAP server
> or the way you access it.

> CPU utilization may not be all that telling, since database stuff often
> takes the longest but doesn't burn much CPU.

I agree 100%, wall clock time is the best metric.  Consider that most
apps are not CPU bound, they are I/O or network bound.  Wall clock
time measures that exactly.

Another useful tool is truss/strace.  Start up your server in single-process
mode (-X) and then attach to it with strace.  The -c and -r flags
to strace are quite handy.  For example, here are the cumulative stats
for 'ls':

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.29    0.004508         751         6           read
 19.48    0.001679         560         3           write
 12.93    0.001115          56        20           close
  6.01    0.000518          23        23           old_mmap
  5.45    0.000470          21        22         2 open
  1.22    0.000105          21         5           munmap
  0.93    0.000080          40         2           getdents64
  0.71    0.000061           3        21           fstat64
  0.29    0.000025           4         7           brk
  0.24    0.000021          11         2           mprotect
  0.19    0.000016           8         2           ioctl
  0.12    0.000010          10         1           uname
  0.09    0.000008           3         3         2 rt_sigaction
  0.06    0.000005           5         1           fcntl64
 ------ ----------- ----------- --------- --------- ----------------
100.00    0.008621                   118         4 total

===

To: <modperl@apache.org>
From: Bill Moseley <moseley@hank.org>
Subject: Re: Comparison of different caching schemes
Date: Tue, 18 Dec 2001 14:01:21 -0800

Ok, I'm a bit slow...


At 03:05 PM 12/12/01 +1100, Rob Mueller (fastmail) wrote: 

> Just thought people might be interested...

Seems like they were!  Thanks again.


I didn't see anyone comment on this, but I was a bit surprised by
MySQL's good performance.  I suppose caching is key.  I wonder if things
would change with 50 or 100 thousand rows in the table.


I always assumed something like Cache::FileCache would have less overhead
than an RDBMS.  It's impressive.



> Now to the results, here they are.
>
> Package C0 - In process hash
> Sets per sec = 147116
> Gets per sec = 81597
> Mixes per sec = 124120
> Package C1 - Storable freeze/thaw
> Sets per sec = 2665
> Gets per sec = 6653
> Mixes per sec = 3880
> Package C2 - Cache::Mmap
> Sets per sec = 809
> Gets per sec = 3235
> Mixes per sec = 1261
> Package C3 - Cache::FileCache
> Sets per sec = 393
> Gets per sec = 831
> Mixes per sec = 401
> Package C4 - DBI with freeze/thaw
> Sets per sec = 651
> Gets per sec = 1648
> Mixes per sec = 816
> Package C5 - DBI (use updates with dup) with freeze/thaw
> Sets per sec = 657
> Gets per sec = 1994
> Mixes per sec = 944
> Package C6 - MLDBM::Sync::SDBM_File
> Sets per sec = 334
> Gets per sec = 1279
> Mixes per sec = 524
> Package C7 - Cache::SharedMemoryCache
> Sets per sec = 42
> Gets per sec = 29
> Mixes per sec = 32



===

