modperl_cache_locking_issues

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: modperl@apache.org
From: "Alexander Farber (EED)" <eedalf@eed.ericsson.se>
Subject: Re: mod_perl shared memory with MM
Date: Mon, 05 Mar 2001 10:52:38 +0100

Adi Fairbank wrote:

> Yeah, I was thinking about something like that at first,
> but I've never played with named pipes, and it didn't
> sound too safe after reading the perlipc man page.  What
> do you use, Perl open() calls, IPC::Open2/3,
> IPC::ChildSafe, or

IPC::ChildSafe is a good module, I use it here to access ClearCase, but 
it probably won't help you to exchange any data between Apache children.

===

To: modperl@apache.org
From: Christian Jaeger <christian.jaeger@sl.ethz.ch>
Subject: Re: mod_perl shared memory with MM
Date: Sat, 10 Mar 2001 02:13:09 +0100

For all of you trying to share session information efficiently my 
IPC::FsSharevars module might be the right thing. I wrote it after 
having considered all the other solutions. It uses the file system 
directly (no BDB/etc. overhead) and provides sophisticated locking 
(even different variables from the same session can be written at the 
same time). I wrote it for my fastcgi based web app framework (Eile) 
but it should be useable for mod_perl things as well (I'm awaiting 
patches and suggestions in case it is not). It has not seen very much 
real world testing yet.

You may find the manpage on 
http://testwww.ethz.ch/perldoc/IPC/FsSharevars.pm and the module (no 
Makefile.PL yet) under http://testwww.ethz.ch/eile/download/ .


===

To: DeWitt Clinton <dewitt@unto.net>, Perrin Harkins
<perrin@primenet.com>
From: Christian Jaeger <christian.jaeger@sl.ethz.ch>
Subject: Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 15:33:12 +0100

At 22:23 -0500 on 10.3.2001, DeWitt Clinton wrote:
>On Sat, Mar 10, 2001 at 04:35:02PM -0800, Perrin Harkins wrote:
>  > Christian Jaeger wrote:
>>  > Yes, it uses a separate file for each variable. This way also locking
>>  > is solved, each variable has its own file lock.
>>
>>  You should take a look at DeWitt Clinton's Cache::FileCache module,
>>  announced on this list.  It might make sense to merge your work into
>>  that module, which is the next generation of the popular File::Cache
>>  module.
>
>Yes!  I'm actively looking for additional developers for the Perl
>Cache project.  I'd love new implementations of the Cache interface.
>Cache::BerkeleyDBCache would be wonderful.  Check out:
>
>   http://sourceforge.net/projects/perl-cache/
>
>For what it is worth, I don't explicitly lock.  I do atomic writes
>instead, and have yet to hear anyone report a problem in the year the
>code has been public.


I've looked at Cache::FileCache now and think it's (currently) not 
possible to use for IPC::FsSharevars:

I really miss locking capabilities. Imagine a script that reads a 
value at the beginning of a request and writes it back at the end of 
the request. If it's not locked during this time, another instance 
can read the same value and then write another change back which is 
then overwritten by the first instance.
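
A minimal sketch of the read-modify-write problem described above, and
of closing it with an exclusive flock() held for the whole cycle. This
is not code from IPC::FsSharevars; the helper name locked_add and the
counter file are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical helper: add $delta to the number stored in $path while
# holding an exclusive lock from the read until the write is finished.
sub locked_add {
    my ($path, $delta) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT, 0600) or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock $path: $!";
    my $value = <$fh>;
    $value = 0 unless defined $value;      # empty file: start at zero
    chomp $value;
    $value += $delta;
    seek($fh, 0, SEEK_SET) or die "seek $path: $!";
    truncate($fh, 0)       or die "truncate $path: $!";
    print {$fh} "$value\n" or die "write $path: $!";
    close($fh) or die "close $path: $!";   # flushes, then releases the lock
    return $value;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter";

# Four processes each add 1 fifty times, like parallel requests
# updating the same session value.
my @pids;
for my $worker (1 .. 4) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        locked_add($file, 1) for 1 .. 50;
        POSIX::_exit(0);                   # skip the parent's cleanup handlers
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

my $total = locked_add($file, 0);
print "total = $total\n";
```

Without the flock() call two workers can read the same value and one
increment is lost, which is exactly the overwrite described above.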

IPC::FsSharevars even goes one step further: instead of locking 
everything for a particular session, it only locks individual 
variables. So you can say "I use the variables $foo and %bar from 
session 12345 and will write %bar back", in which case %bar of 
session 12345 is locked until it is written back, while $foo and @baz 
are still unlocked and may be read (and written) by other instances. 
:-) Such behaviour is useful if you have framesets where a browser 
may request several frames of the same session in parallel (you can 
see an example on http://testwww.ethz.ch, click on 'Suche' then on 
the submit button, the two appearing frames are executed in parallel 
and both access different session variables), or for handling session 
independent (global) data.

One thing to be careful about in such situations is deadlocking. 
IPC::FsSharevars prevents deadlocks by getting all needed locks at 
the same time (this is done by first requesting a general session 
lock and then trying to lock all needed variable container files - if 
it fails, the session lock is freed again and the process waits for a 
unix signal indicating a change in the locking states). Getting all 
locks at the same time is more efficient than getting locks always in 
the same order.
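
The all-or-nothing scheme above can be sketched roughly as follows.
This is an illustration only, not the IPC::FsSharevars code: the names
try_lock_all and lock_all and the ".session" lock file are invented,
the retry sleeps briefly instead of waiting for a Unix signal, and
releasing the session lock once the variable locks are held is an
assumption. The demonstration at the end also assumes a platform where
Perl's flock() maps to flock(2), so that two opens of the same file in
one process conflict.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep);

# Hypothetical helper: lock every variable file named in @$names, or none.
# Returns a hash ref (name => locked filehandle), or nothing on failure.
sub try_lock_all {
    my ($dir, $names) = @_;
    sysopen(my $session, "$dir/.session", O_RDWR | O_CREAT, 0600)
        or die "open session lock: $!";
    flock($session, LOCK_EX) or die "flock session lock: $!";
    my %held;
    for my $name (@$names) {
        sysopen(my $fh, "$dir/$name", O_RDWR | O_CREAT, 0600)
            or die "open $name: $!";
        if (flock($fh, LOCK_EX | LOCK_NB)) {
            $held{$name} = $fh;
            next;
        }
        # One variable is busy: give back everything taken so far.
        close($_) for values %held;
        close($session);
        return;
    }
    close($session);    # assumption: only the variable locks are kept
    return \%held;
}

# Hypothetical helper: retry until all locks are held.
sub lock_all {
    my ($dir, $names) = @_;
    while (1) {
        my $held = try_lock_all($dir, $names);
        return $held if $held;
        sleep(0.05);    # the scheme described above waits for a signal here
    }
}

my $dir    = tempdir(CLEANUP => 1);
my $first  = try_lock_all($dir, ['bar']);           # takes bar
my $second = try_lock_all($dir, ['foo', 'bar']);    # bar is busy, foo is given back
my $third  = try_lock_all($dir, ['foo']);           # works, foo was not left locked

print "first:  ", ($first  ? "locked" : "failed"), "\n";
print "second: ", ($second ? "locked" : "failed"), "\n";
print "third:  ", ($third  ? "locked" : "failed"), "\n";
```

Since no process ever waits while holding a variable lock, a cycle of
processes waiting on each other cannot form.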


BTW some questions/suggestions for DeWitt:
- why don't you use 'real' constants for $SUCCESS and the like? (use constant)
- you probably should either append the userid of the process to 
/tmp/FileCache or make this folder globally writeable (and set the 
sticky flag). Otherwise other users get a permission error.
- why don't you use Storable.pm? It should be much faster than Data::Dumper

>I have some preliminary benchmark code -- only good for relative
>benchmarking, but it is a start.  I'd be happy to post the results
>here if people are interested.

Could you send me the code? Then I'll look into benchmarking my module too.

===

To: Christian Jaeger <christian.jaeger@sl.ethz.ch>
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: mod_perl shared memory with MM
Date: Sat, 10 Mar 2001 00:23:34 -0800 (PST)

On Sat, 10 Mar 2001, Christian Jaeger wrote:
> For all of you trying to share session information efficiently my 
> IPC::FsSharevars module might be the right thing. I wrote it after 
> having considered all the other solutions. It uses the file system 
> directly (no BDB/etc. overhead) and provides sophisticated locking 
> (even different variables from the same session can be written at the 
> same time).

Sounds very interesting.  Does it use a multi-file approach like
File::Cache?  Have you actually benchmarked it against BerkeleyDB?  It's
hard to beat BDB because it uses a shared memory buffer, but theoretically
the file system buffer could do it since that's managed by the kernel.

===

To: Perrin Harkins <perrin@primenet.com>
From: Christian Jaeger <christian.jaeger@sl.ethz.ch>
Subject: Re: mod_perl shared memory with MM
Date: Sat, 10 Mar 2001 12:51:34 +0100

At 0:23 -0800 on 10.3.2001, Perrin Harkins wrote:
>On Sat, 10 Mar 2001, Christian Jaeger wrote:
>>  For all of you trying to share session information efficiently my
>>  IPC::FsSharevars module might be the right thing. I wrote it after
>>  having considered all the other solutions. It uses the file system
>>  directly (no BDB/etc. overhead) and provides sophisticated locking
>>  (even different variables from the same session can be written at the
>>  same time).
>
>Sounds very interesting.  Does it use a multi-file approach like
>File::Cache?  Have you actually benchmarked it against BerkeleyDB?  It's
>hard to beat BDB because it uses a shared memory buffer, but theoretically
>the file system buffer could do it since that's managed by the kernel.

Yes, it uses a separate file for each variable. This way also locking 
is solved, each variable has its own file lock.

It's a bit difficult to write a realworld benchmark. I've tried to 
use DB_File before but it was very slow when doing a sync after every 
write as is recommended in various documentation to make it 
multiprocess safe. What do you mean with BerkeleyDB, something 
different than DB_File?

Currently I don't use Mmap (are there no cross platform issues using 
that?), that might speed it up a bit more.

Christian.


===

To: Christian Jaeger <christian.jaeger@sl.ethz.ch>
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: mod_perl shared memory with MM
Date: Sat, 10 Mar 2001 16:35:02 -0800

Christian Jaeger wrote:
> Yes, it uses a separate file for each variable. This way also locking
> is solved, each variable has its own file lock.

You should take a look at DeWitt Clinton's Cache::FileCache module,
announced on this list.  It might make sense to merge your work into
that module, which is the next generation of the popular File::Cache
module.

> It's a bit difficult to write a realworld benchmark.

It certainly is.  Benchmarking all of the options is something that I've
always wanted to do and never find enough time for.

> I've tried to
> use DB_File before but it was very slow when doing a sync after every
> write as is recommended in various documentation to make it
> multiprocess safe. What do you mean with BerkeleyDB, something
> different than DB_File?

BerkeleyDB.pm is an interface to later versions of the Berkeley DB
library.  It has a shared memory cache, and does not require syncing or
opening and closing of files on every access.  It has built-in locking,
which can be configured to work at a page level, allowing multiple
simultaneous writers.

> Currently I don't use Mmap (are there no cross platform issues using
> that?), that might speed it up a bit more.

That would be a nice option.  Take a look at Cache::Mmap before you
start.

- Perrin

===

To: Perrin Harkins <perrin@primenet.com>
From: DeWitt Clinton <dewitt@unto.net>
Subject: Re: mod_perl shared memory with MM
Date: Sat, 10 Mar 2001 22:23:50 -0500

On Sat, Mar 10, 2001 at 04:35:02PM -0800, Perrin Harkins wrote:
> Christian Jaeger wrote:
> > Yes, it uses a separate file for each variable. This way also locking
> > is solved, each variable has its own file lock.
> 
> You should take a look at DeWitt Clinton's Cache::FileCache module,
> announced on this list.  It might make sense to merge your work into
> that module, which is the next generation of the popular File::Cache
> module.

Yes!  I'm actively looking for additional developers for the Perl
Cache project.  I'd love new implementations of the Cache interface.
Cache::BerkeleyDBCache would be wonderful.  Check out:
  
  http://sourceforge.net/projects/perl-cache/

For what it is worth, I don't explicitly lock.  I do atomic writes
instead, and have yet to hear anyone report a problem in the year the
code has been public.
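
One common way to get atomic writes without explicit locks is to write
a temporary file in the same directory and rename() it over the target.
The sketch below shows that technique; whether Cache::FileCache does it
exactly this way is an assumption, and the helper name atomic_write is
made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);

# Hypothetical helper: replace $path with $data so that a reader sees
# either the complete old contents or the complete new contents.
sub atomic_write {
    my ($path, $data) = @_;
    # The temporary file has to be on the same filesystem as the
    # target, otherwise rename() cannot swap it in atomically.
    my ($fh, $tmp) = tempfile('.tmpXXXXXX', DIR => dirname($path));
    binmode($fh);
    print {$fh} $data or die "write $tmp: $!";
    close($fh)        or die "close $tmp: $!";
    rename($tmp, $path) or die "rename $tmp -> $path: $!";
    return;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/entry";
atomic_write($file, "first version\n");
atomic_write($file, "second version\n");

open(my $in, '<', $file) or die "open $file: $!";
my $contents = do { local $/; <$in> };
close($in);
print $contents;
```

This protects readers from half-written entries, but two writers still
race in the sense that the last rename() wins.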


> > It's a bit difficult to write a realworld benchmark.
> 
> It certainly is.  Benchmarking all of the options is something that I've
> always wanted to do and never find enough time for.

I have some preliminary benchmark code -- only good for relative
benchmarking, but it is a start.  I'd be happy to post the results
here if people are interested.

===

To: Christian Jaeger <christian.jaeger@sl.ethz.ch>
From: DeWitt Clinton <dewitt@unto.net>
Subject: [OT] Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 11:13:44 -0500

On Sun, Mar 11, 2001 at 03:33:12PM +0100, Christian Jaeger wrote:

> I've looked at Cache::FileCache now and think it's (currently) not 
> possible to use for IPC::FsSharevars:
> 
> I really miss locking capabilities. Imagine a script that reads a 
> value at the beginning of a request and writes it back at the end of 
> the request. If it's not locked during this time, another instance 
> can read the same value and then write another change back which is 
> then overwritten by the first instance.


I'm very intrigued by your thinking on locking.  I had never
considered the transaction based approach to caching you are referring
to.  I'll take this up privately with you, because we've strayed far
off the mod_perl topic, although I find it fascinating.



> - why don't you use 'real' constants for $SUCCESS and the like? (use
> constant)

Two reasons, mostly historical, and not necessarily good ones.

One, I benchmarked some code once that required high performance, and
the use of constants was just slightly slower.

Two, I like the syntax $hash{$CONSTANT}.  If I remember correctly,
$hash{CONSTANT} didn't work.  This may have changed in newer versions
of Perl.

Obviously those are *very* small issues, and so it is mostly by habit
that I don't use constant.  I would consider changing, but it would
mean asking everyone using the code to change too, because they
currently import and use the constants as Exported scalars.

Do you know of a very important reason to break compatibility and
force the switch?  I'm not opposed to switching if I have to, but I'd
like to minimize the impact on the users.



> - you probably should either append the userid of the process to 
> /tmp/FileCache or make this folder globally writeable (and set the 
> sticky flag). Otherwise other users get a permission error.

As of version 0.03, the cache directories, but not the cache entries,
are globally writable by default.  Users can override this by changing
the 'directory_umask' option, or keep data private altogether by
changing the 'cache_root'.  What version did you test with?  There may
be a bug in there.



> - why don't you use Storable.pm? It should be much faster than Data::Dumper

The TODO contains "Replace Data::Dumper with Storable (maybe)".  :) 

The old File::Cache module used Storable, btw.

It will be trivial to port the new Cache::FileCache to use Storable.
I simply wanted to wait until I had the benchmarking code so I could
be sure that Storable was faster.  Actually, I'm not 100% sure that I
expect Storable to be faster than Data::Dumper.  If Data::Dumper
turns out to be about equally fast, then I'll stay with it, because it
is available on all Perl installations, I believe.

Do you know if Storable is definitely faster?  If you have benchmarks
then I am more than happy to switch now.  Or, do you know of a reason,
feature wise, that I should switch?  Again, it is trivial to do so.



> >I have some preliminary benchmark code -- only good for relative
> >benchmarking, but it is a start.  I'd be happy to post the results
> >here if people are interested.
> 
> Could you send me the code?, then I'll look into benchmarking my
> module too.

I checked it in as Cache::CacheBenchmark.  It isn't good code, nor
does it necessarily work just yet.  I simply checked it in while I was
in the middle of working on it.  I'm turning it into a real
benchmarking class for the cache, and hopefully that will help you a
little bit.


===

To: "DeWitt Clinton" <dewitt@unto.net>,
From: "Perrin Harkins" <perrin@primenet.com>
Subject: Re: [OT] Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 10:24:04 -0800

> I'm very intrigued by your thinking on locking.  I had never
> considered the transaction based approach to caching you are referring
> to.  I'll take this up privately with you, because we've strayed far
> off the mod_perl topic, although I find it fascinating.

One more suggestion before you take this off the list: it's nice to have
both.  There are uses for explicit locking (I remember Randal saying he
wished File::Cache had some locking support), but most people will be happy
with atomic updates, and that's usually faster.  Gunther's eXtropia stuff
supports various locking options, and you can read some of the reasoning
behind it in the docs at
http://new.extropia.com/development/webware2/webware2.html.  (See chapters
13 and 18.)

> > - why don't you use 'real' constants for $SUCCESS and the like? (use
> > constant)
>
> Two reasons, mostly historical, and not necessarily good ones.
>
> One, I benchmarked some code once that required high performance, and
> the use of constants was just slightly slower.

Ick.

> Two, I like the syntax $hash{$CONSTANT}.  If I remember correctly,
> $hash{CONSTANT} didn't work.  This may have changed in newer versions
> of Perl.

No, the use of constants as hash keys or in interpolated strings still
doesn't work.  I tried the constants module in my last project, and I found
it to be more trouble than it was worth.  It's annoying to have to write
things like $hash{&CONSTANT} or "string @{[&CONSTANT]}".
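
A small illustration of the hash key behaviour, using a made-up
constant SUCCESS and made-up hash contents. Besides the &CONSTANT form
mentioned above, CONSTANT() and +CONSTANT are standard Perl ways to
stop the bareword from being quoted.

```perl
use strict;
use warnings;
use constant SUCCESS => 'ok';

my %hash = (
    ok      => 'looked up by value',
    SUCCESS => 'looked up by name',    # => quotes the bareword on its left
);

my $bare   = $hash{SUCCESS};      # bareword is auto-quoted: key is 'SUCCESS'
my $amp    = $hash{&SUCCESS};     # calls the constant: key is 'ok'
my $parens = $hash{SUCCESS()};    # same, with explicit call parentheses
my $plus   = $hash{+SUCCESS};     # unary plus prevents the auto-quoting
my $string = "status: @{[ SUCCESS ]}";    # interpolation needs the @{[ ]} trick

print "$_\n" for $bare, $amp, $parens, $plus, $string;
```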

> Do you know if Storable is definitely faster?

It is, and it's now part of the standard distribution.
http://www.astray.com/pipermail/foo/2000-August/000169.html
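
For anyone who wants to measure this on their own data, here is a rough
sketch that round-trips a structure through Storable and through
Data::Dumper plus eval, and compares them with the core Benchmark
module. The sample session data and the helper names are invented, and
the result will depend on the shape of the data and the Perl version.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Data::Dumper;
use Benchmark qw(cmpthese);

# Made-up sample data standing in for a cached session.
my $session = {
    user  => 'alice',
    cart  => [ 1 .. 20 ],
    prefs => { lang => 'de' },
};

# Serialise to Storable's binary format and back.
sub copy_via_storable {
    my ($data) = @_;
    return thaw(freeze($data));
}

# Serialise to Perl source text with Data::Dumper and eval it back.
sub copy_via_dumper {
    my ($data) = @_;
    local $Data::Dumper::Indent = 0;
    my $VAR1;                 # the dumped text assigns to $VAR1
    eval Dumper($data);
    die $@ if $@;
    return $VAR1;
}

# Run each variant for at least one CPU second and print a comparison.
cmpthese(-1, {
    'Storable'     => sub { copy_via_storable($session) },
    'Data::Dumper' => sub { copy_via_dumper($session) },
});
```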

===

To: DeWitt Clinton <dewitt@unto.net>
From: Greg Cope <greg@rubberplant.freeserve.co.uk>
Subject: Re: [OT] Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 20:15:54 +0000

DeWitt Clinton wrote:

> Do you know if Storable is definitely faster?  If you have benchmarks
> then I am more than happy to switch now.  Or, do you know of a reason,
> feature wise, that I should switch?  Again, it is trivial to do so.

I've found it to be around 5-10% faster on simple stuff, in some
benchmarking I did around a year ago.

Can I ask why you are not using IPC::ShareLite (as it's pure C and
apparently much faster than IPC::Shareable - I've never benchmarked it
as I've also used IPC::ShareLite)?

===

To: "Greg Cope" <greg@rubberplant.freeserve.co.uk>,
From: "Perrin Harkins" <perrin@primenet.com>
Subject: Re: [OT] Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 12:33:04 -0800

> Can I ask why you are not using IPC::ShareLite (as it's pure C and
> apparently much faster than IPC::Shareable - I've never benchmarked it
> as I've also used IPC::ShareLite)?

Full circle back to the original topic...
IPC::MM is implemented in C and offers an actual hash interface backed by a
BTree in shared memory.  IPC::ShareLite only works for individual scalars.

It wouldn't surprise me if a file system approach was faster than either of
these on Linux, because of the aggressive caching.

===

To: Perrin Harkins <perrin@primenet.com>
From: Greg Cope <greg@rubberplant.freeserve.co.uk>
Subject: Re: [OT] Re: mod_perl shared memory with MM
Date: Sun, 11 Mar 2001 20:42:17 +0000

Perrin Harkins wrote:
> 
> > Can I ask why you are not using IPC::ShareLite (as it's pure C and
> > apparently much faster than IPC::Shareable - I've never benchmarked it
> > as I've also used IPC::ShareLite)?
> 
> Full circle back to the original topic...
> IPC::MM is implemented in C and offers an actual hash interface backed by a
> BTree in shared memory.  IPC::ShareLite only works for individual scalars.
> 

Not tried that one!

I've used the obvious ShareLite plus Storable to serialise hashes.

> It wouldn't surprise me if a file system approach was faster than either of
> these on Linux, because of the aggressive caching.

It would be an interesting benchmark ... Although it may only be a
performance win on a lightly loaded machine, the assumption being that
the stat'ing is fast on a lightly loaded system with fast understressed
disks.  I could be completely wrong here tho ;-).

Has anyone used the file system approach on a RAM disk?

===

To: Adi Fairbank <adi@certsite.com>
From: Sean Chittenden <sean@chittenden.org>
Subject: Re: mod_perl shared memory with MM
Date: Mon, 12 Mar 2001 12:14:26 -0800

	Sorry for taking a while to get back to this, road trips
can be good at interrupting the flow of life.

	It depends on the application.  I typically use a few
instances of open() for the sake of simplicity, but I have also had
decent luck with IPC::Open(2|3).  The only problems I've had with
either was an OS specific bug with Linux (the pipe was newline
buffering and dropping all characters over 1023, moved to FreeBSD and
the problem went away).

	Words of wisdom: start slow because debugging over a pipe can
be a headache (understatement).  Simple additions + simple debugging =
good thing(tm).  I've spent too many afternoons/nights ripping apart
these kinds of programs only to find a small type-o and then
reconstructing a much larger query/response set of programs.  -sc

	PS You also want to attach the program listening to the named
pipe to something like DJB's daemon tools
(http://cr.yp.to/daemontools.html) to prevent new requests from
blocking if the listener dies: bad thing(tm).

===

To: Adi Fairbank <adi@certsite.com>, modperl@apache.org
From: barries <barries@slaysys.com>
Subject: Re: mod_perl shared memory with MM
Date: Mon, 12 Mar 2001 21:49:12 -0500

On Mon, Mar 12, 2001 at 12:14:26PM -0800, Sean Chittenden wrote:
> 	Sorry for taking a while to get back to this, road trips
> can be good at interrupting the flow of life.
> 
> 	It depends on the application.  I typically use a few
> instances of open() for the sake of simplicity, but I have also had
> decent luck with IPC::Open(2|3).  The only problems I've had with
> either was an OS specific bug with Linux (the pipe was newline
> buffering and dropping all characters over 1023, moved to FreeBSD and
> the problem went away).
> 
> 	Words of wisdom: start slow because debugging over a pipe can
> be a headache (understatement).  Simple additions + simple debugging =
> good thing(tm).  I've spent too many afternoons/nights ripping apart
> these kinds of programs only to find a small type-o and then
> reconstructing a much larger query/response set of programs.  -sc

<plug>

If you're working with Open{2,3}, you might want to take a gander at
IPC::Run, it's like Open{2,3,...}+select loop+expect, but with a
shell-like API and the ability to trace events in both the parent and
child processes (via a separate debugging pipe).  Just turn debugging on
and you can see what's sent and received on each pipe.  It avoids the
deadlocks warned about in the perlipc manpages (as you mention), which
can be quite tricky in the face of different failure modes.  IPC::Run
establishes or manages pipes, PTYs, plain ol' filehandles and timers in
such a way as to avoid deadlocks (though if you try hard enough...).

It's a bit bloated to do all that, but the debugging feature can make it
worth the bloat in many cases, and you can certainly rename it and carve
off great hunks of unneeded features (like param parsing or say ptys)
pretty easily if you want to whittle it down after debugging.

That being said, open2 and open3 are for forking around, not for
communicating over named pipes to server processes.
</plug>

> 	PS You also want to attach the program listening to the named
> pipe to something like DJB's daemon tools
> (http://cr.yp.to/daemontools.html) to prevent new requests from
> blocking if the listener dies: bad thing(tm).

Neat, thanks for that link.
