This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Date: Mon, 19 Mar 2001 11:43:05 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Mod Perl <modperl@apache.org>
Subject: [ANNOUNCE] MLDBM::Sync v.07
Hey,
The latest MLDBM::Sync v.07 is in your local CPAN and also
http://www.perl.com/CPAN-local/modules/by-module/MLDBM/
It provides a wrapper around MLDBM databases, like SDBM_File
and DB_File, giving safe concurrent access via a flock()
strategy and per-access dbm i/o flushing.
A recent API addition allows a secondary cache layer, backed by
Tie::Cache, to be used automatically, like:
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
$sync_dbm_obj->SyncCacheSize('100K');
On my dual PIII 450 Linux box, I might get 1500 or so reads per sec
to an SDBM_File-based MLDBM::Sync database, while the Tie::Cache layer
runs at about 15000 reads/sec, so for high cache-hit usage the
speedup can be considerable.
MLDBM::Sync also comes with MLDBM::Sync::SDBM_File, a wrapper around
SDBM_File that overcomes its 1024 byte limit for values, which
can be fast for caching data up to 10000 bytes or so in length.
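A minimal sketch of what that looks like in use (the path and key below are only illustrative):

  use MLDBM::Sync;
  use MLDBM qw(MLDBM::Sync::SDBM_File);  # extended SDBM_File, handles values > 1024 bytes
  use Fcntl qw(:DEFAULT);

  # every access is flock() protected and flushed by MLDBM::Sync
  my $sync_dbm_obj = tie my %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640
      or die "tie failed: $!";

  # values larger than SDBM_File's native 1024 byte limit go through
  # the MLDBM::Sync::SDBM_File wrapper transparently
  $cache{big_key} = 'x' x 5000;
  print length($cache{big_key}), "\n";   # 5000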
-- Josh
CHANGES
$MODULE = "MLDBM::Sync"; $VERSION = .07; $DATE = 'TBA';
+ $dbm->SyncCacheSize() API activates 2nd layer RAM cache
via Tie::Cache with MaxBytes set.
+ CACHE documentation, cache.t test, sample benchmarks
with ./bench/bench_sync.pl -c
$MODULE = "MLDBM::Sync"; $VERSION = .05; $DATE = '2001/03/13';
+ Simpler use of locking.
- Read locking works on Solaris, had to open lock file in
read/write mode. Linux/NT didn't care.
NAME
MLDBM::Sync (BETA) - safe concurrent access to MLDBM databases
SYNOPSIS
use MLDBM::Sync; # this gets the default, SDBM_File
use MLDBM qw(DB_File Storable); # use Storable for serializing
use MLDBM qw(MLDBM::Sync::SDBM_File); # use extended SDBM_File, handles values > 1024 bytes
use Fcntl qw(:DEFAULT); # import symbols O_CREAT & O_RDWR for use with DBMs
# NORMAL PROTECTED read/write with implicit locks per i/o request
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..] or die $!;
$cache{"AAAA"} = "BBBB";
my $value = $cache{"AAAA"};
# SERIALIZED PROTECTED read/write with explicit lock for both i/o requests
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
$sync_dbm_obj->Lock;
$cache{"AAAA"} = "BBBB";
my $value = $cache{"AAAA"};
$sync_dbm_obj->UnLock;
# SERIALIZED PROTECTED READ access with explicit read lock for both reads
$sync_dbm_obj->ReadLock;
my @keys = keys %cache;
my $value = $cache{'AAAA'};
$sync_dbm_obj->UnLock;
# MEMORY CACHE LAYER with Tie::Cache
$sync_dbm_obj->SyncCacheSize('100K');
# KEY CHECKSUMS, for lookups on MD5 checksums on large keys
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
$sync_dbm_obj->SyncKeysChecksum(1);
my $large_key = "KEY" x 10000;
$cache{$large_key} = "LARGE";
my $value = $cache{$large_key};
DESCRIPTION
This module wraps around the MLDBM interface, handling concurrent
access to MLDBM databases with file locking, and flushes i/o explicitly
per lock/unlock. The new [Read]Lock()/UnLock() API can be used to
serialize requests logically and improve performance for bundled reads &
writes.
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
# Write locked critical section
$sync_dbm_obj->Lock;
... all accesses to DBM LOCK_EX protected, and go to same tied file handles
$cache{'KEY'} = 'VALUE';
$sync_dbm_obj->UnLock;
# Read locked critical section
$sync_dbm_obj->ReadLock;
... all read accesses to DBM LOCK_SH protected, and go to same tied files
... WARNING, cannot write to DBM in ReadLock() section, will die()
my $value = $cache{'KEY'};
$sync_dbm_obj->UnLock;
# Normal access OK too, without explicit locking
$cache{'KEY'} = 'VALUE';
my $value = $cache{'KEY'};
MLDBM continues to serve as the underlying OO layer that serializes
complex data structures to be stored in the databases. See the BUGS
section of the MLDBM manpage for important limitations.
MLDBM::Sync also provides built-in RAM caching with Tie::Cache and
MD5 key checksum functionality.
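Putting those two options from the SYNOPSIS together, a small combined sketch (the path and cache size are only illustrative):

  use MLDBM::Sync;
  use Fcntl qw(:DEFAULT);

  my $sync_dbm_obj = tie my %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640
      or die "tie failed: $!";

  $sync_dbm_obj->SyncCacheSize('100K');   # Tie::Cache RAM layer, MaxBytes 100K
  $sync_dbm_obj->SyncKeysChecksum(1);     # look keys up by their MD5 checksums

  my $large_key = "KEY" x 10000;
  $cache{$large_key} = "LARGE";           # the key is checksummed before it hits the dbm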
===
Date: Mon, 19 Mar 2001 12:01:05 -0800 (PST)
From: Perrin Harkins <perrin@primenet.com>
To: Joshua Chamas <joshua@chamas.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07
On Mon, 19 Mar 2001, Joshua Chamas wrote:
> A recent API addition allows for a secondary cache layer with
> Tie::Cache to be automatically used
When one process writes a change to the dbm, will the others all see it,
even if they use this?
- Perrin
===
Date: Mon, 19 Mar 2001 12:25:08 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Perrin Harkins <perrin@primenet.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07
Perrin Harkins wrote:
>
> On Mon, 19 Mar 2001, Joshua Chamas wrote:
> > A recent API addition allows for a secondary cache layer with
> > Tie::Cache to be automatically used
>
> When one process writes a change to the dbm, will the others all see it,
> even if they use this?
No, with the secondary cache layer activated, a process will not see
updates made by other processes. This is best used for caching
static data.
I can see a request coming for a way to "expire" this
cached data. I'll build it when someone asks for it.
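A hedged sketch of that trade-off (the path and cache size below are made up; the comments only restate the behaviour described above):

  use MLDBM::Sync;
  use Fcntl qw(:DEFAULT);

  # static lookup data: written once at startup, then only read
  my $dbm = tie my %static, 'MLDBM::Sync', '/tmp/static_lookup', O_CREAT|O_RDWR, 0640
      or die "tie failed: $!";

  # safe here, because no other process rewrites this dbm,
  # so the RAM layer can never hold a stale entry
  $dbm->SyncCacheSize('250K');

  # for data that other processes do write, skip SyncCacheSize();
  # each access then goes to the dbm under its own flock()
  # and sees the latest values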
-- Josh
_________________________________________________________________
Joshua Chamas Chamas Enterprises Inc.
NodeWorks >> free web link monitoring Huntington Beach, CA USA
http://www.nodeworks.com 1-714-625-4051
===
Date: Thu, 4 Apr 2002 09:48:16 +0000 (GMT)
From: Franck PORCHER <fpo@esoft.pf>
To: <modperl@perl.apache.org>
Subject: Problem with DBM concurrent access
Hi there,
I have a quick and possibly trivial question that has bothered me
for quite a while.
I'm using a DBM as a repository. The DBM is constantly written to by only one
process (the 'writer'), which opens it read/write. At the same time, many
processes (the 'readers') access it *read only*.
I find that the 'readers' never see the latest values
written by the 'writer'. I suspect a flushing problem on the 'writer' side,
and a synchronisation problem on the 'readers' side.
So my question narrows down to:
how do I flush the cache of a tied DBM (DB_File) structure to disk
in such a way that any concurrent process accessing it in *read only* mode
automatically gets the new values as soon as they
are published (synchronisation)?
Thanks in advance.
Franck
===
Date: Thu, 04 Apr 2002 15:51:17 -0500
From: Perrin Harkins <perrin@elem.com>
To: Franck PORCHER <fpo@esoft.pf>
Subject: Re: Problem with DBM concurrent access
Franck PORCHER wrote:
> So my question narrows down to:
> how do I flush the cache of a tied DBM (DB_File) structure to disk
> in such a way that any concurrent process accessing it in *read only* mode
> automatically gets the new values as soon as they
> are published (synchronisation)?
You have to tie and untie on each request. There's some discussion of
this in the Guide. As an alternative, you could look at using
BerkeleyDB, or MLDBM::Sync (which does the tie/untie for you).
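A minimal sketch of the tie/untie-per-request pattern, assuming a plain DB_File reader (the file name and key are only illustrative):

  use DB_File;
  use Fcntl qw(:DEFAULT);

  sub read_value {
      my $key = shift;

      # re-tie on every request so we see a fresh view of the file,
      # rather than pages cached by a long-lived tie
      tie my %db, 'DB_File', '/tmp/repository.db', O_RDONLY, 0644, $DB_HASH
          or die "tie failed: $!";
      my $value = $db{$key};
      untie %db;

      return $value;
  }

As the follow-ups below note, locking around the tie/untie is still needed if a writer may be mid-update.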
- Perrin
===
From: "Rob Bloodgood" <robb@empire2.com>
To: "Franck PORCHER" <fpo@esoft.pf>
Cc: "mod_perl" <modperl@perl.apache.org>
Subject: RE: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 12:58:06 -0800
> So my question narrows down to:
> how do I flush the cache of a tied DBM (DB_File) structure to disk
> in such a way that any concurrent process accessing it in *read only* mode
> automatically gets the new values as soon as they
> are published (synchronisation)?
Isn't that just as simple as
tied(%dbm_array)->sync();
?
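For context, a small sketch of that call on the writer's side, assuming a plain DB_File tie (the path is only illustrative):

  use DB_File;
  use Fcntl qw(:DEFAULT);

  tie my %dbm_array, 'DB_File', '/tmp/repository.db', O_CREAT|O_RDWR, 0644, $DB_HASH
      or die "tie failed: $!";

  $dbm_array{last_update} = time();
  tied(%dbm_array)->sync();   # flush the writer's dirty pages to disk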
HTH!
L8r,
Rob
===
Date: Fri, 05 Apr 2002 11:18:07 +0800
From: Stas Bekman <stas@stason.org>
To: Rob Bloodgood <robb@empire2.com>
Cc: Franck PORCHER <fpo@esoft.pf>, mod_perl <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access
Rob Bloodgood wrote:
>>So my question narrows down to:
>>how do I flush the cache of a tied DBM (DB_File) structure to disk
>>in such a way that any concurrent process accessing it in *read only* mode
>>automatically gets the new values as soon as they
>>are published (synchronisation)?
>>
>
> Isn't that just as simple as
>
> tied(%dbm_array)->sync();
I believe that's not enough, because the reader may read data during the
write, resulting in a corrupted read. You have to add locking. See
the DBM chapter in the guide.
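A hedged sketch of the kind of external flock() locking the guide describes, using a separate lock file around the tie/untie (file names are made up; MLDBM::Sync wraps the same idea for you):

  use DB_File;
  use Fcntl qw(:DEFAULT :flock);

  sub read_with_lock {
      my $key = shift;

      # per the MLDBM::Sync changelog above, Solaris wants the lock
      # file opened read/write for a shared (read) lock to work
      open my $lock, '+>', '/tmp/repository.lock' or die "open lock: $!";
      flock($lock, LOCK_SH) or die "flock: $!";     # shared lock for readers

      tie my %db, 'DB_File', '/tmp/repository.db', O_RDONLY, 0644, $DB_HASH
          or die "tie failed: $!";
      my $value = $db{$key};
      untie %db;

      close $lock;                                  # releases the lock
      return $value;
  }

  # a writer would take LOCK_EX instead, and untie (flushing the dbm)
  # before giving up the lock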
--
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
===
Date: Thu, 04 Apr 2002 20:00:41 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Stas Bekman <stas@stason.org>
Subject: Re: Problem with DBM concurrent access
Stas Bekman wrote:
>
> > tied(%dbm_array)->sync();
>
> I believe that's not enough, because the reader may read data during the
> write, resulting in a corrupted read. You have to add locking. See
> the DBM chapter in the guide.
>
You might add MLDBM::Sync to the docs; it easily adds locking
to MLDBM. MLDBM is a front end for storing complex data structures:
http://www.perl.com/CPAN-local/modules/by-module/MLDBM/CHAMAS/MLDBM-Sync-0.25.readme
What's nice about MLDBM is that you can easily swap various dbms
in & out, like SDBM_File, DB_File, GDBM_File, etc. More recently it even
supports Tie::TextDir too, which provides key-per-file style storage,
which is good when you have a fast file system & big data you want
to store.
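To make the front-end point concrete, a hedged sketch built from the SYNOPSIS lines below; swapping the underlying dbm is just a change to the use MLDBM line, and the stored structure is only an example:

  use MLDBM::Sync;
  use MLDBM qw(DB_File Storable);     # change this line to swap in another dbm backend
  use Fcntl qw(:DEFAULT);

  my $dbm = tie my %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640
      or die "tie failed: $!";

  # MLDBM serializes complex values (here via Storable) before they hit the dbm
  $cache{config} = { hosts => [ 'web1', 'web2' ], ttl => 300 };
  my $hosts = $cache{config}{hosts};  # fetch returns a copy of the whole structure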
SYNOPSIS
use MLDBM::Sync; # this gets the default, SDBM_File
use MLDBM qw(DB_File Storable); # use Storable for serializing
use MLDBM qw(MLDBM::Sync::SDBM_File); # use extended SDBM_File, handles values > 1024 bytes
use Fcntl qw(:DEFAULT); # import symbols O_CREAT & O_RDWR for use with DBMs
# NORMAL PROTECTED read/write with implicit locks per i/o request
my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..] or die $!;
$cache{"AAAA"} = "BBBB";
my $value = $cache{"AAAA"};
...
DESCRIPTION
This module wraps around the MLDBM interface, handling concurrent
access to MLDBM databases with file locking, and flushes i/o explicitly
per lock/unlock. The new [Read]Lock()/UnLock() API can be used to
serialize requests logically and improve performance for bundled reads &
writes.
Here are some benchmarks from my 2.4.x Linux box, a dual PIII 450 with a couple
of 7200 RPM IDE drives and a raid-1 ext3 fs mounted with default async.
MLDBM-Sync-0.25]# perl bench/bench_sync.pl
NUMBER OF PROCESSES IN TEST: 4
=== INSERT OF 50 BYTE RECORDS ===
Time for 100 writes + 100 reads for SDBM_File 0.17 seconds 12288 bytes
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File 0.20 seconds 12288 bytes
Time for 100 writes + 100 reads for GDBM_File 1.06 seconds 18066 bytes
Time for 100 writes + 100 reads for DB_File 0.63 seconds 12288 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04 0.38 seconds 13192 bytes
=== INSERT OF 500 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File 0.58 seconds 261120 bytes
Time for 100 writes + 100 reads for GDBM_File 1.09 seconds 63472 bytes
Time for 100 writes + 100 reads for DB_File 0.64 seconds 98304 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04 0.33 seconds 58192 bytes
=== INSERT OF 5000 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File 1.37 seconds 4128768 bytes
Time for 100 writes + 100 reads for GDBM_File 1.13 seconds 832400 bytes
Time for 100 writes + 100 reads for DB_File 1.08 seconds 831488 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04 0.52 seconds 508192 bytes
=== INSERT OF 20000 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
(skipping test for MLDBM::Sync db size > 1M)
Time for 100 writes + 100 reads for GDBM_File 1.76 seconds 2063912 bytes
Time for 100 writes + 100 reads for DB_File 1.78 seconds 2060288 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04 1.27 seconds 2008192 bytes
=== INSERT OF 50000 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
(skipping test for MLDBM::Sync db size > 1M)
Time for 100 writes + 100 reads for GDBM_File 3.52 seconds 5337944 bytes
Time for 100 writes + 100 reads for DB_File 3.37 seconds 5337088 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04 2.80 seconds 5008192 bytes
--Josh
===
From: "Perrin Harkins" <perrin@elem.com>
To: "Stas Bekman" <stas@stason.org>, "Rob Bloodgood" <robb@empire2.com>
Cc: "Franck PORCHER" <fpo@esoft.pf>, "mod_perl" <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 23:02:29 -0500
> > Isn't that just as simple as
> >
> > tied(%dbm_array)->sync();
> I believe that's not enough, because the reader may read
> data during the write, resulting in a corrupted read.
Not only that, there's also the issue that at least some dbm
implementations cache part of the file in memory and will not
pick up changed data unless you untie and re-tie. I remember a good
discussion about this on the list a year or two back.
===
Date: Fri, 5 Apr 2002 12:47:59 -0500
To: modperl@perl.apache.org
From: Dan Wilga <dwilga@MtHolyoke.edu>
Subject: Re: Problem with DBM concurrent access
I would also suggest using BerkeleyDB.pm, but with the
DB_INIT_MPOOL|DB_INIT_CDB flags. In this mode, only one writer is
allowed at a time, and Berkeley automatically handles all the locking
and flushing. Just don't forget to use db_close() to close the file
before untie'ing it.
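A hedged sketch of the setup Dan describes, based on the documented BerkeleyDB.pm interface (the environment home and file names are made up):

  use BerkeleyDB;

  # a shared environment; CDB mode gives multiple readers / one writer,
  # with Berkeley DB handling the locking itself
  my $env = BerkeleyDB::Env->new(
      -Home  => '/tmp/bdb-env',
      -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
  ) or die "env failed: $BerkeleyDB::Error";

  my $db = tie my %hash, 'BerkeleyDB::Hash',
      -Filename => 'repository.db',
      -Flags    => DB_CREATE,
      -Env      => $env
      or die "tie failed: $BerkeleyDB::Error";

  $hash{key} = 'value';

  # per Dan's note: close the handle before untie'ing
  $db->db_close();
  undef $db;
  untie %hash;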
===
Date: Fri, 5 Apr 2002 10:16:48 -0800 (PST)
From: Andrew Ho <andrew@tellme.com>
To: Dan Wilga <dwilga@MtHolyoke.edu>
Cc: mod_perl List <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access
Hello,
DW>I would also suggest using BerkeleyDB.pm, but with the
DW>DB_INIT_MPOOL|DB_INIT_CDB flags. In this mode, only one writer is
DW>allowed at a time, and Berkeley automatically handles all the locking
DW>and flushing. Just don't forget to use db_close() to close the file
DW>before untie'ing it.
One caveat on this: BerkeleyDB maintains its locks and other environment
information in a local memory segment, so this won't work if multiple
machines share the same BerkeleyDB file (e.g., if you are using the
BerkeleyDB file over NFS).
===