This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Date: Mon, 19 Mar 2001 11:43:05 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Mod Perl <modperl@apache.org>
Subject: [ANNOUNCE] MLDBM::Sync v.07

Hey,

The latest MLDBM::Sync v.07 is in your local CPAN and also at
http://www.perl.com/CPAN-local/modules/by-module/MLDBM/

It provides a wrapper around MLDBM databases, like SDBM_File and
DB_File, providing safe concurrent access using a flock() strategy
and per-access dbm i/o flushing.  A recent API addition allows a
secondary cache layer with Tie::Cache to be used automatically, like:

  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm',
    O_CREAT|O_RDWR, 0640;
  $sync_dbm_obj->SyncCacheSize('100K');

On my dual PIII 450 linux box, I might get 1500 or so reads per second
to a SDBM_File based MLDBM::Sync database, and the Tie::Cache layer
runs at about 15000 reads/sec, so for high cache hit usage the speedup
can be considerable.

MLDBM::Sync also comes with MLDBM::Sync::SDBM_File, a wrapper around
SDBM_File that overcomes its 1024 byte limit for values, which can be
fast for caching data up to 10000 bytes or so in length.

-- Josh

CHANGES

$MODULE = "MLDBM::Sync"; $VERSION = .07; $DATE = 'TBA';

 + $dbm->SyncCacheSize() API activates 2nd layer RAM cache via
   Tie::Cache with MaxBytes set.
 + CACHE documentation, cache.t test, sample benchmarks with
   ./bench/bench_sync.pl -c

$MODULE = "MLDBM::Sync"; $VERSION = .05; $DATE = '2001/03/13';

 + Simpler use of locking.
 - Read locking works on Solaris; had to open the lock file in
   read/write mode.  Linux/NT didn't care.

NAME

  MLDBM::Sync (BETA) - safe concurrent access to MLDBM databases

SYNOPSIS

  use MLDBM::Sync;                       # this gets the default, SDBM_File
  use MLDBM qw(DB_File Storable);        # use Storable for serializing
  use MLDBM qw(MLDBM::Sync::SDBM_File);  # use extended SDBM_File, handles values > 1024 bytes

  # NORMAL PROTECTED read/write with implicit locks per i/o request
  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..]
    or die $!;
  $cache{"AAAA"} = "BBBB";
  my $value = $cache{"AAAA"};

  # SERIALIZED PROTECTED read/write with explicit lock for both i/o requests
  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm',
    O_CREAT|O_RDWR, 0640;
  $sync_dbm_obj->Lock;
  $cache{"AAAA"} = "BBBB";
  my $value = $cache{"AAAA"};
  $sync_dbm_obj->UnLock;

  # SERIALIZED PROTECTED READ access with explicit read lock for both reads
  $sync_dbm_obj->ReadLock;
  my @keys = keys %cache;
  my $value = $cache{'AAAA'};
  $sync_dbm_obj->UnLock;

  # MEMORY CACHE LAYER with Tie::Cache
  $sync_dbm_obj->SyncCacheSize('100K');

  # KEY CHECKSUMS, for lookups on MD5 checksums of large keys
  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm',
    O_CREAT|O_RDWR, 0640;
  $sync_dbm_obj->SyncKeysChecksum(1);
  my $large_key = "KEY" x 10000;
  $cache{$large_key} = "LARGE";
  my $value = $cache{$large_key};

DESCRIPTION

This module wraps the MLDBM interface, handling concurrent access to
MLDBM databases with file locking, and flushes i/o explicitly per
lock/unlock.  The new [Read]Lock()/UnLock() API can be used to
serialize requests logically and improve performance for bundled
reads & writes.

  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm',
    O_CREAT|O_RDWR, 0640;

  # Write locked critical section
  $sync_dbm_obj->Lock;
    ... all accesses to DBM LOCK_EX protected, and go to same tied file handles
    $cache{'KEY'} = 'VALUE';
  $sync_dbm_obj->UnLock;

  # Read locked critical section
  $sync_dbm_obj->ReadLock;
    ... all read accesses to DBM LOCK_SH protected, and go to same tied files
    ... WARNING, cannot write to DBM in ReadLock() section, will die()
    my $value = $cache{'KEY'};
  $sync_dbm_obj->UnLock;

  # Normal access OK too, without explicit locking
  $cache{'KEY'} = 'VALUE';
  my $value = $cache{'KEY'};

MLDBM continues to serve as the underlying OO layer that serializes
complex data structures to be stored in the databases.  See the MLDBM
BUGS manpage section for important limitations.
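The flock() strategy plus per-access i/o flushing described above can be sketched with core Perl modules alone. This is a minimal illustration, not the module's implementation: the /tmp paths and the `write_locked`/`read_locked` sub names are made up for the example.

```perl
#!/usr/bin/perl
# Sketch of the strategy MLDBM::Sync automates: an exclusive flock()
# around each write and a shared flock() around each read, with a
# tie/untie per access so data is flushed to disk.
# (Assumption: /tmp/demo_dbm* paths are illustrative only.)
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;

my $dbm_file  = "/tmp/demo_dbm";
my $lock_file = "/tmp/demo_dbm.lock";

sub write_locked {
    my ($key, $value) = @_;
    # open the lock file read/write (per the CHANGES note about Solaris)
    open my $lock, '+>', $lock_file or die "lock open: $!";
    flock $lock, LOCK_EX or die "flock: $!";    # exclusive lock for writes
    tie my %db, 'SDBM_File', $dbm_file, O_CREAT|O_RDWR, 0640
        or die "tie: $!";
    $db{$key} = $value;
    untie %db;     # untie flushes the dbm i/o to disk
    close $lock;   # closing the handle releases the lock
}

sub read_locked {
    my ($key) = @_;
    open my $lock, '+>', $lock_file or die "lock open: $!";
    flock $lock, LOCK_SH or die "flock: $!";    # shared lock for reads
    tie my %db, 'SDBM_File', $dbm_file, O_RDONLY, 0640
        or die "tie: $!";
    my $value = $db{$key};
    untie %db;
    close $lock;
    return $value;
}

write_locked("AAAA", "BBBB");
print read_locked("AAAA"), "\n";    # prints BBBB
```

Because every access re-ties the file, readers always see the writer's latest flushed data, at the cost of the tie/untie overhead per request.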
MLDBM::Sync also provides built-in RAM caching with Tie::Cache and
MD5 key checksum functionality.

===
Date: Mon, 19 Mar 2001 12:01:05 -0800 (PST)
From: Perrin Harkins <perrin@primenet.com>
To: Joshua Chamas <joshua@chamas.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07

On Mon, 19 Mar 2001, Joshua Chamas wrote:
> A recent API addition allows for a secondary cache layer with
> Tie::Cache to be automatically used

When one process writes a change to the dbm, will the others all see
it, even if they use this?

- Perrin

===
Date: Mon, 19 Mar 2001 12:25:08 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Perrin Harkins <perrin@primenet.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07

Perrin Harkins wrote:
>
> On Mon, 19 Mar 2001, Joshua Chamas wrote:
> > A recent API addition allows for a secondary cache layer with
> > Tie::Cache to be automatically used
>
> When one process writes a change to the dbm, will the others all see
> it, even if they use this?

No, with the secondary cache layer activated, a process will not see
updates from other processes.  This is best used for caching static
data.  I can see a feature request coming for a way to "expire" this
cached data; I'll build it when someone asks for it.

-- Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051

===
Date: Thu, 4 Apr 2002 09:48:16 +0000 (GMT)
From: Franck PORCHER <fpo@esoft.pf>
To: <modperl@perl.apache.org>
Subject: Problem with DBM concurrent access

Hi there,

I have a quick and possibly trivial question that has bothered me for
quite a while.

I'm using a DBM as a repository.  The DBM is constantly written to by
only one process (the 'writer') that opens it RW.  At the same time,
many processes (the 'readers') access it *read only*.  What I see is
that the 'readers' never get the last values written by the 'writer'.

I suspect a flushing problem on the 'writer' side, and a
synchronisation problem on the 'readers' side.  So my question narrows
down to:

  How to flush on disk the cache of a tied DBM (DB_File) structure
  in a way that any concurrent process accessing it in *read only* mode
  would automatically get the new values as soon as they
  are published (synchronisation)

Thanks in advance,
Franck

===
Date: Thu, 04 Apr 2002 15:51:17 -0500
From: Perrin Harkins <perrin@elem.com>
To: Franck PORCHER <fpo@esoft.pf>
Subject: Re: Problem with DBM concurrent access

Franck PORCHER wrote:
> So my question narrows down to :
> How to flush on disk the cache of a tied DBM (DB_File) structure
> in a way that any concurrent process accessing it in *read only* mode
> would automatically get the new values as soon as they
> are published (synchronisation)

You have to tie and untie on each request.  There's some discussion of
this in the Guide.  As an alternative, you could look at using
BerkeleyDB, or MLDBM::Sync (which does the tie/untie for you).

- Perrin

===
From: "Rob Bloodgood" <robb@empire2.com>
To: "Franck PORCHER" <fpo@esoft.pf>
Cc: "mod_perl" <modperl@perl.apache.org>
Subject: RE: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 12:58:06 -0800

> So my question narrows down to :
> How to flush on disk the cache of a tied DBM (DB_File) structure
> in a way that any concurrent process accessing it in *read only* mode
> would automatically get the new values as soon as they
> are published (synchronisation)

Isn't that just as simple as

  tied(%dbm_array)->sync();

?

HTH!

L8r,
Rob

===
Date: Fri, 05 Apr 2002 11:18:07 +0800
From: Stas Bekman <stas@stason.org>
To: Rob Bloodgood <robb@empire2.com>
Cc: Franck PORCHER <fpo@esoft.pf>, mod_perl <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access

Rob Bloodgood wrote:
>> So my question narrows down to :
>> How to flush on disk the cache of a tied DBM (DB_File) structure
>> in a way that any concurrent process accessing it in *read only* mode
>> would automatically get the new values as soon as they
>> are published (synchronisation)
>
> Isn't that just as simple as
>
>   tied(%dbm_array)->sync();

I believe that's not enough, because the reader may read data during
the write, resulting in a corrupted read.  You have to add locking.
See the DBM chapter in the Guide.

--
_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:stas@stason.org  http://ticketmaster.com  http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/

===
Date: Thu, 04 Apr 2002 20:00:41 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Stas Bekman <stas@stason.org>
Subject: Re: Problem with DBM concurrent access

Stas Bekman wrote:
>
> >   tied(%dbm_array)->sync();
>
> I believe that's not enough, because the reader may read data during
> the write, resulting in a corrupted read.  You have to add locking.
> See the DBM chapter in the Guide.
>

You might add MLDBM::Sync to the docs; it easily adds locking to
MLDBM.  MLDBM is a front end for storing complex data structures.

http://www.perl.com/CPAN-local/modules/by-module/MLDBM/CHAMAS/MLDBM-Sync-0.25.readme

What's nice about MLDBM is that you can easily swap various dbms in
and out, like SDBM_File, DB_File, GDBM_File, etc.  More recently it
even supports Tie::TextDir, which provides key-per-file storage; that
is good when you have a fast file system and big data to store.
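Rob's tied(%dbm_array)->sync() suggestion and the caveats from Stas and Perrin can be combined into a small sketch: sync() flushes the writer's dirty pages to disk, but a reader still needs a fresh tie (and, in real concurrent use, locking, which is omitted here) to see the update. The /tmp path is illustrative; the sketch assumes the DB_File module that normally ships with Perl.

```perl
#!/usr/bin/perl
# Sketch: DB_File sync() flushes the writer's buffers, but readers must
# re-tie to pick up changes; locking (flock) is omitted for brevity and
# would be required with real concurrent access.
# (Assumption: /tmp/demo_db is an illustrative path.)
use strict;
use warnings;
use Fcntl qw(:DEFAULT);
use DB_File;

my $file = "/tmp/demo_db";

# Writer side: write a value, then push dirty pages to disk.
tie my %db, 'DB_File', $file, O_CREAT|O_RDWR, 0640, $DB_HASH
    or die "tie: $!";
$db{'KEY'} = 'VALUE';
(tied %db)->sync;    # what Rob suggested: flush to disk
untie %db;

# Reader side: a *fresh* tie sees the flushed value; a long-lived tie
# made before the write might still serve stale cached pages.
tie my %reader, 'DB_File', $file, O_RDONLY, 0640, $DB_HASH
    or die "tie: $!";
print $reader{'KEY'}, "\n";    # prints VALUE
untie %reader;
```

This is why Perrin's tie-and-untie-per-request advice and Stas's locking advice go together: sync() alone fixes only the writer's half of the problem.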
  SYNOPSIS

    use MLDBM::Sync;                       # this gets the default, SDBM_File
    use MLDBM qw(DB_File Storable);        # use Storable for serializing
    use MLDBM qw(MLDBM::Sync::SDBM_File);  # use extended SDBM_File, handles values > 1024 bytes
    use Fcntl qw(:DEFAULT);                # import symbols O_CREAT & O_RDWR for use with DBMs

    # NORMAL PROTECTED read/write with implicit locks per i/o request
    my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..]
      or die $!;
    $cache{"AAAA"} = "BBBB";
    my $value = $cache{"AAAA"};
    ...

  DESCRIPTION

  This module wraps the MLDBM interface, handling concurrent access to
  MLDBM databases with file locking, and flushes i/o explicitly per
  lock/unlock.  The new [Read]Lock()/UnLock() API can be used to
  serialize requests logically and improve performance for bundled
  reads & writes.

Here are some benchmarks on my 2.4.x linux box, a dual PIII 450 with a
couple of 7200 RPM IDE drives and a raid-1 ext3 fs mounted with the
default async option:

  MLDBM-Sync-0.25]# perl bench/bench_sync.pl
  NUMBER OF PROCESSES IN TEST: 4

  === INSERT OF 50 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File               0.17 seconds    12288 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  0.20 seconds    12288 bytes
  Time for 100 writes + 100 reads for  GDBM_File               1.06 seconds    18066 bytes
  Time for 100 writes + 100 reads for  DB_File                 0.63 seconds    12288 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        0.38 seconds    13192 bytes

  === INSERT OF 500 BYTE RECORDS ===
  (skipping test for SDBM_File 100 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  0.58 seconds   261120 bytes
  Time for 100 writes + 100 reads for  GDBM_File               1.09 seconds    63472 bytes
  Time for 100 writes + 100 reads for  DB_File                 0.64 seconds    98304 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        0.33 seconds    58192 bytes

  === INSERT OF 5000 BYTE RECORDS ===
  (skipping test for SDBM_File 100 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  1.37 seconds  4128768 bytes
  Time for 100 writes + 100 reads for  GDBM_File               1.13 seconds   832400 bytes
  Time for 100 writes + 100 reads for  DB_File                 1.08 seconds   831488 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        0.52 seconds   508192 bytes

  === INSERT OF 20000 BYTE RECORDS ===
  (skipping test for SDBM_File 100 byte limit)
  (skipping test for MLDBM::Sync db size > 1M)
  Time for 100 writes + 100 reads for  GDBM_File               1.76 seconds  2063912 bytes
  Time for 100 writes + 100 reads for  DB_File                 1.78 seconds  2060288 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        1.27 seconds  2008192 bytes

  === INSERT OF 50000 BYTE RECORDS ===
  (skipping test for SDBM_File 100 byte limit)
  (skipping test for MLDBM::Sync db size > 1M)
  Time for 100 writes + 100 reads for  GDBM_File               3.52 seconds  5337944 bytes
  Time for 100 writes + 100 reads for  DB_File                 3.37 seconds  5337088 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        2.80 seconds  5008192 bytes

--Josh

===
From: "Perrin Harkins" <perrin@elem.com>
To: "Stas Bekman" <stas@stason.org>, "Rob Bloodgood" <robb@empire2.com>
Cc: "Franck PORCHER" <fpo@esoft.pf>, "mod_perl" <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 23:02:29 -0500

> > Isn't that just as simple as
> >
> >   tied(%dbm_array)->sync();
>
> I believe that's not enough, because the reader may read
> data during the write, resulting in a corrupted read.

Not only that; with at least some dbm implementations, there's also
the issue that they cache part of the file in memory and will not pick
up changed data unless you untie and re-tie.  I remember a good
discussion about this on the list a year or two back.

===
Date: Fri, 5 Apr 2002 12:47:59 -0500
To: modperl@perl.apache.org
From: Dan Wilga <dwilga@MtHolyoke.edu>
Subject: Re: Problem with DBM concurrent access

I would also suggest using BerkeleyDB.pm, but with the
DB_INIT_MPOOL|DB_INIT_CDB flags.  In this mode, only one writer is
allowed at a time, and Berkeley automatically handles all the locking
and flushing.
Just don't forget to use db_close() to close the file before
untie'ing it.

===
Date: Fri, 5 Apr 2002 10:16:48 -0800 (PST)
From: Andrew Ho <andrew@tellme.com>
To: Dan Wilga <dwilga@MtHolyoke.edu>
Cc: mod_perl List <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access

Hello,

DW>I would also suggest using BerkeleyDB.pm, but with the
DW>DB_INIT_MPOOL|DB_INIT_CDB flags. In this mode, only one writer is
DW>allowed at a time, and Berkeley automatically handles all the locking
DW>and flushing. Just don't forget to use db_close() to close the file
DW>before untie'ing it.

One caveat on this: BerkeleyDB maintains its locks and other
environment information in a local memory segment, so this won't work
if multiple machines share the same BerkeleyDB file (e.g., you are
using the BerkeleyDB file over NFS).

===
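Dan Wilga's BerkeleyDB.pm suggestion, in concurrent-data-store (CDB) mode, might look like the following sketch. This assumes the CPAN BerkeleyDB module is installed and built against a Berkeley DB with CDB support; the environment directory and database file name are illustrative, not anything the thread prescribes.

```perl
#!/usr/bin/perl
# Sketch of BerkeleyDB.pm in CDB mode: one writer at a time, many
# readers, with locking and flushing handled by Berkeley DB itself.
# (Assumptions: CPAN BerkeleyDB module installed; paths illustrative.)
use strict;
use warnings;
use BerkeleyDB;

my $home = "/tmp/bdb_env";
mkdir $home unless -d $home;

# The shared environment is what enables CDB's one-writer coordination.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "env: $BerkeleyDB::Error";

my $db = tie my %hash, 'BerkeleyDB::Hash',
    -Filename => "data.db",
    -Env      => $env,
    -Flags    => DB_CREATE
  or die "tie: $BerkeleyDB::Error";

$hash{'KEY'} = 'VALUE';      # writes serialized against other writers
my $value = $hash{'KEY'};    # readers see committed data

$db->db_close;               # per Dan's note: db_close() before untie'ing
undef $db;
untie %hash;
```

As Andrew Ho notes, the environment lives in local shared memory, so this coordination only works among processes on one machine, not across NFS.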