modperl-DBMs_with_mod_perl

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Date: Mon, 19 Mar 2001 11:43:05 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Mod Perl <modperl@apache.org>
Subject: [ANNOUNCE] MLDBM::Sync v.07

Hey,

The latest MLDBM::Sync v.07 is in your local CPAN and also
  http://www.perl.com/CPAN-local/modules/by-module/MLDBM/

It provides a wrapper around MLDBM databases, like SDBM_File
and DB_File, providing safe concurrent access, using a flock()
strategy and per access dbm i/o flushing.  

A recent API addition allows for a secondary cache layer with
Tie::Cache to be automatically used, like:

  my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
  $sync_dbm_obj->SyncCacheSize('100K');

On my dual PIII 450 Linux box, I get about 1500 reads per second
from an SDBM_File-based MLDBM::Sync database, while the Tie::Cache layer
runs at about 15000 reads per second, so with a high cache hit rate the
speedup can be considerable.
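
To put a rough number on that speedup, the two quoted rates can be combined into an effective read rate for a given hit ratio. This is only a back-of-the-envelope sketch: it assumes a hit costs one cache read and a miss costs one dbm read, and effective_rate() is a name made up for the illustration.

```perl
use strict;
use warnings;

# Figures quoted in the announcement (reads per second).
my $DBM_RATE   = 1500;    # SDBM_File-based MLDBM::Sync database
my $CACHE_RATE = 15000;   # Tie::Cache layer

# Effective read rate for a given cache hit ratio (0..1).
# Assumption: a hit costs one cache read, a miss costs one dbm read.
sub effective_rate {
    my ($hit_ratio) = @_;
    my $avg_seconds_per_read =
        $hit_ratio / $CACHE_RATE + (1 - $hit_ratio) / $DBM_RATE;
    return 1 / $avg_seconds_per_read;
}

printf "%.0f%% hits: about %.0f reads/sec\n", $_ * 100, effective_rate($_)
    for (0.5, 0.9, 0.99);
```

Under these assumptions a 90% hit ratio works out to roughly 7900 reads per second, a bit more than five times the uncached rate.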

MLDBM::Sync also comes with MLDBM::Sync::SDBM_File, a wrapper around 
SDBM_File that overcomes its 1024 byte limit for values, which 
can be fast for caching data up to 10000 bytes or so in length.

-- Josh

CHANGES

$MODULE = "MLDBM::Sync"; $VERSION = .07; $DATE = 'TBA';

+ $dbm->SyncCacheSize() API activates 2nd layer RAM cache
  via Tie::Cache with MaxBytes set.

+ CACHE documentation, cache.t test, sample benchmarks
  with ./bench/bench_sync.pl -c

$MODULE = "MLDBM::Sync"; $VERSION = .05; $DATE = '2001/03/13';

+ Simpler use of locking.

- Read locking works on Solaris, had to open lock file in
  read/write mode.  Linux/NT didn't care.

NAME
      MLDBM::Sync (BETA) - safe concurrent access to MLDBM databases

SYNOPSIS
      use MLDBM::Sync;                       # this gets the default, SDBM_File
      use MLDBM qw(DB_File Storable);        # use Storable for serializing
      use MLDBM qw(MLDBM::Sync::SDBM_File);  # use extended SDBM_File, handles values > 1024 bytes

      # NORMAL PROTECTED read/write with implicit locks per i/o request
      my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..] or die $!;
      $cache{"AAAA"} = "BBBB";
      my $value = $cache{"AAAA"};

      # SERIALIZED PROTECTED read/write with explicit lock for both i/o requests
      my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
      $sync_dbm_obj->Lock;
      $cache{"AAAA"} = "BBBB";
      my $value = $cache{"AAAA"};
      $sync_dbm_obj->UnLock;

      # SERIALIZED PROTECTED READ access with explicit read lock for both reads
      $sync_dbm_obj->ReadLock;
      my @keys = keys %cache;
      my $value = $cache{'AAAA'};
      $sync_dbm_obj->UnLock;

      # MEMORY CACHE LAYER with Tie::Cache
      $sync_dbm_obj->SyncCacheSize('100K');

      # KEY CHECKSUMS, for lookups on MD5 checksums on large keys
      my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;
      $sync_dbm_obj->SyncKeysChecksum(1);
      my $large_key = "KEY" x 10000;
      $cache{$large_key} = "LARGE";
      my $value = $cache{$large_key};
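
The idea behind key checksums can be sketched with the core Digest::MD5 module: a long key is replaced by its fixed-length digest before it reaches the dbm. This is an illustration only. checksum_key() is a hypothetical helper, a plain hash stands in for the tied dbm, and whether MLDBM::Sync stores hex or binary digests internally is not stated here.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: derive a short, fixed-length dbm key from a long one.
sub checksum_key {
    my ($key) = @_;
    return md5_hex($key);    # 32 hex characters regardless of input length
}

my %store;                           # stands in for a tied dbm hash
my $large_key = "KEY" x 10000;       # 30000 bytes, as in the synopsis

$store{ checksum_key($large_key) } = "LARGE";
my $value = $store{ checksum_key($large_key) };

printf "%d-byte key stored under a %d-byte checksum\n",
    length($large_key), length(checksum_key($large_key));
```

One consequence of this approach is that keys() on such a store returns digests, not the original keys.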

DESCRIPTION
    This module wraps the MLDBM interface, handling concurrent access to
    MLDBM databases with file locking and flushing i/o explicitly on each
    lock/unlock. The new [Read]Lock()/UnLock() API can be used to
    serialize requests logically and improve performance for bundled reads &
    writes.

      my $sync_dbm_obj = tie %cache, 'MLDBM::Sync', '/tmp/syncdbm', O_CREAT|O_RDWR, 0640;

      # Write locked critical section
      $sync_dbm_obj->Lock;
        # ... all accesses to DBM LOCK_EX protected, and go to same tied file handles
        $cache{'KEY'} = 'VALUE';
      $sync_dbm_obj->UnLock;

      # Read locked critical section
      $sync_dbm_obj->ReadLock;
        # ... all read accesses to DBM LOCK_SH protected, and go to same tied files
        # ... WARNING, cannot write to DBM in ReadLock() section, will die()
        my $value = $cache{'KEY'};
      $sync_dbm_obj->UnLock;

      # Normal access OK too, without explicit locking
      $cache{'KEY'} = 'VALUE';
      $value = $cache{'KEY'};

    MLDBM continues to serve as the underlying OO layer that serializes
    complex data structures to be stored in the databases. See the BUGS
    section of the MLDBM manpage for important limitations.

    MLDBM::Sync also provides built-in RAM caching with Tie::Cache, and MD5
    key checksum functionality.


===

Date: Mon, 19 Mar 2001 12:01:05 -0800 (PST)
From: Perrin Harkins <perrin@primenet.com>
To: Joshua Chamas <joshua@chamas.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07

On Mon, 19 Mar 2001, Joshua Chamas wrote:
> A recent API addition allows for a secondary cache layer with
> Tie::Cache to be automatically used

When one process writes a change to the dbm, will the others all see it,
even if they use this?
- Perrin



===

Date: Mon, 19 Mar 2001 12:25:08 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Perrin Harkins <perrin@primenet.com>
Subject: Re: [ANNOUNCE] MLDBM::Sync v.07

Perrin Harkins wrote:
> 
> On Mon, 19 Mar 2001, Joshua Chamas wrote:
> > A recent API addition allows for a secondary cache layer with
> > Tie::Cache to be automatically used
> 
> When one process writes a change to the dbm, will the others all see it,
> even if they use this?

No, with the secondary cache layer activated, a process will not
see updates made by other processes.  This is best used for static
data being cached.

I can see a request coming down that "expires" this 
cached data.  I'll build it when someone asks for it.

-- Josh

_________________________________________________________________
Joshua Chamas			        Chamas Enterprises Inc.
NodeWorks >> free web link monitoring	Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051
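
A time-based expiry of the kind mentioned above could look roughly like the following. This is a sketch of the general technique, not a feature of MLDBM::Sync or Tie::Cache: ExpiringCache is a hypothetical class, a plain hash stands in for the shared dbm, and the clock is passed in by hand so the behaviour is easy to follow.

```perl
use strict;
use warnings;

package ExpiringCache;   # hypothetical helper, not part of MLDBM::Sync

sub new {
    my ($class, %args) = @_;
    return bless {
        backing => $args{backing},   # hash ref (could be a tied dbm hash)
        ttl     => $args{ttl},       # seconds an entry stays fresh
        entries => {},               # key => [value, fetched_at]
    }, $class;
}

# Read through the cache; $now is injectable so the logic is easy to test.
sub get {
    my ($self, $key, $now) = @_;
    $now = time unless defined $now;
    my $entry = $self->{entries}{$key};
    if ($entry && $now - $entry->[1] < $self->{ttl}) {
        return $entry->[0];              # still fresh: serve from RAM
    }
    my $value = $self->{backing}{$key};  # stale or missing: go to the store
    $self->{entries}{$key} = [$value, $now];
    return $value;
}

package main;

my %store = (motd => 'old');             # stands in for the shared dbm
my $cache = ExpiringCache->new(backing => \%store, ttl => 30);

print $cache->get('motd', 0), "\n";      # reads the store, caches 'old'
$store{motd} = 'new';                    # "another process" updates the store
print $cache->get('motd', 10), "\n";     # within ttl: still the cached value
print $cache->get('motd', 31), "\n";     # ttl elapsed: re-read from the store
```

The trade-off is the usual one: a shorter ttl narrows the window in which a process serves stale data, at the cost of more dbm reads.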


===

Date: Thu, 4 Apr 2002 09:48:16 +0000 (GMT)
From: Franck PORCHER <fpo@esoft.pf>
To: <modperl@perl.apache.org>
Subject: Problem with DBM concurrent access

Hi there,

I have a quick and possibly trivial question that has bothered me
for quite a while.

I'm using a DBM as a repository. The DBM is constantly written to by only one
process (the 'writer') that opens it RW. At the same time, many processes (the
'readers') access it *read only*.

I find that the 'readers' never get the latest values
written by the 'writer'. I suspect a flushing problem on the 'writer' side
and a synchronisation problem on the 'reader' side.

So my question narrows down to :
How to flush on disk the cache of a tied DBM (DB_File) structure
in a way that any concurrent process accessing it in *read only* mode
would automatically get the new values as soon as they
are published (synchronisation)

Thanks in advance.

Franck

===



Date: Thu, 04 Apr 2002 15:51:17 -0500
From: Perrin Harkins <perrin@elem.com>
To: Franck PORCHER <fpo@esoft.pf>
Subject: Re: Problem with DBM concurrent access

Franck PORCHER wrote:
> So my question narrows down to :
> How to flush on disk the cache of a tied DBM (DB_File) structure
> in a way that any concurrent process accessing it in *read only* mode
> would automatically get the new values as soon as they
> are published (synchronisation)

You have to tie and untie on each request.  There's some discussion of 
this in the Guide.  As an alternative, you could look at using 
BerkeleyDB, or MLDBM::Sync (which does the tie/untie for you).

- Perrin
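
As a rough illustration of the tie/untie-per-request approach, the sketch below wraps each access in lock, tie, work, untie, unlock. It assumes DB_File is available and uses a separate lock file; with_dbm() and the file names are made up for the example and do not come from the Guide.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Hypothetical helper: lock, tie, run a callback, untie, unlock.
sub with_dbm {
    my ($path, $lock_mode, $open_flags, $code) = @_;

    # Separate lock file, opened read/write so shared locks also work on
    # platforms that require a writable handle for flock().
    open(my $lock_fh, '+>>', "$path.lock") or die "open $path.lock: $!";
    flock($lock_fh, $lock_mode)            or die "flock $path.lock: $!";

    tie(my %dbm, 'DB_File', $path, $open_flags, 0640, $DB_HASH)
        or die "tie $path: $!";
    my @result = eval { $code->(\%dbm) };
    my $error  = $@;

    untie %dbm;        # flushes writes and drops any in-memory pages
    close $lock_fh;    # closing the handle releases the lock
    die $error if $error;
    return wantarray ? @result : $result[0];
}

my $file = tempdir(CLEANUP => 1) . '/demo.db';

# Writer: exclusive lock for the duration of the update.
with_dbm($file, LOCK_EX, O_CREAT | O_RDWR, sub { $_[0]{greeting} = 'hello' });

# Reader: shared lock, fresh tie, so it sees what the writer just stored.
my $value = with_dbm($file, LOCK_SH, O_RDONLY, sub { $_[0]{greeting} });
print "$value\n";
```

The cost is one open and close of the dbm per access, which is the overhead that explicit lock sections in wrappers such as MLDBM::Sync are meant to amortize.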



===

From: "Rob Bloodgood" <robb@empire2.com>
To: "Franck PORCHER" <fpo@esoft.pf>
Cc: "mod_perl" <modperl@perl.apache.org>
Subject: RE: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 12:58:06 -0800

> So my question narrows down to :
> How to flush on disk the cache of a tied DBM (DB_File) structure
> in a way that any concurrent process accessing it in *read only* mode
> would automatically get the new values as soon as they
> are published (synchronisation)

Isn't that just as simple as

tied(%dbm_array)->sync();

?

HTH!

L8r,
Rob


===

Date: Fri, 05 Apr 2002 11:18:07 +0800
From: Stas Bekman <stas@stason.org>
To: Rob Bloodgood <robb@empire2.com>
Cc: Franck PORCHER <fpo@esoft.pf>, mod_perl <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access

Rob Bloodgood wrote:
>>So my question narrows down to :
>>How to flush on disk the cache of a tied DBM (DB_File) structure
>>in a way that any concurrent process accessing it in *read only* mode
>>would automatically get the new values as soon as they
>>are published (synchronisation)
>>
> 
> Isn't that just as simple as
> 
> tied(%dbm_array)->sync();

I believe that's not enough, because the reader may read data during the 
write, resulting in corrupted data read. You have to add locking. See 
the DBM chapter in the Guide.

-- 


_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:stas@stason.org  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
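
One way to combine sync() with locking is sketched below, assuming DB_File and a separate lock file. The writer keeps its handle open but only writes and flushes while holding LOCK_EX. The reader takes LOCK_SH and ties afresh. publish() and the file names are hypothetical, and for brevity both roles run in one process here, whereas real readers would be separate processes with their own lock handles.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Hypothetical helper: store one pair and flush it while holding LOCK_EX.
sub publish {
    my ($db, $lock_fh, $key, $value) = @_;
    flock($lock_fh, LOCK_EX)    or die "flock: $!";
    $db->put($key, $value) == 0 or die "put: $!";
    $db->sync() == 0            or die "sync: $!";   # DB_File returns 0 on success
    flock($lock_fh, LOCK_UN)    or die "unlock: $!";
    return;
}

my $file = tempdir(CLEANUP => 1) . '/repo.db';
open(my $lock_fh, '+>>', "$file.lock") or die "open $file.lock: $!";

# Long-lived writer handle; $db is the DB_File object behind %repo.
my $db = tie(my %repo, 'DB_File', $file, O_CREAT | O_RDWR, 0640, $DB_HASH)
    or die "tie $file: $!";
publish($db, $lock_fh, status => 'ready');

# A reader takes LOCK_SH and ties afresh, so it cannot overlap a write.
flock($lock_fh, LOCK_SH) or die "flock: $!";
tie(my %view, 'DB_File', $file, O_RDONLY, 0640, $DB_HASH) or die "tie $file: $!";
my $seen = $view{status};
untie %view;
flock($lock_fh, LOCK_UN) or die "unlock: $!";
print "$seen\n";
```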




===

Date: Thu, 04 Apr 2002 20:00:41 -0800
From: Joshua Chamas <joshua@chamas.com>
To: Stas Bekman <stas@stason.org>
Subject: Re: Problem with DBM concurrent access

Stas Bekman wrote:
> 
> > tied(%dbm_array)->sync();
> 
> I believe that's not enough, because the reader may read data during the
> write, resulting in corrupted data read. You have to add locking. see
> the DBM chapter in the guide.
> 

You might add MLDBM::Sync to the docs, which easily adds locking
to MLDBM.  MLDBM is a front end to store complex data structures.

  http://www.perl.com/CPAN-local/modules/by-module/MLDBM/CHAMAS/MLDBM-Sync-0.25.readme

What's nice about MLDBM is you can easily swap in & out various dbms
like SDBM_File, DB_File, GDBM_File, etc.  More recently it even
supports Tie::TextDir too, which provides key-per-file storage,
which is good when you have a fast file system & big data you want
to store.

SYNOPSIS
      use MLDBM::Sync;                       # this gets the default, SDBM_File
      use MLDBM qw(DB_File Storable);        # use Storable for serializing
      use MLDBM qw(MLDBM::Sync::SDBM_File);  # use extended SDBM_File, handles values > 1024 bytes
      use Fcntl qw(:DEFAULT);                # import symbols O_CREAT & O_RDWR for use with DBMs

      # NORMAL PROTECTED read/write with implicit locks per i/o request
      my $sync_dbm_obj = tie %cache, 'MLDBM::Sync' [..other DBM args..] or die $!;
      $cache{"AAAA"} = "BBBB";
      my $value = $cache{"AAAA"};

...

DESCRIPTION
    This module wraps the MLDBM interface, handling concurrent access to
    MLDBM databases with file locking and flushing i/o explicitly on each
    lock/unlock. The new [Read]Lock()/UnLock() API can be used to
    serialize requests logically and improve performance for bundled reads &
    writes.

Here are some benchmarks on my 2.4.x Linux box, a dual PIII 450 with a couple of
7200 RPM IDE drives & a raid-1 ext3 fs mounted default async.

MLDBM-Sync-0.25]# perl bench/bench_sync.pl 

NUMBER OF PROCESSES IN TEST: 4

=== INSERT OF 50 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File                  0.17 seconds     12288 bytes 
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     0.20 seconds     12288 bytes 
  Time for 100 writes + 100 reads for  GDBM_File                  1.06 seconds     18066 bytes 
  Time for 100 writes + 100 reads for  DB_File                    0.63 seconds     12288 bytes 
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.38 seconds     13192 bytes 

=== INSERT OF 500 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     0.58 seconds    261120 bytes 
  Time for 100 writes + 100 reads for  GDBM_File                  1.09 seconds     63472 bytes 
  Time for 100 writes + 100 reads for  DB_File                    0.64 seconds     98304 bytes 
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.33 seconds     58192 bytes 

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     1.37 seconds   4128768 bytes 
  Time for 100 writes + 100 reads for  GDBM_File                  1.13 seconds    832400 bytes 
  Time for 100 writes + 100 reads for  DB_File                    1.08 seconds    831488 bytes 
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.52 seconds    508192 bytes 

=== INSERT OF 20000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 100 writes + 100 reads for  GDBM_File                  1.76 seconds   2063912 bytes 
  Time for 100 writes + 100 reads for  DB_File                    1.78 seconds   2060288 bytes 
  Time for 100 writes + 100 reads for  Tie::TextDir .04           1.27 seconds   2008192 bytes 

=== INSERT OF 50000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 100 writes + 100 reads for  GDBM_File                  3.52 seconds   5337944 bytes 
  Time for 100 writes + 100 reads for  DB_File                    3.37 seconds   5337088 bytes 
  Time for 100 writes + 100 reads for  Tie::TextDir .04           2.80 seconds   5008192 bytes 

--Josh



===

From: "Perrin Harkins" <perrin@elem.com>
To: "Stas Bekman" <stas@stason.org>, "Rob Bloodgood" <robb@empire2.com>
Cc: "Franck PORCHER" <fpo@esoft.pf>, "mod_perl" <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access
Date: Thu, 4 Apr 2002 23:02:29 -0500

> > Isn't that just as simple as
> >
> > tied(%dbm_array)->sync();

> I believe that's not enough, because the reader may read
> data during the write, resulting in corrupted data read.

Not only that, there's also the issue with at least some dbm
implementations that they cache part of the file in memory and will not
pick up changed data unless you untie and re-tie.  I remember a good
discussion about this on the list a year or two back.

===

Date: Fri, 5 Apr 2002 12:47:59 -0500
To: modperl@perl.apache.org
From: Dan Wilga <dwilga@MtHolyoke.edu>
Subject: Re: Problem with DBM concurrent access

I would also suggest using BerkeleyDB.pm, but with the 
DB_INIT_MPOOL|DB_INIT_CDB flags. In this mode, only one writer is 
allowed at a time, and Berkeley automatically handles all the locking 
and flushing. Just don't forget to use db_close() to close the file 
before untie'ing it.
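
A minimal sketch of that setup, assuming the BerkeleyDB module from CPAN is installed. The temporary directory and the file name repo.db are made up for the example, and error handling is kept to a bare minimum.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# The environment home directory holds the shared lock and cache regions.
my $home = tempdir(CLEANUP => 1);

my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "cannot open environment: $BerkeleyDB::Error";

my $db = tie(my %repo, 'BerkeleyDB::Hash',
    -Filename => 'repo.db',      # relative to the environment home
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open database: $BerkeleyDB::Error";

$repo{status} = 'ready';         # CDB takes the write lock for this store
my $status = $repo{status};
print "$status\n";

$db->db_close();                 # close explicitly before untie'ing
undef $db;                       # drop our reference to the tied object
untie %repo;
```

Every process that shares the database has to open the same environment home for the locking to take effect.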

===

Date: Fri, 5 Apr 2002 10:16:48 -0800 (PST)
From: Andrew Ho <andrew@tellme.com>
To: Dan Wilga <dwilga@MtHolyoke.edu>
Cc: mod_perl List <modperl@perl.apache.org>
Subject: Re: Problem with DBM concurrent access

Hello,

DW>I would also suggest using BerkeleyDB.pm, but with the 
DW>DB_INIT_MPOOL|DB_INIT_CDB flags. In this mode, only one writer is 
DW>allowed at a time, and Berkeley automatically handles all the locking 
DW>and flushing. Just don't forget to use db_close() to close the file 
DW>before untie'ing it.

One caveat on this: BerkeleyDB maintains its locks and other environment
information in a local memory segment, so this won't work if multiple
machines share the same BerkeleyDB file (e.g., you are using the
BerkeleyDB file over NFS).

===
