Simple Tied Hashes

Synopsis: the simple use of tie for data persistence

A quick example of the use of "tie" to get data to persist between runs of a script:

   #!/usr/bin/perl
   # what_did_i_say                   doom@kzsu.stanford.edu
   #                                  07 May 2004
   
   use warnings;
   use strict;
   $|=1;
   
   my %fried;
   
   use Fcntl;   # For O_RDWR, O_CREAT, etc.
   use NDBM_File;
   
   my $read_write_create =  O_RDWR;
      $read_write_create += O_CREAT;

   tie(%fried,
       'NDBM_File',
       '/home/doom/tmp/Testes/fried-hash.dbm',
       $read_write_create,
       0666) 
    or die "$0: tie failed: $!";
   
   print "Last time you said: $fried{egg}\n" if defined( $fried{egg} );
   $fried{egg} = defined( $ARGV[0] ) ? $ARGV[0] : '';   # no argument: store '' rather than undef
   print "But I'll remember you said: $fried{egg}\n";

   untie(%fried);

Why I wrote this web page

By some strange quirk, I'd never come across a need to use perl's "tie" (typically I've either read and written flat files directly, or gone all the way and used DBI). But as it happened, one day I was writing some code that needed simple persistence of a small amount of data between runs. I didn't need anything fancy, just a way to save a string and get it back the next time the script was run. I could've used a temp file without any trouble, but there was always the possibility that I would find reasons to expand the amount of persistent data, and rolling your own datafile formats is generally regarded as poor form. So I thought I would look up some of the standard ways of doing this. One of the first that occurred to me was to just tie a hash.

I came across a funny syndrome: all of the obvious documentation on the subject tends to lead off by telling you how cool tie is because, while it uses OOP, you can bury that stuff and keep it out of your face. Then it goes on to describe in intimate detail the 9 methods that need to be implemented in order to take complete control of your ties... As it happens, I kind of know about that stuff already, but what I was really looking for was a way to avoid knowing it until I really need it.

It took a little messing around to work out how to do it, as presented in the code example above. I found there were a bunch of tiny little gotchas to getting it to work, and I can only imagine what a pain it would be for a beginning perl programmer. Someone or other *must* have written this up, but it's not really there in "Programming Perl" (2nd ed), "The Perl Cookbook" (2nd ed) [Oops. See next paragraph] or "Effective Perl Programming" (1st ed), the online docs for "tie" in "perlfunc", or in "perltie", or in the perldoc for NDBM_File or Fcntl... hence this write-up. It's not a substitute for any of the aforementioned documentation; it's more of a stopgap. The code presented here is something you can start with, until you really need to learn what you're doing.

Um... what gave me the idea that the 2nd edition of The Perl Cookbook was missing what I was looking for? Anyway, Recipe 14.1 looks like just the thing: "Making and Using a DBM File". This uses the "DB_File" module rather than the "NDBM_File" I was using here, and connecting to it is a lot simpler. Hm. Will investigate, and maybe revise some stuff here.
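
For what it's worth, here is a minimal sketch of what the DB_File version might look like. It is not copied from the Cookbook, it assumes DB_File (and the Berkeley DB library underneath it) is installed, and the temporary directory, the file name and the remember() helper are all made up for illustration. As I read the DB_File docs, giving only a file name leaves the flags, mode and database type at their defaults, which is what makes connecting so much shorter.

```perl
#!/usr/bin/perl
# A sketch only: assumes DB_File and the Berkeley DB library are installed.
use strict;
use warnings;
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical scratch location, so the sketch cleans up after itself.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/fried-hash.db";

# Store a new remark and return whatever was stored by the previous call.
sub remember {
    my ($said) = @_;

    # With only a file name given, DB_File falls back on its defaults:
    # flags O_CREAT|O_RDWR, mode 0666, and a hash-style database.
    tie( my %fried, 'DB_File', $file )
        or die "$0: tie failed: $!";

    my $last = $fried{egg};    # undef the very first time
    $fried{egg} = $said;
    untie %fried;
    return $last;
}

remember('sunny side up');
print "Last time you said: ", remember('scrambled'), "\n";
```

Each call to remember() ties and unties the file, so the second call only sees the first remark because it really was written to disk in between.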

The gotchas

For my own edification, here are the little hassles I had with following the docs:

  1. The perldoc for NDBM_File says about the "Flags" argument to tie:
    "O_RDWR" Both read and write access.
    If you want to create the file if it does not exist, add "O_CREAT" to any of these, as in the example.
    The "example" they refer to appears to be missing, so it took me a moment to realize that when they say to "add" O_CREAT, they mean combining the numeric values of the constants, either by literal numeric addition or by a bitwise OR. (Does that sound like a dumb thing to say? Duh, "add" means "addition", right? But what do you say when you add an item to a list? English is wonderful.)

    I needed to look up the numeric values for these constants in the appropriate fcntl.h file (on my box: /usr/i386-glibc21-linux/include/asm/fcntl.h) in order to get a clue:
       #define O_ACCMODE	   0003
       #define O_RDONLY	     00
       #define O_WRONLY	     01
       #define O_RDWR		     02
       #define O_CREAT		   0100	/* not fcntl */
    
    This was my first strong hint that I was supposed to think of these as bit flags in a binary number. The values are written in octal: the first four items in the list use only the two low bits, and O_CREAT (0100 octal, 64 decimal) is a single bit further up, bit 6. Because the bits don't overlap, you can add the values together (or, more conventionally, combine them with a bitwise OR, "|") and still extract the information later.
  2. Even when you know that you should be doing numeric addition, there's another problem with adding these Fcntl constants. You can't just say: O_RDWR + O_CREAT (on my box that yields 2, not 2 + 64 = 66, as you'd expect). You'll notice the workaround used in the code example above: one constant is assigned to a variable, then the other constant is added to the variable. The likely reason this is needed: if you look inside Fcntl.pm, you'll see it's doing AUTOLOAD magic to generate the constant subs on the fly. It does not do "use constant", so the subs have no empty prototype to tell the perl compiler that they take no arguments, and O_RDWR + O_CREAT gets parsed as O_RDWR(+O_CREAT): O_CREAT is passed as an ignored argument, and all you get back is O_RDWR. Writing O_RDWR() + O_CREAT(), or using a bitwise OR as in O_RDWR | O_CREAT, should avoid the misparse.
  3. During the development process, some of my early runs created db files with permissions set to "none", so later runs were denied permission to access the files for reasons that seemed mysterious at first.
  4. My reading of the docs led me to expect that "ndbm" would use a *single* file, created with the name I supplied: /home/doom/tmp/Testes/fried-hash.dbm. Instead, it uses two files called: fried-hash.dbm.pag and fried-hash.dbm.dir. Because of this confusion, I kept doing things like ls -la /home/doom/tmp/Testes/fried-hash.dbm and turning up nothing, giving me the feeling I hadn't created the db at all.
  5. And I suspect that still another documentation gotcha is that the online docs steer you in the direction of using NDBM, whereas Berkeley DB (via DB_File) is probably used more often, and SDBM (via SDBM_File) is actually included with perl...
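
The gotchas above can be poked at with a small sketch like the following. It swaps in SDBM_File for NDBM_File, on the assumption that it is the one most likely to be present since it ships with perl. The scratch directory and base name are hypothetical, and the flag values printed will depend on your platform's fcntl.h.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # bundled with perl
use File::Temp qw(tempdir);

# Gotchas 1 and 2: combine the flags with bitwise OR, and call the
# constants with explicit parens so they cannot swallow what follows.
my $flags = O_RDWR() | O_CREAT();
printf "O_RDWR=%#o  O_CREAT=%#o  combined=%#o\n",
    O_RDWR(), O_CREAT(), $flags;

# Hypothetical scratch location, removed again when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $base = "$dir/fried-hash";

# Gotcha 3: the mode is an octal literal (0666, not 666), and the
# process umask is applied on top of it.
my %fried;
tie( %fried, 'SDBM_File', $base, $flags, 0666 )
    or die "$0: tie failed: $!";
$fried{egg} = 'over easy';
untie %fried;

# Gotcha 4: the name given to tie is only a base name. List what was
# actually created, along with the permissions.
opendir( my $dh, $dir ) or die "$0: can't read $dir: $!";
my @created = sort grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

for my $name (@created) {
    my $mode = ( stat "$dir/$name" )[2] & 07777;
    printf "%-16s mode %04o\n", $name, $mode;
}
```

On a Unix-like system the listing should show a fried-hash.pag and a fried-hash.dir rather than a single fried-hash file, which is the same surprise NDBM gave me.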

Conclusion

And there you have it. Not huge problems by any means, but there were enough of them to make it take some time to get something running that I'd originally figured I could just look up in five minutes.

Looking over the above, it looks an awful lot like I've got some bugs to report (at least documentation bugs). I promise to work on this Real Soon. (Whining on web pages is *so* much easier.)


Joseph Brenner, 08 May 2004