modperl_psuedohashes

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: "modperl@apache.org" <modperl@apache.org>
From: Tom_Roche@ncsu.edu
Subject: pseudo-hashes? was: Data structure question
Date: Mon, 22 Jan 2001 18:05:46 -0500

Until reading Conway's "Object Oriented Perl"

http://www.manning.com/Conway/

(section 4.3, pp 126-135) I hadn't heard about pseudo-hashes. I now
desire a data structure with non-numeric keys, definable iteration
order, no autovivification, and happy syntax. (And, of course,
fast-n-small :-) Having Conway's blessing is nice, and perldelta for
5.6 says "Pseudo-hashes work better" (with details). But it also says

http://perldoc.com/perl5.6/pod/perldelta.html
> NOTE: The pseudo-hash data type continues to be experimental.
>       Limiting oneself to the interface elements provided by the
>       fields pragma will provide protection from any future changes

In addition to such faint praise, I'm also seeing damnations, such as
the Perl6 RFC "Pseudo-hashes must die!" and

Matt Sergeant <matt@sergeant.org> Thu, 8 Jun 2000 15:44:04 +0100 (BST)
> Pseudo hash references are badly broken even in 5.6. Anyone who's
> done extensive work with them (or tried to) can tell you that.

Which deters. (As does

> Instead, write a class for your objects, and use arrays internally.
> Define constants for the indexes of the arrays.

which appears laziness-deficient :-)

I'm also _not_ seeing messages of the form, "Yes, we used phashs to
implement our telepathic subsystem, which services 4.2 zillion users
every day. We love them."

Being an empiricist (and a wimp :-), I'd like to know:

* Is anyone out there using pseudo-hashes in production code under
  mod_perl?

* Is anyone now using (under mod_perl) something they consider to be
  superior but with similar functionality and interface?

If possible reply directly to me as well as the list (I'm digesting),
and TIA, Tom_Roche@ncsu.edu
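
For readers who have not met them: in the perls under discussion, a pseudo-hash was an array reference whose element 0 held a hash mapping key names to array slots, so that $ph->{name} was translated into $ph->[ $ph->[0]{name} ]. The feature was removed in Perl 5.10, so the sketch below only emulates that layout with an explicit lookup. The helper names ph_get and ph_set are made up for this illustration and are not part of any Perl release.

```perl
use strict;
use warnings;

# Layout of a pseudo-hash: element 0 maps key names to array slots.
my $ph = [ { name => 1, id => 2 }, 'spot', 7 ];

# Hypothetical helper: spells out the lookup that old perls did implicitly.
sub ph_get {
    my ($ph, $key) = @_;
    die "No such field \"$key\"\n" unless exists $ph->[0]{$key};
    return $ph->[ $ph->[0]{$key} ];
}

# Hypothetical helper: same lookup, then store into the slot.
sub ph_set {
    my ($ph, $key, $value) = @_;
    die "No such field \"$key\"\n" unless exists $ph->[0]{$key};
    $ph->[ $ph->[0]{$key} ] = $value;
    return $value;
}

print ph_get($ph, 'name'), "\n";    # spot
ph_set($ph, 'id', 8);

# An unknown key is an error instead of being autovivified.
my $typo_ok = eval { ph_get($ph, 'nmae'); 1 };
print "typo caught: $@" unless $typo_ok;
```

This shows where the "non-numeric keys, fixed order, no autovivification" properties come from: the values live in a plain array, and the key set is fixed by the hash in slot 0.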

===

To: "Tom_Roche@ncsu.edu" <Tom_Roche@ncsu.edu>
From: Matt Sergeant <matt@sergeant.org>
Subject: Re: pseudo-hashes? was: Data structure question
Date: Mon, 22 Jan 2001 23:06:47 +0000 (GMT)

On Mon, 22 Jan 2001, Tom_Roche@ncsu.edu wrote:

Well you've already seen I'm a detractor :-)

> * Is anyone now using (under mod_perl) something they consider to be
>   superior but with similar functionality and interface?

Yes, a class which is a blessed array.

===

To: Tom_Roche@ncsu.edu
From: Robin Berjon <robin@knowscape.com>
Subject: Re: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 01:00:35 +0100

At 18:05 22/01/2001 -0500, Tom_Roche@ncsu.edu wrote:
>the Perl6 RFC "Pseudo-hashes must die!" and

And indeed, they ought to die. Or be reimplemented. Or something, but quite
simply, don't use them. They'll break, they won't dwim, and chances are
they won't play nice with future/past versions of Perl. Forget they even exist.

As Matt says, array based objects are much much better, and do what you
want them to do. You seem to be deterred by the laziness factor. Not so
much of a problem ! You could use enum, but it has constraints on the names
you can use which I don't like. You also probably don't need all that it does.

Following is a small class that I've been using in one of my projects. You
can use it in two ways:

If you are not extending a class that uses an array based object, simply
define the fields and use them:

package MyClass;
BEGIN { use Tessera::Util::Enum qw(FOO BAR BAZ); }

sub new {
  my $class = shift;
  return bless [], $class;
}

sub foo {
  my $self = shift;
  return $self->[FOO]; # fetch what's at index FOO
}

Sounds simple enough right ? The problem with array based objects is that
generally they can't be extended. That is, if you have a subclass of an
array based class it's a pain to add new fields because you never know if
your base class might add new fields, and thus break your index. That's one
reason why hashes are still used so much. With my class, you can do (in
your subclass):

package MyClass::Subclass;
use base 'MyClass';
use Tessera::Util::Enum;

BEGIN {
  Tessera::Util::Enum->extend(
    class => 'MyClass',
    with  => [qw(
                 NEW_FIELD
                 OTHER_NEW
              )],
  );
}

sub get_new_field {
  my $self = shift;
  return $self->[NEW_FIELD];
}

and it will just work. One thing you can't have is multiple inheritance
(well, you can choose to extend just one of the parent classes). I've been
using this quite extensively in a system of mine, and I've been quite happy
with it. It does reduce memory usage in a DOM2 implementation of mine. Of
course, you can change the class name as it won't mean anything outside my
framework, tweak it, throw it out the window, etc...

Perhaps I should put it on CPAN if there's interest in such things (and no
such module is already there).

###
# Tessera Enum Class
# Robin Berjon <robin@knowscape.com>
# 03/11/2000 - prototype mark V
###

package Tessera::Util::Enum;
use strict;
no strict 'refs';
use vars qw($VERSION %packages);
$VERSION = '0.01';

#---------------------------------------------------------------------#
# import()
#---------------------------------------------------------------------#
sub import {
    my $class = shift;
    @_ or return;
    my $pkg = caller();

    my $idx = 0;
    for my $enum (@_) {
        *{$pkg . '::' . $enum} = eval "sub () { $idx }";
        $idx++;
    }
    $packages{$pkg} = $idx; # this is the idx of the next field
}
#---------------------------------------------------------------------#


#---------------------------------------------------------------------#
# extend(class => 'class', with => \@ra_fieldnames)
#---------------------------------------------------------------------#
sub extend {
    my $class = shift;
    my %options = @_;
    my $pkg = caller();

    # %packages only knows classes that have already declared their fields
    warn "extending a class ($options{class}) that hasn't yet been defined"
        unless exists $packages{$options{class}};

    my $idx = $packages{$options{class}};
    for my $enum (@{$options{with}}) {
        *{$pkg . '::' . $enum} = eval "sub () { $idx }";
        $idx++;
    }
    $packages{$pkg} = $idx; # this is the idx of the next field
}
#---------------------------------------------------------------------#



1;

=pod

=head1 NAME

Tessera::Util::Enum - very simple enums

=head1 SYNOPSIS

  use Tessera::Util::Enum qw(
                              _foo_
                              BAR
                              baz_gum
                            );

  or

  use Tessera::Util::Enum ();
  Tessera::Util::Enum->extend(
                              class => 'Some::Class',
                              with  => [qw(
                                            more_foo_
                                            OTHER_BAR
                                       )],
                             );

=head1 DESCRIPTION

This class only exists because enum.pm has restrictions on naming
that I don't like. I also don't need its entire power.

It also adds the possibility to extend a class that already uses
Enum to define its fields. The new fields start at the index that
follows the parent class's last field.

=head1 AUTHOR

Robin Berjon <robin@knowscape.com>

This module is licensed under the same terms as Perl itself.

=cut
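
To try the idea without installing anything, the following single-file sketch inlines a condensed stand-in for the module above and then uses it from a base class and a subclass. The helper _define, the accessor bodies and the test values are assumptions added for the illustration; only import, extend, %packages and the constant names come from the posting.

```perl
use strict;
use warnings;

# Condensed stand-in for the posted module, inlined so this runs as one file.
package Tessera::Util::Enum;
our %packages;    # package name => index of its next free slot

# Hypothetical shared helper: install one constant sub per field name.
sub _define {
    my ($pkg, $idx, @names) = @_;
    no strict 'refs';
    for my $name (@names) {
        *{ $pkg . '::' . $name } = eval "sub () { $idx }";
        $idx++;
    }
    $packages{$pkg} = $idx;
}

sub import {
    my ($class, @names) = @_;
    return unless @names;
    _define(scalar caller(), 0, @names);
}

sub extend {
    my ($class, %opt) = @_;
    die "class $opt{class} has not declared any fields\n"
        unless exists $packages{ $opt{class} };
    _define(scalar caller(), $packages{ $opt{class} }, @{ $opt{with} });
}

package MyClass;
BEGIN { Tessera::Util::Enum->import(qw(FOO BAR BAZ)) }

sub new { my $class = shift; return bless [], $class }
sub foo { my $self = shift; $self->[FOO] = shift if @_; return $self->[FOO] }

package MyClass::Subclass;
our @ISA = ('MyClass');
BEGIN {
    Tessera::Util::Enum->extend(
        class => 'MyClass',
        with  => [qw(NEW_FIELD OTHER_NEW)],
    );
}

sub new_field {
    my $self = shift;
    $self->[NEW_FIELD] = shift if @_;
    return $self->[NEW_FIELD];
}

package main;

my $obj = MyClass::Subclass->new;
$obj->foo('base value');
$obj->new_field('subclass value');

# The subclass fields continue where the parent's left off.
print join(',', MyClass::FOO(), MyClass::BAZ(),
                MyClass::Subclass::NEW_FIELD()), "\n";    # 0,2,3
```

The BEGIN blocks matter here: the constants have to exist before the method bodies that mention them are compiled, otherwise strict would reject the barewords.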

===

To: Tom_Roche@ncsu.edu
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: pseudo-hashes? was: Data structure question
Date: Mon, 22 Jan 2001 16:11:22 -0800 (PST)

On Mon, 22 Jan 2001 Tom_Roche@ncsu.edu wrote:
> (section 4.3, pp 126-135) I hadn't heard about pseudo-hashes. I now
> desire a data structure with non-numeric keys, definable iteration
> order, no autovivification, and happy syntax. (And, of course,
> fast-n-small :-) Having Conway's blessing is nice

Pseudo-hashes do not have Conway's blessing.  We hired him to do a
tutorial for our engineers a few months back, and he railed about how
disappointing pseudo-hashes turned out to be and why no one should ever
use them.  I had already reached the same conclusion after I saw that
everyone would have to remember to say "my Dog $spot;" every time or the
whole thing falls apart.

If you want something reasonably close, you could do what a lot of the
Template Toolkit code does and use arrays with constants for key
names.  Here's an example:

package Dog;

use constant NAME => 1;
use constant ID   => 2;

sub new {
  my $class = shift;
  my $self  = [];
  $self->[ NAME ] = 'spot';
  $self->[ ID ]   = 7;
  return bless $self, $class;   # two-argument bless, so subclasses work
}

Or something like that, and make accessors for the member data.  I think
there are CPAN modules which can automate this for you if you wish.

===

To: "Perrin Harkins" <perrin@primenet.com>
From: "Ken Williams" <ken@forum.swarthmore.edu>
Subject: Re: pseudo-hashes? was: Data structure question
Date: Mon, 22 Jan 2001 22:39:54 -0600

perrin@primenet.com (Perrin Harkins) wrote:
>On Mon, 22 Jan 2001 Tom_Roche@ncsu.edu wrote:
>> (section 4.3, pp 126-135) I hadn't heard about pseudo-hashes. I now
>> desire a data structure with non-numeric keys, definable iteration
>> order, no autovivification, and happy syntax. (And, of course,
>> fast-n-small :-) Having Conway's blessing is nice
>
>Pseudo-hashes do not have Conway's blessing.  We hired him to do a
>tutorial for our engineers a few months back, and he railed about how
>disappointing pseudo-hashes turned out to be and why no one should ever
>use them.  I had already reached the same conclusion after I saw that
>everyone would have to remember to say "my Dog $spot;" every time or the
>whole thing falls apart.

At the last YAPC he talked about the various unsatisfactory approaches
and finally seemed to advocate for his Tie::SecureHash module.  Among
other things, it allows '__private', '_protected', and 'public' data
members.  I'm not sure whether it supports explicit declarations of key
names, but I bet it could be added easily if not.

I haven't used the module, but wanted to pass along the info.

===

To: "Robin Berjon" <robin@knowscape.com>,
<Tom_Roche@ncsu.edu>
From: "John Hughes" <john@Calva.COM>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 10:36:34 +0100

> And indeed, they ought to die. Or be reimplemented. Or something, 
> but quite simply, don't use them. They'll break, they won't dwim,
> and chances are they won't play nice with future/past versions of
> Perl. Forget they even exist.

Details?

I'm using them in 5.005_03 (the real "last stable" version) with no
problems.

exists doesn't do what you think; that's the whole list of problems.

===

To: "Perrin Harkins" <perrin@primenet.com>,
<Tom_Roche@ncsu.edu>
From: "John Hughes" <john@Calva.COM>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 10:43:25 +0100

>  I had already reached the same conclusion after I saw that
> everyone would have to remember to say "my Dog $spot;" every time or the
> whole thing falls apart.

Falls apart?  How?

> If you want something reasonably close, you could do what a lot of the
> Template Toolkit code does and use arrays with constants for key
> names.  Here's an example:

Yes but then you get neither compile time (my Dog $spot) nor run time
(my $spot) error checking.

How are you going to debug the times you use a constant defined for
one structure to index another?

Have fun.

Oh, do it all through accessor functions.  That'll be nice and
fast won't it.

===
To: John Hughes <john@Calva.COM>
From: Matt Sergeant <matt@sergeant.org>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 10:06:13 +0000 (GMT)

On Tue, 23 Jan 2001, John Hughes wrote:

> > And indeed, they ought to die. Or be reimplemented. Or something,
> > but quite simply, don't use them. They'll break, they won't dwim,
> > and chances are they won't play nice with future/past versions of
> > Perl. Forget they even exist.
>
> Details?
>
> I'm using them in 5.005_03 (the real "last stable" version) with no
> problems.
>
> exists doesn't do what you think; that's the whole list of problems.

Neither does delete. And overloading doesn't really work properly. And
reloading modules with phashes doesn't work right. And sub-hashes don't
work right ($pseudo->{Hash}{SubHash}). And so on...

All they do is hide a multitude of sins, for very little real world
gain. Try it - convert your app back to non-pseudo hashes and see what
performance you lose. I'm willing to bet it's not a lot.

The only gain might be in a large DOM tree where there may be thousands of
objects. But then you're really better off using an array based class
instead (as I found out).

===

To: John Hughes <john@Calva.COM>
From: Matt Sergeant <matt@sergeant.org>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 10:17:27 +0000 (GMT)

On Tue, 23 Jan 2001, John Hughes wrote:

> >  I had already reached the same conclusion after I saw that
> > everyone would have to remember to say "my Dog $spot;" every time or the
> > whole thing falls apart.
> 
> Falls apart?  How?

Because if you miss one out, it's a very difficult-to-find bug in your
application, mostly because you don't get the compile-time errors for the
one you missed, but also because you end up wasting time looking for why
your application really isn't any faster (the hint here is that
pseudo-hashes really don't make that much speed difference to your
application).

Say you miss off a type declaration, and later decide to change your hash
key. All of the declarations with types will produce compile errors, so
you can/will fix them, but the one you missed it from will lie hidden,
never producing an error even when the code is called.
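
The difference between a typed and an untyped lexical can still be seen on a current perl, with one caveat: since 5.10 the fields pragma is backed by restricted hashes rather than pseudo-hashes, so this sketch illustrates the point by analogy and does not reproduce the 5.005/5.6 behaviour itself. The Dog class and the misspelled key nmae are invented for the example.

```perl
use strict;
use warnings;

package Dog;
use fields qw(name id);

sub new {
    my Dog $self = shift;
    $self = fields::new($self) unless ref $self;
    $self->{name} = 'spot';
    $self->{id}   = 7;
    return $self;
}

package main;

# Typed lexical: the misspelled key is rejected while the code is compiled,
# so the string eval fails before anything in it runs.
my $typed_ok = eval q{
    my Dog $spot = Dog->new;
    $spot->{nmae} = 'rex';
    1;
};
print "typed:   ", ($typed_ok ? "compiled" : "rejected at compile time"), "\n";

# Untyped lexical: the same typo compiles without complaint...
my $untyped = eval q{
    sub { my $spot = Dog->new; $spot->{nmae} = 'rex'; return 1 }
};
print "untyped: ", (ref $untyped eq 'CODE' ? "compiled" : "rejected"), "\n";

# ...and on a modern perl it is only noticed if that line is actually executed.
my $ran_ok = eval { $untyped->(); 1 };
print "untyped at run time: ", ($ran_ok ? "no error" : "error"), "\n";
```

Either way, one forgotten class name in a declaration silently moves the check from compile time to some later point, which is the maintenance hazard being described.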

> > If you want something reasonably close, you could do what a lot of the
> > Template Toolkit code does and use arrays with constants for key
> > names.  Here's an example:
> 
> Yes but then you get neither compile time (my Dog $spot) nor run time
> (my $spot) error checking.

Why not?

Witness:

% perl -Mstrict
use constant FOO => 0;
my @array;
$array[FOD] = 3;
Bareword "FOD" not allowed while "strict subs" in use at - line 3.

Seems like compile time checking to me...

> How are you going to debug the times you use a constant defined for
> one structure to index another?

You use packages, and data hiding.

> Oh, do it all through accessor functions.  That'll be nice and
> fast won't it.

Maybe faster than you think. Your bottleneck is elsewhere.

If you are really going: 

my Dog $spot = Dog->new("spot");
print "My Dog's name is: ", $spot->{Name}, "\n";

Then I think many people here would think that is a very bad
technique. You should *never* be able to make assumptions about the
underlying data format of an object.
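
A minimal sketch of the "packages and data hiding" answer, assuming two invented classes (Dog and Cat) that happen to reuse the same constant names for different slots. Because each set of constants lives in its own package and callers only go through accessors, the clash never becomes visible outside the class.

```perl
use strict;
use warnings;

package Dog;
use constant { NAME => 0, ID => 1 };

sub new {
    my ($class, %args) = @_;
    my $self = bless [], $class;
    $self->[NAME] = $args{name};
    $self->[ID]   = $args{id};
    return $self;
}
sub name { return $_[0][NAME] }
sub id   { return $_[0][ID] }

package Cat;
use constant { ID => 0, NAME => 1 };    # same names, different slots

sub new {
    my ($class, %args) = @_;
    my $self = bless [], $class;
    $self->[NAME] = $args{name};
    $self->[ID]   = $args{id};
    return $self;
}
sub name { return $_[0][NAME] }
sub id   { return $_[0][ID] }

package main;

my $spot = Dog->new(name => 'spot', id => 7);
my $tom  = Cat->new(name => 'tom',  id => 9);

# Callers never index the arrays themselves, so the layouts can differ
# (or change later) without touching this code.
print "My Dog's name is: ", $spot->name, "\n";
print "My Cat's name is: ", $tom->name,  "\n";
```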

===

To: "Matt Sergeant" <matt@sergeant.org>
From: "John Hughes" <john@Calva.COM>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 11:36:54 +0100

(exists doesn't work).

> Neither does delete.

Ok.  But what should it do?  What does it do for an array?

> And overloading doesn't really work properly.

Details?

> And reloading modules with phashes doesn't work right.

I steer clear of reloading, almost anything screws up.

> And sub-hashes doesn't work right ($pseudo->{Hash}{SubHash}).

Details?  Works for me.

> And so on...

> All they do is hide a multitude of sins, for very little real world
> gain. Try it - convert your app back to non-pseudo hashes and see what
> performance you lose. I'm willing to bet its not a lot.

Well, obviously.  Hashes aren't slow.  But they are *BIG*.

===
To: John Hughes <john@Calva.COM>
From: Matt Sergeant <matt@sergeant.org>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 10:42:38 +0000 (GMT)

On Tue, 23 Jan 2001, John Hughes wrote:

> (exists doesn't work).
> 
> > Neither does delete.
> 
> Ok.  But what should it do?  What does it do for an array?

But we're talking about hashes! At the very least it should make it so
that exists() returns false.

> > And overloading doesn't really work properly.
> 
> Details?

Overloading was the wrong word, FWIW... What I meant was, it doesn't work
right if you subclass a module using @ISA = (...) rather than use base. So
everybody has to *know* the underlying implementation of your class
anyway, so that breaks the very concept of OO/Data Hiding.

> > And reloading modules with phashes doesn't work right.
> 
> I steer clear of reloading, almost anything screws up.

That's an overstatement in the extreme. Reloading works fine for a great
many people, and most modules.

> > And sub-hashes doesn't work right ($pseudo->{Hash}{SubHash}).
> 
> Details?  Works for me.

SubHash isn't compile time checked! You need to do:

my SubH $subhash = $pseudo->{Hash};
$subhash->{SubHash};

to get the compile time checking.

> > All they do is hide a multitude of sins, for very little real world
> > gain. Try it - convert your app back to non-pseudo hashes and see what
> > performance you lose. I'm willing to bet its not a lot.
> 
> Well, obviously.  Hashes aren't slow.  But they are *BIG*.

??? How many keys are in your pseudo hashes? I'm willing to bet not that
many. The difference probably matters less to your particular application
than you think, unless it's a huge set of objects (thousands).

===

To: "John Hughes" <john@Calva.COM>
From: Robin Berjon <robin@knowscape.com>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 11:47:22 +0100

At 11:36 23/01/2001 +0100, John Hughes wrote:
>> Neither does delete.
>
>Ok.  But what should it do?  What does it do for an array?

perldoc -f delete

"In the case of an array, if the array elements happen to be at the end,
the size of the array will shrink to the highest element that tests true
for exists() (or 0 if no such element exists)."

Pretty much what one would expect.
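
A short sketch of the quoted behaviour on a perl that supports delete on array elements (5.6 or later); the array contents are arbitrary. Current perldoc discourages calling delete on arrays, so treat this as a demonstration of the semantics rather than a recommendation.

```perl
use strict;
use warnings;

my @fields = ('spot', 7, 'brown');

delete $fields[2];    # last element: the array shrinks
my $len_after_tail = scalar @fields;
print "after deleting the last element: $len_after_tail\n";      # 2

delete $fields[0];    # not at the end: the slot is emptied, length unchanged
my $len_after_head = scalar @fields;
print "after deleting the first element: $len_after_head\n";     # 2
print "slot 0 ", (exists $fields[0] ? "exists" : "does not exist"), "\n";

delete $fields[1];    # now the last existing element: shrinks past slot 0 too
my $len_after_all = scalar @fields;
print "after deleting the remaining element: $len_after_all\n";  # 0
```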

>> All they do is hide a multitude of sins, for very little real world
>> gain. Try it - convert your app back to non-pseudo hashes and see what
>> performance you lose. I'm willing to bet its not a lot.
>
>Well, obviously.  Hashes aren't slow.  But they are *BIG*.

That's why arrays are so cool. And there are many tricks to make them work
pretty much the way you'd expect a hash to work, with very few limitations.
I also have a mind to try and play with use overload '%{}' on an array
based object to see if interesting stuff could be done there. It'll be
slower of course, but it could perhaps beat a tied hash (ties are awfully
slow).
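
One way the overload '%{}' idea might look, as a rough sketch only: the Point class, its field list and the as_hash method are invented here, and the hash handed back is a read-only snapshot rebuilt on every access, which is one of several possible designs.

```perl
use strict;
use warnings;

package Point;

my @FIELDS = qw(x y);    # slot 0 is x, slot 1 is y

# Only hash dereference is overloaded; array access still reaches the data.
use overload '%{}' => \&as_hash, fallback => 1;

sub new {
    my ($class, $x, $y) = @_;
    return bless [ $x, $y ], $class;
}

# Build a throwaway hash view of the array slots.
sub as_hash {
    my $self = shift;
    my %view;
    @view{@FIELDS} = @$self;
    return \%view;
}

package main;

my $p = Point->new(3, 4);
print "x=$p->{x} y=$p->{y}\n";    # x=3 y=4
print "raw slot 1: $p->[1]\n";    # raw slot 1: 4
```

Writes such as $p->{x} = 10 land in the temporary copy and are lost, so a real implementation would need a tied hash or setter methods behind the view, and each access pays for building the hash.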

===
To: "modperl@apache.org" <modperl@apache.org>
From: DeWitt Clinton <dewitt@avacet.com>
Subject: Re: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 07:48:01 -0500

On Tue, Jan 23, 2001 at 10:06:13AM +0000, Matt Sergeant wrote:

> The only gain might be in a large DOM tree where there may be
> thousands of objects. But then you're really better off using an
> array based class instead (as I found out).

This is getting a bit off-topic, but I've empirically found that the
DOM is not necessarily the best object model to use in a mod_perl
environment.  XML::DOM in particular has such a high overhead in terms
of memory (and memory leaks) and performance, that it is sometimes
inappropriate for a context that requires a small footprint, and
generally fast throughput (like mod_perl).

For example, in version 1 of the Avacet perl libraries, we were using
XML::DOM for both our XML-RPC mechanism and as the underlying data
structure for object manipulation.  In version 2, however, we created
an architecture that automatically converts between the language
agnostic XML and native blessed objects using a custom engine built on
the PerlSAX parser.  This reduced our memory footprint dramatically,
stopped up the memory leaks, and increased performance significantly.
Moreover, the object model now exposed is based on native perl objects
with an API geared toward property manipulation (i.e., get_foo,
set_foo) which is easier to program directly to than the DOM.

You can see this in action with the modules available in the
Avacet::Core::Rpc::Xml namespace at www.avacet.com.  

===

To: Robin Berjon <robin@knowscape.com>
From: Matt Sergeant <matt@sergeant.org>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 12:50:23 +0000 (GMT)

On Tue, 23 Jan 2001, Robin Berjon wrote:

> At 11:36 23/01/2001 +0100, John Hughes wrote:
> >> Neither does delete.
> >
> >Ok.  But what should it do?  What does it do for an array?
>
> perldoc -f delete
>
> "In the case of an array, if the array elements happen to be at the end,
> the size of the array will shrink to the highest element that tests true
> for exists() (or 0 if no such element exists)."
>
> Pretty much what one would expect.

That's only 5.6+ though. So it's only useful for internal applications (if
at all).

===

To: Matt Sergeant <matt@sergeant.org>
From: Robin Berjon <robin@knowscape.com>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 14:13:21 +0100

At 12:50 23/01/2001 +0000, Matt Sergeant wrote:
>That's only 5.6+ though. So it's only useful for internal applications (if
>at all).

True, but we've been using 5.6 (built from AS source) in production for
quite a while now very happily. Also, I'm seeing more and more customers
having it or ready to upgrade. Doesn't make delete @array that much more
useful, but there's hope.

===

To: John Hughes <john@Calva.COM>
From: Perrin Harkins <perrin@primenet.com>
Subject: RE: pseudo-hashes? was: Data structure question
Date: Tue, 23 Jan 2001 11:53:30 -0800 (PST)

On Tue, 23 Jan 2001, John Hughes wrote:
> >  I had already reached the same conclusion after I saw that
> > everyone would have to remember to say "my Dog $spot;" every time or the
> > whole thing falls apart.
> 
> Falls apart?  How?

If you forget the "Dog" part somewhere, it's slower than a normal hash.

> > If you want something reasonably close, you could do what a lot of the
> > Template Toolkit code does and use arrays with constants for key
> > names.  Here's an example:
> 
> Yes but then you get neither compile time (my Dog $spot) nor run time
> (my $spot) error checking.

As Matt pointed out, you get compile time errors if you use an undefined
constant as a key.

You can also do this sort of thing with hashes, like this:

use strict;
my %foo;
my $bar = 'bar';
$foo{$bar};

If you type $foo{$barf} instead, you'll get an error.

> How are you going to debug the times you use a constant defined for
> one structure to index another?

Different classes would be in different packages.

> Oh, do it all through accessor functions.  That'll be nice and
> fast won't it.

Well, I thought we were talking about data structures to use for objects.

A few months back, when making design decisions for a big project, I
benchmarked pseudo-hashes on 5.00503.  They weren't significantly faster
than hashes, and only 15% smaller.  I figured they were only worth the
trouble if we were going to be making thousands of small objects, which is
a bad idea in the first place.  So, we opted for programmer efficiency and
code readability and wrote hashes when we meant hashes.  Of course, since
this stuff is OO code, we could always go back and change the internal
implementation to pseudo-hashes if it looked like it would help.

If pseudo-hashes work for you, go ahead and use them.  If it ain't
broke...
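
For anyone who wants to repeat this kind of measurement, here is a rough sketch using the core Benchmark module. Pseudo-hashes no longer exist in current perls, so it compares a plain hash with a constant-indexed array instead. The data and the test names are made up, and no result is quoted because the numbers depend entirely on the perl build and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

use constant { NAME => 0, ID => 1 };

my $hash_obj  = { name => 'spot', id => 7 };
my $array_obj = [ 'spot', 7 ];

# Each test reads both fields once, the way an accessor-heavy loop would.
my %tests = (
    hash => sub {
        my $n = $hash_obj->{name};
        my $i = $hash_obj->{id};
        return "$n$i";
    },
    array => sub {
        my $n = $array_obj->[NAME];
        my $i = $array_obj->[ID];
        return "$n$i";
    },
);

# A negative count means "run each test for at least that many CPU seconds".
cmpthese(-1, \%tests);
```

This only measures access speed. Comparing memory use needs a separate tool, and in both cases the figure that matters is how much of the whole request the object access accounts for.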

===