Elisp for Perl Programmers: Data Structures

I've been meaning to add more data structure examples to my perl-elisp rosetta:
elisp-for-perl-programmers. I keep holding off, because there are a lot of oddities on both sides, and it's difficult to do the comparison, but let me get started and see how far I get:

Hierarchial Documentation and length

The emacs lisp reference manual is actually pretty clear on this subject, but there's a lot of stuff to know, and it's eminently logical organization may get in the way sometimes.

For example, the first few times through my eyes glazed over on the Venn diagram in the section Sequences, Arrays, and Vectors:

               _____________________________________________
              |                                             |
              |          Sequence                           |
              |  ______   ________________________________  |
              | |      | |                                | |
              | | List | |             Array              | |
              | |      | |    ________       ________     | |
              | |______| |   |        |     |        |    | |
              |          |   | Vector |     | String |    | |
              |          |   |________|     |________|    | |
              |          |  ____________   _____________  | |
              |          | |            | |             | | |
              |          | | Char-table | | Bool-vector | | |
              |          | |____________| |_____________| | |
              |          |________________________________| |
              |_____________________________________________|

Then later, while I was blundering around trying to learn how to use strings, I couldn't figure out how to get the length of a string. I went back and forth through the sections that discuss strings, but there didn't seem to be a function documented to do it. The gotcha, in this case, is that a String isa Array isa Sequence. There's a general purpose function to get the length of a sequence, it's called, amazingly enough length, and it's documented early in the manual, far ahead of where I was looking: Sequences (not far after the diagram I just quoted above).

Now, the fact that elisp's length works at a higher level of abstraction is actually very cool, this is one of the things that elisp got right and that perl got wrong. In elisp, you get the length of a list the same way you get the length of a string. In perl to get the length of an array you'd use scalar, or the even odder sigil combo $#, neither of which work on a string.

But the problem I had finding length in the documentation is not at all unusual, and I've come to the conclusion that it's a difficulty with all hierarchical structures: the DRY rule requires you talk about common features in the upper levels, but the stuff that you actually need to use at first is down at the bottom. So, you're never told what you need to know where you need to know it, which is why there's room in the world for tutorials, how-tos and cookbooks, and silly web pages like this.

In keeping with my perl focus, I could keep talking about strings, but as you might expect they're already pretty well covered here: elisp-for-perl-programmers. So, how about arrays?

perl arrays and elisp lists

Arrays are one of perl's basic data structures, and so you could say they're loosely equivalent to elisp's lists. They're both sequentially ordered collections of things, and you can append to them on the fly, dynamically increasing their length. You can access each element fairly easily by a numeric value, indexing from 0:

  $element = $array[ $i ];

  (setq element (nth i list))
Another similarity: Both perl arrays and elisp lists can be treated as LIFO stacks, via push and pop:
  push( @array, $item );
  my $item = pop( @array );

  (require 'cl)
  (push item list)
  (setq item (pop list))
There's a difference here, though: the elisp versions of push/pop act on the low numbered end of a list, and the perl versions act on the high numbered end of an array. So you might think of elisp's push/pop as being closer to perl's shift/unshift.

So then, what would be the elisp expressions that act on the high numbered end of a list? This let's you peek at the last element (without removing it):

  (setq last-element (car (last list)))
Much like:
  my $last_element = $array[ -1 ];
I keep meaning to work out precise elisp analog's for perl's pop/push (which act on the high end). You can fake it with elisp's "append":
perl:
  push( @temp_list, "Z" );

elisp:
  (setq temp-list
    (append temp-list (cons "Z" ())))
But "append" is an odd beast that still manages to surprise me.

There are many differences. For one thing, the elisp function nth has to walk the array from the top, each time, so accesses get slower the further down the list you need to go. Perl arrays have an efficiency advantage, in that respect (but then, elisp has other structures that can do better, see 'arrays' below).

The reason for this is that lisp lists are somewhat odd beasts. Lists are composed of chains of two elements, a payload and a pointer to the next link in the chain. These pairs are called cons cells, and the payloads and pointers go by the famously odd names car and cdr. Every access of a list has to thread it's way through each link of the chain, (which is one reason, no doubt, that elisp has a plethora of other types of sequences). The fact that lisp has the guts of this linked-list structure exposed creates many little quirks you'll need to deal with (e.g. the last function does not return the last value of the list -- which is what you always want -- but rather the last cons cell, which means you need to run it through car every single time you use it).

Keep this in mind the next time someone wants to hassle you about perl5 references.

perl arrays vs. elisp arrays

Let's back-up and talk about elisp arrays. I tend to talk about perl array's and elisp lists side by side, because they're probably the most "natural" data structures in their respective languages. But elisp has arrays of their own, which can be accessed by a numeric index just like perl's arrays, without the speed penalty of walking a a list from the top.

However, elisp arrays have one bad limitation that perl arrays (and elisp lists) don't have: they're fixed in length. You really need to know how long the array is going to need to be when you create it.

Creating and accessing arrays:


perl:

  my @array = ('one', 'two', 'three', 'four');
  my $item  = $array[ 2 ]; # "three"

  $array[ $i ] = "new value";


elisp:

  (setq elisp-array ["one", "two", "three", "four"];
  (setq item (aref elisp-array 2)) ;; "three"

  (aset elisp-array i "new value")

Let me contrive an elisp example where you need to create an array of a particular length: loading a list into an array (note that there's no real perl equivalent to this):

  (setq elisp-list ("one" "two" "three" "four" "five")
  (setq array-length (length elisp-list))
  (setq elisp-array (make-vector array-length) "")
  (setq i 0)
  (dolist (item elisp-list)
     (setq item (pop elisp-list))
     (aset elisp-array i item)
     (setq i (1+ i))
    )

other things

I'm stopping here for now, not because there aren't other things that can be said.

For example, I might make the point that elisp data structures can typically contain other structures: In a list, the elements can be other lists, which is how you do complex data structures. (This capability was of course hacked into perl5 via the addition of references.)

One thing that elisp got right: the equal function is a built-in, so you can do comparison of complex structures (like Test::More's is_deeply, or the CPAN module Data::Compare's Compare).

Another small thing I like about elisp: features to use lists as sets are built-in. With perl, you need to hack your own, or find a package on CPAN (none have ever made it to core).

Further, I haven't yet talked about perl's hashes which are really tightly integrated into the language and are enormously useful (probably one of the reasons most people don't complain about the lack of set features). Elisp actually does have hashes, but they take a little more thought to use than perl's in part, because they allow the user to choose a hashing function...

Another thing to discuss is alists and plists: alists are used a little more commonly in elisp than hashes (which are at least thought of as a heavier weight structure in the elisp world than in the perl). The interesting thing to me about alists is that while they contain key-value pairs much like hashes, they're ordered, and there's no uniqueness constraint. So you can add a value to an alist that will temporarily take precedence, but you can get the old value back when you remove the new...

And speaking of property lists... text itself in elisp can have properties associated with it, including the text in strings as well as in buffers. This is how the syntax coloring features are managed.

And there are many angles of elisp that I still haven't quite wrapped my brain around. There's something that bugs me about elisp symbols that I can't quite put my finger on, and the macro syntax is severely annoying (I can get a defmacro to do what I want only about half of the time, and it usually isn't worth the time to get there).


Joseph Brenner, Created: June 16, 2010     Revised: August 17, 2010