My Mail Lashup

I keep meaning to document the crazy lashup I use for dealing with email, if only for amusement's sake. To put the emphasis on the positive, though, let me lead off with a quick listing of "features" that my lashup gives me that I suspect I'd have trouble finding elsewhere:

Multi-stage email refiling: many systems give you a procmail-like ability to refile mail to sub-folders as it comes in. I can refile things at multiple stages, e.g. when it first comes in and also after I've had a chance to skim through it. I do things like put all my personal mail in a high priority folder, and initially put mail from mailing lists into a lower priority folder, which I then clear later, refiling the mail to individual list folders.
All mail is stored as individual text files, hence any system of indexing text files will work (tolerably well) for indexing my mail -- in practice, find-greps work well enough that I haven't bothered with swish-e or whatever.
I read mail with software that traditionally puts emphasis on keyboard control, so I don't need to mouse around constantly to do the simplest tasks.
My mail is stored on my workstation at home, not on a remote server somewhere else, à la the many web mail solutions. This has a legal advantage that I suspect will loom increasingly large in people's minds (if the government wants to see my mail folders they have to subpoena me, not just get a third party to roll over). The absence of a webmail interface is also an advantage in disguise: if I'm out of the office, I'm out of the office, not constantly checking my email. (It isn't impossible for me to read mail remotely, but the barrier is a little higher: I have to open a terminal window, ssh to my firewall, then ssh to my workstation).
I'm essentially immune to any web-bugs, javascript attacks, etc: my mailer doesn't understand anything but text...
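
A find-grep search over MH folders, as mentioned in the list above, might look something like the following. This is a minimal sketch rather than the actual command in use here: the function name "mailgrep" and the ~/Mail default are assumptions for illustration, and it relies on MH keeping each message in a file whose name is all digits.

```shell
#!/bin/sh
# mailgrep PATTERN [MAILDIR]
# List the message files under MAILDIR whose text matches PATTERN
# (case-insensitive). "mailgrep" and the ~/Mail default are
# illustrative assumptions, not part of MH.
mailgrep() {
    pattern=$1
    maildir=${2:-$HOME/Mail}
    # MH message files have all-digit names; skip anything else
    # (for example .mh_sequences).
    find "$maildir" -type f ! -name '*[!0-9]*' \
        -exec grep -l -i -e "$pattern" {} +
}
```

Something like mailgrep 'important topic' ~/Mail/FIRST would then print the paths of the matching messages, one per line.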

what I actually do

Really, there are two kinds of eccentricities in play here. While I'm happy to talk about my taste in client software, that's essentially just the usual software evangelism (albeit for very unusual software). More interesting are the scripts I use to do (1) mail refiling and (2) mail transport: these involve some cute, simple tricks that may have wider application -- and multi-stage mail refiling is an idea worth stealing, even if you have no interest in how I've implemented it.

client software

I use mh mail, reading it inside of emacs via the MH-E interface. While I can write extensions in elisp, my preference is to write them in perl, with an elisp wrapper to make them easier to run from inside of emacs.

I do mail refiling using shell scripts based on the mh command line tools.

Also, I use ReiserFS 3 as my file system, which presumably helps when dealing with the larger folders with 100,000+ pieces of mail.

mail admin hackery

Over the years, the communication between my workstation and the server where my mail comes in has gotten increasingly weird. They dropped the "apop" interface I used to use, and after a few attempts at understanding fetchmail, I gave up and rolled my own system of getting my mail, based on shell scripts that do remote commands with "ssh -f", and copy files over with "scp".

Then at one point, the local "send" command on my workstation stopped working for me, and I eventually danced around that problem in a similar way, using some perl code that copies the outgoing mail to the server, and then uses the remote "send" command to get it going.

technical details, in outline

mail refiling

The old-fashioned (even by my standards) method of refiling mh mail relies on the command pick, which skims through a mail folder and returns the file names (that is, the message numbers) that match certain criteria (e.g. the contents of the mail header fields "From", "Subject", "To", etc.). The shell then feeds this list to the mh command refile to actually do the job (ah, unix: "do one thing and do it well", burning processes to light our way).

Just to give you the flavor of mh's pick/refile, a simple example to move some items from your "+inbox" to a folder called "+FIRST":

  refile `pick +inbox \
      -from     important-person@big_place.com \
  -or -from     nice-person@smaller_place.org \
  -or -subject  "important topic" \
  ` +FIRST

My current schtick is to do an initial refile in three stages: the first is a "kill file" that moves mail from some select email addresses to my "+IDJITS" folder, then mailing list traffic is sent to "+SECOND", and finally personal mail from people I know I want to hear from is put in "+FIRST". This uses my "+inbox" largely as a sewer for spam, but it needs to be skimmed for new candidates for my "+FIRST" folder.

Note that this works in an inverse order of priority: voluminous mail from a friend of mine to some mailing list is treated no differently than any other mail on that list -- I have done it the other way around in the past, which can also be interesting ("Oh look, JC is arguing about politics on the sfraves list again").
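
To make the three passes concrete, here's a rough sketch of what a stage-one script along these lines could look like. It is not the actual script: the helper name "refile_matching", the "stage_one" wrapper, and all of the addresses are made up for illustration. It assumes the nmh versions of pick and refile, and that pick either exits non-zero or prints "0" when nothing matches.

```shell
#!/bin/sh
# refile_matching DEST PICK-ARGS...
# Move the messages in +inbox that match the pick criteria to DEST.
# (Hypothetical helper; addresses below are placeholders.)
refile_matching() {
    dest=$1
    shift
    # Assumption: pick fails or prints "0" when nothing matches.
    msgs=$(pick +inbox "$@" 2>/dev/null) || return 0
    case $msgs in
        ''|0) return 0 ;;      # nothing to move
    esac
    # $msgs is left unquoted on purpose: one argument per message number.
    refile $msgs -src +inbox "$dest"
}

# The three passes run in inverse order of priority, so an earlier
# pass wins when a message would match more than one of them.
stage_one() {
    refile_matching +IDJITS -from 'pest@example.com'
    refile_matching +SECOND -to 'some-list@example.org' \
                            -or -cc 'some-list@example.org'
    refile_matching +FIRST  -from 'friend@example.net'
}
```

Wrapping pick in a helper like this also avoids calling refile with an empty message list when a pass finds nothing.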

A second stage refile script then clears "+SECOND" to the individual list folders. At some point I'll probably add another layer to this system, because some mailing lists have a higher priority with me than others, and I probably need a "+SECOND" and a "+THIRD".

By the way, do you notice that I have numbered stages of refiles, and numbered priorities in my folders? These two numbering systems are actually quite different. For example, the primary mail refiling script sends mail to both the "+FIRST" and "+SECOND" folders, and the secondary mail refiling script takes mail from the "+SECOND" folder, and moves it elsewhere. If I implement a "+THIRD" folder, I won't need to add an additional refile script.

mail transport hacks via ssh

Using mh, one traditionally types the "inc" command to move mail from the spool file to individual files in your "+inbox" folder, and in MH-E, typing "i" would normally run "inc".

When I hit "i" it runs an elisp wrapper that then runs a shell script I call "lazymans_fetchmail". That does something like this:

  # No -f on this one: wait for the remote inc to finish before copying
  ssh doom@remote.server_box.org 'inc'

  scp doom@remote.server_box.org:/home/doom/Mail/inbox/* /home/doom/Mail/TEMP &&\
  ssh -f doom@remote.server_box.org 'rm -f /home/doom/Mail/inbox/*'

  # Locally, refile contents of TEMP to the inbox;
  folder +TEMP
  refile all +inbox

The cute thing here is that ssh can run a command on a remote box (the "-f" flag just sends ssh to the background before the command runs). Then there's the shell trick of joining commands with "&&", so that the mail won't be deleted if there was a problem with copying it to my local machine. That has an odd side-effect, though: if the copy of remote inbox to local TEMP has a burp partway through, the remote mail stays where it is, and later runs may suck down duplicates of older mail. Actually, that's how I find out when something has gone wrong with all this: I don't get any error messages out of it.
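
For anyone who would rather get an error message than a pile of duplicates, here is a hedged sketch of the same fetch with each step checked. The function name "fetch_remote_inbox" and its arguments are hypothetical, and treating a failed remote inc as a mere warning is an assumption (a non-zero exit may only mean there was no new mail).

```shell
#!/bin/sh
# fetch_remote_inbox REMOTE REMOTE_DIR LOCAL_TMP
# Hypothetical variant of the fetch script that reports each failure
# instead of staying silent. All names here are illustrative.
fetch_remote_inbox() {
    remote=$1
    remote_dir=$2
    local_tmp=$3

    # No -f here: wait for the remote inc to finish before copying.
    # Assumption: a non-zero exit may just mean "no new mail".
    ssh "$remote" inc ||
        echo "fetch: remote inc exited non-zero" >&2

    # The glob is quoted locally so it gets expanded on the remote side.
    if ! scp -q "$remote:$remote_dir/*" "$local_tmp/"; then
        echo "fetch: copy failed, remote inbox left untouched" >&2
        return 1
    fi

    if ! ssh "$remote" "rm -f $remote_dir/*"; then
        echo "fetch: copied, but remote inbox not cleared;" \
             "expect duplicates on the next run" >&2
        return 2
    fi
}
```

The distinct return codes (1 for a failed copy, 2 for a failed cleanup) are there so a wrapper can tell "nothing happened" apart from "duplicates are coming".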

By the way: do you see why I copy mail to "+TEMP" and then do a refile to "+inbox"? MH messages are stored in files with numbered names which constantly change as you juggle your mail around. On the remote machine, "inbox/36" may be a totally different file than on the local machine... to cover that, I copy the mail to an empty location, and let "refile" intelligently renumber the incoming pieces of mail.
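
The renumbering idea itself is simple enough to sketch in plain shell. This is only an illustration of the concept, not how MH's refile is implemented: the function names are hypothetical, and it ignores sequences, locking, and everything else refile takes care of.

```shell
#!/bin/sh
# next_msg_number DIR
# Print the next free message number in an MH-style folder directory.
next_msg_number() {
    max=0
    for f in "$1"/*; do
        n=${f##*/}
        case $n in
            ''|*[!0-9]*) continue ;;    # skip non-numeric names
        esac
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo $((max + 1))
}

# append_messages SRC DST
# Move every numbered file in SRC to the next free numbers in DST,
# in numeric order, so arrival order is kept and nothing is overwritten.
append_messages() {
    src=$1
    dst=$2
    next=$(next_msg_number "$dst")
    for n in $(ls "$src" | grep -E '^[0-9]+$' | sort -n); do
        mv "$src/$n" "$dst/$next"
        next=$((next + 1))
    done
}
```

With TEMP holding 1, 2 and 10 and the inbox holding 36 and 40, the three new messages land in the inbox as 41, 42 and 43, whatever they were called on the remote machine.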

This, by the way, is probably the silliest feature of MH: numbering the files no doubt simplifies doing sorted listings, but file names really should bear some relationship to file contents.

And the way I do my mail sending (these days) boils down to a few lines of shell (and currently, five lines of perl):

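  # Copy the outgoing mail to the server, then run the remote "send" on it.
  my $cmd_1 = "scp $file $ssh_connect:$remote_file";
  my $cmd_2 = "ssh -f $ssh_connect 'send $remote_file'";
  unless($DEBUG) {
    # Only try the send if the copy worked (system returns 0 on success).
    system($cmd_1) == 0 and system($cmd_2);
  }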

problems, notes for improvements

refiling

The mh pick/refile commands go beyond "quaint" to "primitive" -- long lines with escaped line-breaks are very brittle (make sure there's no whitespace after those escapes, eh?); typos are all too easy to make ("-of" instead of "-or", for example); the regular expressions it uses are crippled (as I remember it; I still need to check the details on that); and arguably it's inefficient (uses too many processes).

My multi-stage refiles require the same data to be entered in multiple scripts -- far better would be to drive the process with an address book database, and automatically generate the refile scripts. It might even be worth re-implementing some or all of the mh commands in perl code: the easiest way to get the perl5 style regexps everyone wants these days is to just use perl5, and I bet it would all run much faster, too.
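
As a rough idea of what "generate the refile scripts" could mean, here is a small sketch that turns a flat address-book file into pick/refile lines. Everything about it is an assumption for illustration: the function name "gen_refile_script", the three-column file format (header field, address, destination folder), and the choice to emit the backquote style of command.

```shell
#!/bin/sh
# gen_refile_script SRC_FOLDER ADDRESS_BOOK
# Print one refile command per destination folder, built from a
# hypothetical address book with lines of the form:
#     FIELD  ADDRESS  FOLDER      (e.g. "from friend@example.net FIRST")
# Folders come out in order of first appearance in the file.
gen_refile_script() {
    src=$1
    book=$2
    awk -v src="$src" '
        /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
        {
            field = $1; addr = $2; dest = $3
            if (!(dest in terms)) order[++n] = dest
            # Chain additional criteria for the same folder with -or.
            terms[dest] = terms[dest] (terms[dest] == "" ? "" : " -or") \
                          " -" field " " addr
        }
        END {
            for (i = 1; i <= n; i++)
                printf "refile `pick +%s%s` +%s\n", src, terms[order[i]], order[i]
        }
    ' "$book"
}
```

Since the "-or" chains are assembled by the generator, the escaped line-breaks and "-of" typos go away, and each address only has to be entered in one place.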

mail transport

incoming

The biggest problem with my incoming mail is that the server machine has user quotas and a relatively small disk -- if I get some spammy pinhead sending me multiple copies of an uncompressed image, I can run out of room to "inc" my mail. What I do in that case is to copy over the entire mail spool file, and inc it locally on my box, and then zero it out on the server... this is such a common need that I've scripted this process, also.
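
The spool-file fallback described above might be scripted roughly like this. It's a sketch under assumptions, not the actual script: the function name "fetch_spool" and its arguments are hypothetical, and it assumes an nmh inc that accepts "-file" to read mail from a named file.

```shell
#!/bin/sh
# fetch_spool REMOTE REMOTE_SPOOL LOCAL_COPY
# Hypothetical fallback for when the server is too full to run inc there.
fetch_spool() {
    remote=$1
    remote_spool=$2
    local_copy=$3

    # 1. Copy the whole spool file down.
    scp -q "$remote:$remote_spool" "$local_copy" || {
        echo "fetch_spool: copy failed, nothing changed" >&2
        return 1
    }

    # 2. Incorporate it locally; -file makes inc read this file
    #    rather than the default maildrop.
    inc -file "$local_copy" || {
        echo "fetch_spool: local inc failed, server spool left alone" >&2
        return 2
    }

    # 3. Zero out the spool on the server. Mail delivered between
    #    steps 1 and 3 would be lost, so this is a last resort.
    ssh "$remote" ": > $remote_spool"
}
```

The server-side spool is only truncated after the local inc has succeeded, which keeps the failure modes on the "duplicate mail" side rather than the "lost mail" side, apart from the race noted in the last comment.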

outgoing

The biggest problem with my system of outgoing mail is that because it happens on the remote side, I often don't see errors if something is going wrong. Also, my local "+outbox" is very out-of-date (the mail I send is accumulating in an "+outbox" on the remote machine). Oh, and there's a subtle problem with my mail aliases as well: I habitually define them in ~/Mail/aliases on my local machine, but since sends happen on the remote machine, it can't see my changes.
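
A stopgap for the invisible-errors part of this, sketched under assumptions: run the remote send in the foreground (no "-f"), so that ssh hands back the remote command's exit status, and complain locally when either step fails. The function name "send_via_server" and its arguments are hypothetical.

```shell
#!/bin/sh
# send_via_server DRAFT REMOTE REMOTE_FILE
# Hypothetical wrapper: copy a draft to the server and run the
# remote "send" on it, reporting failures on the local side.
send_via_server() {
    draft=$1
    remote=$2
    remote_file=$3

    scp -q "$draft" "$remote:$remote_file" || {
        echo "send: could not copy $draft to $remote" >&2
        return 1
    }

    # Without -f, ssh waits for the remote command and exits with
    # that command's exit status.
    ssh "$remote" "send $remote_file" || {
        echo "send: remote send failed for $draft" >&2
        return 2
    }
}
```

This only surfaces failures that the remote send reports through its exit status; it does nothing for the stale "+outbox" or the aliases problem.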

The fix for all of these mail transport problems will most likely be to just start Doing it Right: set up an MX record for my home domain(s), and run a local MTA (postfix, most likely, probably not sendmail).

Joseph Brenner, 11 Jun 2009