The Joys of RCS (with vc.el)

Okay, you've probably heard the spiel about how version control is so cool that you should use it for *everything* on your box, not just for big projects worked on by large groups of people.

Personal Version Control

If you're like me, this "personal version control" concept works out something like this: "Okay, what system should I use? Well, everyone knows that RCS is old-fashioned, and out-of-date. CVS was the next generation, but that's been getting a lot of criticism lately. But then it still is pretty widely used, and there are so many different competitors... Eh, I'll try CVS. I'm not likely to need any really fancy features. And maybe I should learn something about administering CVS, anyway".

So you hassle with CVS: you get it installed, you figure out the arcane art of creating a central repository (the obvious docs do not at all make it clear what the different parameters actually do, so you have to mess with it a few times), then you get to noodle around your strategy about exactly which locations you're going to put under CVS ("everything" is pretty clearly an absurdly anal-retentive answer... if you've got a few gigs of MP3s that you just listen to, putting them under version control would be pretty useless), maybe you then look into different front ends to CVS to avoid doing a lot of manual check-in and check-out commands, then you're ready to settle down to work, feeling virtuous and studly because any project you work on is now "under control" --

Then you decide you want to re-name some files. Maybe you want to re-structure a project, moving whole directories around. How do you do that with CVS? Answer: you're supposed to edit the repository directly.

But the real answer -- if you're someone like me -- is to throw away your existing repository, re-organize things, then just create a new repository. You lose your painstakingly created file histories, but how much hassle is it worth to preserve them? After all, you've got other things to do, this CVS management thing is just a sideline for you...

Conclusion: CVS really does suck. And it doesn't just suck for big projects (where the branching and tagging features are known to be pretty clunky), it sucks for small projects too.

Version Control Front-ends (warning: emacs evangelism in disguise)

A brief digression that isn't really. When I mentioned "version control front-ends", what was I talking about? I was talking about an "IDE" of course. In particular, I was talking about the world's first IDE, the world's most flexible IDE, which works with the widest assortment of languages: emacs. (And for me, this means "Gnu emacs", but should you be a fan of xemacs, you will get no disrespect from me.)

I was actually mildly confused about how to do this with emacs, because I was someone who'd been using the proprietary "perforce" version control system at work, and that has its own elisp package to integrate it with emacs, "p4.el". So when I wanted to experiment with using CVS on my own (I'd worked at places that use it, of course, but never tried to use it from within emacs), I went looking for a "cvs.el" package to work with it. As it turns out, what you really want is "vc.el". Emacs culture being what it is, they attempted to write one general front-end elisp package that you could use with different back-end version control systems. As is often the case, however, this idea met with mixed success, and the only version control systems supported by "vc.el" are CVS and RCS (oh, and SCCS, about which I know very little -- and this list of supported vc systems is constantly expanding -- November 25, 2007).

But okay, vc.el is the way to go, and it's well documented in the emacs manual (the on-line version of which is just an "M-x help i m emacs <RET>" away). It actually turns out to be pretty convenient to use. The common commands all use the "C-x v" prefix, and mostly you just use "C-x v v" which makes a guess about what you're likely to want to do next and does it. Need to check-out a file? "C-x v v". Ready to check it in? "C-x v v". (Which then opens up a temporary emacs buffer to prompt you to type in a log message.)

Stumbling into RCS

And if you want to put a file under version control that isn't already, what do you do? "C-x v v", of course.

Then, one fine day I was working in a directory that I thought I had put under CVS control, but hadn't... and when I did a "C-x v v" to add a file to the repository, I got a strange prompt "Create RCS repository?". I said yes to this without thinking about it very much (it used to be that CVS was implemented on top of RCS, and I had a vague thought that maybe my CVS still needed to do some RCS stuff).

It gradually dawned on me that "vc.el" defaults to using RCS if it can't find a CVS sub-directory in the current location. But I just kept working that way "just for now", figuring I'd move the stuff from RCS to CVS later.

But really, I didn't need all that much in the way of features from my version control, and the RCS stuff was working fine. I looked into what it was doing, and I realized that it was using distributed repositories... any directory where you put a file under RCS, it would create a sub-directory called "RCS", and that sub-directory was actually the repository. Typically with CVS, the "CVS" sub-directory just contains a few files full of meta-information that point at the actual central repository.

And a little while later, it dawned on me that these distributed RCS repositories were actually tremendously convenient. You want to restructure a project? If you move a directory around, you carry the RCS repository with it: no problem. And there's an emacs command "M-x vc-rename-file" that lets you rename a file and also update its name in the repository without losing file history. My biggest annoyance with CVS completely disappeared with RCS.

And the barrier to entry is tiny! If you're using emacs already, any file that you think might deserve version control gets it immediately, the moment you do a "C-x v v" on that file. There are no set-up hassles. You don't even need to waste any energy thinking about your "version control strategy" if you don't feel like it. You can make the decision about what to add to RCS on the fly, on a file-by-file basis.

(Oh, and how about when you want to hand off your project to someone else? You can just tar up the source tree along with the RCS repositories and send it out along with the file history.)
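As a rough sketch of that hand-off in shell: because each RCS sub-directory lives inside the source tree, an ordinary tar of the project directory picks up the history files along with everything else. The helper name pack_with_history and the example paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: archive a source tree together with its RCS sub-directories.
# pack_with_history is a made-up helper name, not part of RCS or emacs.

pack_with_history() {
    project_dir=$1      # directory holding the working files and RCS/ dirs
    archive=$2          # tarball to create, e.g. project.tar.gz

    # Nothing RCS-specific is needed: the "*,v" history files are ordinary
    # files under RCS/, so tar sweeps them up with everything else.
    tar -czf "$archive" \
        -C "$(dirname "$project_dir")" "$(basename "$project_dir")"
}

# Example (hypothetical paths):
#   pack_with_history "$HOME/src/myproject" /tmp/myproject.tar.gz
#   tar -tzf /tmp/myproject.tar.gz | grep ',v$'    # lists the history files
```

Whoever unpacks the archive gets the working files and the RCS directories in the same relative positions, which is all RCS needs to find the history again.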

The true purposes of version control

Stupid question: Why do you use version control? Standard answer: So you can get back an older version of a file if you need it.

Pfft. You never want to do that. (Well, almost never.) And besides, if you did need to get an old version of something, you could get it off of your backups. (You do have backups, right?)

What you really want version control for is to get up the courage to rip apart code that's working already so that you can improve it, without being afraid that you're going to totally screw up and lose a version that sort of works. If you've ever worked without version control as a safety net, you know the kind of things you end up with. Any development directory ends up littered with manual backups like:

   some_program.c.old
   some_program.c.older
   some_program.c.not_so_old
   some_program.c.ok
   some_program.c.sorta_ok
   some_program.c.ng
   some_program.c.ngier
   some_program.c.almost_not_ng
   some_program.c.getting_there

The first advantage to using version control is you get rid of this litter, and you stop having to think of cute, distinct names for the extensions on your back-ups. Any time you're ready to make a move, a few keystrokes let you feel safe enough to do it.

But after a while, other advantages turn up:

You stumble across a file you checked out a few days ago, and have trouble remembering what you were doing to it. Solution: "C-x v =" displays the diff between this version and the previously checked-in version.
You look at your todo list, and can't remember if you've already made the changes to a file it refers to. Solution: "C-x v l" displays the change-log for the current file.
Instead of writing notes about what you've been doing in random places, the change-log for each file becomes a great place to enter that information...
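
For anyone working outside emacs, those two lookups map roughly onto the rcsdiff and rlog commands that ship with RCS. A small sketch, assuming the RCS tools are installed and the file is already checked in; the wrapper names show_changes and show_log are invented here.

```shell
#!/bin/sh
# Rough command-line counterparts of two vc.el commands, assuming the RCS
# tools (rcsdiff, rlog) are installed and the file is already under RCS.
# show_changes and show_log are made-up wrapper names.

# Like "C-x v =": show how the working file differs from the last check-in.
# Note that rcsdiff, like diff, exits non-zero when there are differences.
show_changes() {
    rcsdiff -u "$1"
}

# Like "C-x v l": show the log messages recorded for the file.
show_log() {
    rlog "$1"
}

# Example (hypothetical file name):
#   show_changes some_program.c
#   show_log some_program.c
```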

The difference in perspective: individual and group

Working as an individual on your own, you may find yourself making many different file check-ins for any small change you feel like doing.

Working in a group, on a project under version control, you'll probably find some subtle (and often overt) pressure against doing quite so many check-ins. For example, your fellow workers are probably getting mail every time you check something in, and you'd probably rather not spam them with lots of minor changes.

It gets particularly bad in the case of non-working code. Say you've checked-out a file, and started working toward a goal, but half-way there you realize that the direction you're going is harder than you thought. You start wondering if it might be simpler to have gone in a different direction. What you personally would like to do at this point is to check in the current, broken version, so that you can try a different way without losing what you've done. In many work environments, though, that would be a major no-no ("Don't break the build!").

One solution would be to have two different version control systems running in parallel. You use your "personal" system for fine-grained changes, but periodically do check-ins to the central repository when you're ready to give something to the group.

Check the section of the emacs manual called: "Local Version Control", which explains in detail how to use RCS for personal version control in a group that's using CVS. There are vc.el commands to let you toggle the backend that is currently in use.

And -- as is increasingly likely these days -- if your group is using something besides CVS, it could be that vc.el won't know anything about it. In that case you can just use vc.el to talk to RCS, and most likely you'll have an entirely separate command set for public check-ins.

The pain of using RCS (if you're not using emacs)

The vc.el package starts looking really impressive when you decide to try and do something with command-line RCS. Suddenly, DWIM is gone out the window, and you're back in the usual world of clunky things that make sense only after extensive study.

Suppose you want to know how to check-in a bunch of files from the command line. I don't see any features in vc.el to let you add an entire directory to RCS in one shot (though there could be something buried in vc-dired that I'm missing). Now I could easily write keystroke macros to crunch through a dired buffer doing "C-x v v" on each file, but the command-line approach would seem to be the Right Way. So off we go into the man page fandango.

First, man rcs. Hm, a command named "ci". Ah ha "check in RCS revisions". And also a mention of a "co" command, and I bet I know what that means. Ah, there's an "rcsintro" (um... well it certainly could be worse, but somehow the material there didn't click with me when I first looked at it). Eventually I gave up and turned up an RCS tutorial on the web.

Experimenting with "ci *" in a test directory, I found it did not know enough to create an RCS sub-directory: it transforms every file in the current directory into a *,v file. Huh? If you manually create an RCS sub-directory first, "ci *" does indeed find it, but it moves all files into the repository. You don't have a copy left in the original location by default (Evidently, they took the "library" metaphor a little too seriously: when you do a check-in you gotta give the book back, right?).

Okay, so after the ci, you need to do an explicit co to check them out again -- and that looks like it has to be really explicit, because "co *" obviously can't do anything useful given the way the shell works. Oh, but it does understand the *,v form of a file name. This is helpful, because it lets you do things like:

  ls RCS | xargs co

(Um, but "co somefile" just checks it out in a read-only state. What if you wanted to be able to edit it? Oh, never mind. You use emacs.)

Well, actually the -l option to open it in "locked" mode seems to be the key (see the next section).
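
Pulling those pieces together, here is a hedged sketch of how the whole-directory check-in might look from the shell. It assumes the RCS tools (ci, co) are installed and that the directory is not yet under RCS; rcs_add_all and rcs_edit are made-up helper names. The -u option to ci keeps a read-only working copy in place, and -t- supplies the initial description so that ci does not stop to prompt for one.

```shell
#!/bin/sh
# Sketch: put every regular file in the current directory under RCS while
# keeping the working copies in place. Intended for a directory that is not
# under RCS yet. rcs_add_all and rcs_edit are made-up helper names.

rcs_add_all() {
    mkdir -p RCS                   # otherwise the ,v files land beside the sources
    for f in *; do
        [ -f "$f" ] || continue    # skip directories, including RCS itself
        # -q  quiet
        # -u  leave a read-only working copy behind after the check-in
        # -t- give the initial description inline so ci does not prompt
        ci -q -u -t-"initial check-in" "$f"
    done
}

# Check a file out locked, i.e. writable and ready for editing.
rcs_edit() {
    co -q -l "$1"
}

# Example (hypothetical directory and file name):
#   cd ~/src/myproject && rcs_add_all
#   rcs_edit some_program.c
```

Looping over the files explicitly sidesteps the "co *" problem: the shell only ever expands names that still exist in the working directory, and with -u they all do.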

Using perl on RCS files

One more problem. Suppose you need to automatically munge some files that you've decided should be under RCS control. One method of dealing with that is the perl CPAN module "Rcs". A quick synopsis:

   use Rcs;
   Rcs->bindir('/usr/bin');       # Have to tell it where to find the rcs tools
   Rcs->quiet(0);                 # Turn off quiet mode
   $rcs = Rcs->new;               # Because a day without objects is a day without job security
   $rcs->workdir("$location");    # Have to tell it where the files are...
   $rcs->rcsdir("$location/RCS"); # And need to specify that we're using a local RCS repository

Before you can do an "open" for output, you need to do a "co", with the "-l" option, so that the file will be "locked" (i.e. not read-only):

   $rcs->file("somefile.txt");
   $rcs->co('-l');  # check out and lock file (or else it will be read-only)
   open(OUTIE, '>', "$location/somefile.txt")   # three-argument open: mode kept apart from the path
          or die "Can't open $location/somefile.txt for output: $!\n";

When you're done, close the file handle, and check in the file (using the -u option so that there will still be a read-only copy sitting there as you expect). Using a suitable log message (via -m) is of course a good idea:

   close OUTIE;
   $rcs->ci('-u', "-m automatic update by $0");  # check-in and check-out in "un-locked" state
                                                 # or else file will evaporate.

But I must say, as time goes on it's my biggest source of dissatisfaction with RCS: the fact that a checked-in file is read-only. -- November 25, 2007

Appendix 1: What is version control?

Actually if you haven't even heard of version control, maybe you might want to hear about what it is...

Version Control is an automated method of keeping track of changes to files. A typical file system (these days) doesn't make any effort at doing this for you: when you save a file it blows away the previous version. If you've got the file under version control, then in theory you can always get back an earlier version. Take a look at the section "Concepts of Version Control" in the emacs manual if you need to know more.

Once upon a time, there were operating systems with file systems that would do a simple form of version control for you. For example, under TOPS-20, older versions of the file would still exist, with numbered extensions appended to the name. If you wanted to clean up the disk, you could always delete them, but by default you would always have the history of a given file called "FILE" as "FILE.1", "FILE.2", "FILE.3"... and so on.
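
To make the naming scheme concrete, here is a toy imitation in shell. It only mimics the "FILE.1", "FILE.2" numbering described above by copying the file by hand; it is not how TOPS-20 actually implemented it, and save_version is a made-up helper name.

```shell
#!/bin/sh
# Toy imitation of numbered file versions, done by hand in shell.
# save_version is a made-up helper; it copies FILE to the first unused
# FILE.N and prints the name it wrote.

save_version() {
    file=$1
    n=1
    # Find the first unused number: FILE.1, FILE.2, FILE.3, ...
    while [ -e "$file.$n" ]; do
        n=$((n + 1))
    done
    cp -p "$file" "$file.$n"
    echo "$file.$n"
}

# Example (hypothetical file name):
#   save_version notes.txt     # writes notes.txt.1 the first time it is called
#   save_version notes.txt     # writes notes.txt.2 the next time
```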

It's entirely likely that some bright person will re-invent this some day (and of course, there are probably still folks out there using "old" systems that do things like this, e.g. VMS). Version 4 of reiserfs strikes me as a good platform for implementing integrated version control.

Appendix 2: Impersonal version control

Okay, we all know CVS is living on borrowed time. What's next?

My opinion (for what it's worth, and I wouldn't bet much on it):

Subversion is a game try, but Gnu arch is going to kick its ass. The proprietary Bitkeeper will shortly become yesterday's news, because arch implements a big chunk of its feature set.

Caveat: Vesta sounds interesting (though I wouldn't want to be marooned there). The old unix "make" facility is also looking pretty moldy, in my opinion, and replacing make and cvs together in one shot has its appeal. The code base for Vesta has been around for a while, too (it was originally an internally used tool at Digital), so it might have a stability advantage.

Update (January 17, 2010): For a while there I thought that monotone might take the prize, but it's definitely looking like git is the winner... (but then, git is essentially a re-write of monotone with efficiency paramount: no C++ and no SQLite backend). Many things out there have been written about git, but I like Oliver Steele's My Git Workflow. Great diagrams, and his attitude toward version control is pretty similar to mine.

Appendix 3: A version control administration gotcha for those that can be gotten

Once upon a time, when I first decided to learn something about administering cvs, I somehow came up with the idea that maybe the way to create a cvs repository was to create the location and manually copy the files over into it, before initializing the repository and adding the files into it.

Can you guess what happens when you try to add a cvs repository to itself? First it sees SomeCode.pm, and adds it as SomeCode.pm,v, then it sees SomeCode.pm,v, and adds that file as SomeCode.pm,v,v, and so on, and it chugs away into an infinite loop creating SomeCode.pm,v,v,v,v,v,v,v,v,v, .... Heh. Most amusing. Quite the embarrassing mistake. The type of screw-up that a real pro would never talk about, but I can never resist.

So then when I first experimented with adding RCS files to a repository, I was totally bewildered by its default behavior. You need to manually create the RCS subdirectory yourself? If you don't, it converts all the files in the current directory into *,v files? Jesus, that's weird. What could that be for? Who would ever expect a version control system to work that way?

Oh.

Joseph Brenner, 10 Jun 2004