sfpug-code_archaeology

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: sfpug@sf.pm.org
From: zenji@gmx.net
Subject: Re: [sf-perl] Dinner before SFPUG?
Date: Thu, 30 Aug 2001 22:39:06 +0200 (MEST)

> I wasn't able to make it to the meeting, but I am curious about
> how things went.  Anyone?

We traded a lot of horror stories about insane code we had inherited from
other people, and talked about using various tools to analyse control flow, 
from the debugger to various partial solutions people had tried to code on 
their own.  We came out with a few big lessons.  They're a little
pessimistic:

1) Code archaeology is _hard_.

There are no panaceas; there are some basic tools and techniques 
that help, but there's a lot of painstaking work involved, too.

2) Often just documenting the code is a good start.

You will have to read through it and understand it, so you might as well 
share your understanding.  Rich put forth the idea of different levels of 
documentation, where Level 0 is simply readable code, Level 1 is well-
commented code, and higher levels correspond to system design documents,
user documentation, et cetera.

Vicki pointed out that documenting the data structures code expects is
just as important as documenting the code itself, and recommended putting
sample structures in the comments at the top of blocks of code that use 
them.
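
As a rough sketch of what Vicki's suggestion could look like in practice
(the sub name, the field names, and the sample data below are all
hypothetical, made up for illustration and not taken from anyone's real
code):

```perl
use strict;
use warnings;

# total_by_customer(\@orders)
#
# Expects a reference to an array of order hashes, shaped like this
# (hypothetical sample data):
#
#   [
#     { customer => 'alice',
#       items    => [ { sku => 'A1', qty => 2, price => 3.50 } ] },
#     { customer => 'bob',
#       items    => [ { sku => 'B7', qty => 1, price => 10.00 } ] },
#   ]
#
# Returns a hash reference mapping customer name to order total.
sub total_by_customer {
    my ($orders) = @_;
    my %total;
    for my $order (@$orders) {
        for my $item (@{ $order->{items} }) {
            $total{ $order->{customer} } += $item->{qty} * $item->{price};
        }
    }
    return \%total;
}
```

The point is only that a reader can see the expected shape of the data
without having to trace it back to wherever it was built.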

3) There are no good static analysis tools for Perl.

It would be nice to see all the places where a sub or method is called,
displayed in a flowchart or on demand when you mouse over a method, but
there are no tools that do this.  For one thing, Perl allows for run-
time evaluation of code, so such tools would have to be limited.  Worse,
Perl is so syntactically rich that it is difficult to parse.
An oft-repeated observation was "The only thing that can parse Perl is
perl [the interpreter]."
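
To illustrate the run-time problem, here is a small made-up example (the
handler names are hypothetical).  The name of the sub to call is
assembled while the program runs, so a tool that only reads the source
would see the definitions of the handlers but no call to either of them:

```perl
use strict;
use warnings;

sub handle_get  { return "GET handled" }
sub handle_post { return "POST handled" }

# The sub name is built at run time, so a static search for
# "handle_post" finds only its definition, never this call site.
sub dispatch {
    my ($verb) = @_;
    my $name = 'handle_' . lc $verb;
    my $code = __PACKAGE__->can($name)
        or die "no handler for $verb\n";
    return $code->();
}

print dispatch('POST'), "\n";
```

String eval and AUTOLOAD raise the same kind of difficulty; this is just
one of the milder forms.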

4) Re-writing code is a perfectly legitimate solution.

It often proves faster than trying to understand existing code.  It's
sometimes easier to understand what a piece of code is supposed to do
than to understand what it actually does.

The tricky part is figuring out when you have an isolated chunk of code 
that can be replaced safely, without breaking the rest of the system.  What
if it doesn't do what it is supposed to do, but other pieces of code
depend on its broken behavior?  You can end up putting your arms around
bigger and bigger parts of the system, trying to find an independent unit
to replace.  Really bad code is often the least modular, so you may be
out of luck in this respect.

5) The bad systems that require code archaeology are the result of 
sociological problems, not technological ones.

The two major problems are duplication and layering of code.  Layering 
occurs when an individual programmer cannot understand a piece of code 
needed for an application, and just writes to it anyway, hacking away until 
the errors seem to stop.  Sometimes this means writing yet another level
of indirection around a library; sometimes it means cutting and pasting 
code, then modifying it for the situation (thereby trashing any chances for 
generality!)  Layering has the nasty property of being self-perpetuating:
The greater the accretion of cruft in the program, the harder it is for
anyone to understand it, let alone change it safely.  Without understanding,
the temptation to add yet another layer is great.

Duplication, related to layering but more widely recognized, happens when
various programmers solve the same general problem with different pieces
of code (usually, in the Perl case, writing their own modules).  In
particularly egregious cases, various parts of the code do the same thing,
overriding each other in hard-to-predict ways.  Matt mentioned a past Web 
programming job where they constantly had to play "Find the Header", since 
the template library auto-generated HTTP headers, but various pieces of
code also manipulated them directly, either before or after the template
was applied.

The proactive solution to both problems is good communication.  This can 
mean more consistency in choosing libraries and more frequent code reviews,
but neither of these solutions is sufficient on its own.  More important
is changing the mentality of programmers who think they should work in 
isolation and never discuss the problems they are solving.  If programmers 
talk regularly to explain the designs they are developing, they can
recognize common needs and come up with general solutions (avoiding the
duplication problem).  Likewise, they can actually keep some understanding 
of each other's code (heading off the layering problem).  This doesn't
mean understanding all the implementation details; it means understanding
the API (after making sure there is a defined API at all!)

Code archaeologists seldom have the luxury of taking proactive measures;
they are called in to clean up the mess after the classic mistakes have 
already been made.  On the other hand, if you're working more generally as 
a programmer, you can recognize the signs of layering and duplication and 
know to go into archaeology mode before things get worse.  While you do, 
you should advocate for better communication processes to prevent the 
problem from happening again.

And, yes, there was enough pizza. :)
--Q

===

To: sfpug@sf.pm.org
From: zenji@gmx.net
Subject: [sf-perl] Running the debugger non-interactively
Date: Thu, 30 Aug 2001 22:51:10 +0200 (MEST)

In the meeting, I mentioned that you could use the debugger non-
interactively to print a stack trace of executing code.  The way to
do this is to set the environment variable $PERLDB_OPTS.  It's a
space-separated list of options; you want something like "NonStop frame=2",
where NonStop sets non-interactive mode and frame specifies a level of
detail at which to print stack traces.  From the perldebug man page:

     `frame'     Affects the printing of messages upon entry and
                 exit from subroutines.  If `frame & 2' is false,
                 messages are printed on entry only. (Printing on
                 exit might be useful if interspersed with other
                 messages.)

                 If `frame & 4', arguments to functions are
                 printed, plus context and caller info.  If
                 `frame & 8', overloaded `stringify' and `tie'd
                 `FETCH' is enabled on the printed arguments.  If
                 `frame & 16', the return value from the
                 subroutine is printed.

                 The length at which the argument list is
                 truncated is governed by the next option:

Another useful option (again, from the man page):

                 `maxTraceLen'
                 Length to truncate the argument list when the
                 `frame' option's bit 4 is set.

You can also use a .perldb file to set options.
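
Here is a minimal sketch of setting PERLDB_OPTS from a small Perl
wrapper instead of typing it in the shell each time.  The sub names
perldb_opts and run_traced are my own inventions for this example; only
the NonStop, frame and maxTraceLen options come from the man page:

```perl
use strict;
use warnings;

# Build the PERLDB_OPTS string: NonStop turns off the interactive
# prompt, frame=N picks the level of detail (default 2, as above).
sub perldb_opts {
    my (%opt) = @_;
    my $frame = defined $opt{frame} ? $opt{frame} : 2;
    my @parts = ('NonStop', "frame=$frame");
    push @parts, "maxTraceLen=$opt{maxTraceLen}"
        if defined $opt{maxTraceLen};
    return join ' ', @parts;
}

# Run a script under the debugger with those options in effect.
# $^X is the perl binary running this wrapper.
sub run_traced {
    my ($script, %opt) = @_;
    local $ENV{PERLDB_OPTS} = perldb_opts(%opt);
    return system($^X, '-d', $script);
}
```

Calling run_traced('myscript.pl', frame => 6) would then trace entry and
exit with arguments (6 is 2 plus 4), where myscript.pl stands for
whatever you are digging through.  For the .perldb route, perldebug
describes putting a call such as parse_options("NonStop frame=2"); in
that file.
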
 --Q

===

To: sfpug@sf.pm.org
From: Peter Prymmer <pvhp@best.com>
Subject: Re: [sf-perl] Dinner before SFPUG?
Date: Thu, 30 Aug 2001 14:58:14 -0700 (PDT)

On Thu, 30 Aug 2001 zenji@gmx.net wrote:

> > I wasn't able to make it to the meeting, but I am curious about
> > how things went.  Anyone?
> 
> We traded a lot of horror stories about insane code we had inherited from
> other people, and talked about using various tools to analyse control flow, 
> from the debugger to various partial solutions people had tried to code on 
> their own.  We came out with a few big lessons.  They're a little
> pessimistic:
> 
> 1) Code archaeology is _hard_.
> 
> There are no panaceas; there are some basic tools and techniques 
> that help, but there's a lot of painstaking work involved, too.

[snip]

> And, yes, there was enough pizza. :)

That was an excellent summary of the discussion.  Thanks.  

One tool that was mentioned that only a few people seemed to have heard of
was the perl reformatter/pretty printer called perltidy.  It is currently
a SourceForge project accessible via:

   http://perltidy.sourceforge.net/

While it is not a static analysis tool, it seemed to be well liked by
those who had mentioned it or used it.

Peter Prymmer

===
doom@kzsu.stanford.edu