svlug_bourne_shell_redirection_in_detail

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

From: Seth David Schoen <schoen@loyalty.org>
To: svlug@svlug.org
Subject: [svlug] [schoen@loyalty.org: Re: redirecting STDERR to two places, within a script]

While I'm writing long messages, I thought I'd share this one, an
attempt at explaining Bourne shell redirection in its full generality.
(This might be done better in the original Bourne shell man page or in
a book like _Learning the bash Shell_; I haven't looked at either of
those recently.)

----- Forwarded message from Seth David Schoen <schoen@loyalty.org> -----
[My friend wanted to run "someprog" and to save its standard output and
standard error, but in such a way that he could tell which was which,
and also tell in which order the messages had appeared, relative to
one another.  He decided that his needs would be met if he could get
standard output and standard error both redirected to a file called
"both-log", while _also_ redirecting (just) standard error to another
file called "stderr-only-log".  My solution for this was

someprog 2>&1 >> both-log | tee -a both-log >> stderr-only-log

and he reported that this worked perfectly, but then asked about how
it worked.  I assumed that he and any other reader of this message
understand how "tee" works, but maybe are confused about the
"2>&1 >> both-log | tee" part.]

> I'm still confused by the syntax, but I guess I'll figure it out -- e.g. in 
> the line:
> 
> >someprog 2>&1 >> both-log | tee -a both-log >> stderr-only-log
> 
> 
> I would have wondered: (1) since both-log is getting written to in two 
> places, why doesn't it contain duplicates of all lines that someprog 
> printed to stdout, and (2) since, by the time "tee" is invoked, stderr and 
> stdout have been bundled into one stream, how does it separate them 
> again?

Both of these are based on misconceptions.

(1) both-log is indeed getting written to in two places (by two
processes), but the first is writing only someprog's stdout and the
second is writing only someprog's stderr.

(2) stdout and stderr were never bundled into one stream; stdout went
directly into the file "both-log", and only stderr went into the pipe.
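
A small sketch may make both points concrete.  The shell function
"someprog" below is only a hypothetical stand-in for the real program
(it prints one line on each stream), and the scratch directory from
mktemp is an assumption made so that the log files start out empty.

```shell
# Hypothetical stand-in for the real "someprog": one line on stdout,
# one line on stderr.
someprog() {
    echo "out: hello"
    echo "err: oops" >&2
}

# Work in a scratch directory so the log files start out empty.
cd "$(mktemp -d)" || exit 1

someprog 2>&1 >> both-log | tee -a both-log >> stderr-only-log

cat both-log          # both lines, each exactly once
cat stderr-only-log   # only "err: oops"
```

The stdout line reaches both-log directly from someprog, and the
stderr line reaches it through tee, so neither line is duplicated.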

> But I'm probably just reading the line wrong.

Yep.

"2>&1 >> both-log" is very different from ">> both-log 2>&1", which
is probably closer to what you think I'm doing.  The order of
redirections in the shell is very significant, because of how
redirections are implemented.

First of all, from the point of view of a process, there is _just_ a
particular set of file descriptors associated with that process.
These descriptors might be closed (well, technically, a file
descriptor that isn't open doesn't exist, or is not valid) or might be
attached to something; the things that a file descriptor is attached
to are "file-like" things, which could be a file on disk, a "special"
file (a device, including a terminal device), a pipe (sort of a way of
coupling one descriptor with a different descriptor belonging to a
different process, so that what's written by one is read by the other), a
named pipe (like a pipe but it's created in a different way, and so
_appears_ to be a file in the filesystem, except that things "stored"
in it don't persist on disk but are just used to communicate with
another program), a socket (like a pipe but it's bidirectional and
it's often implemented with a network connection), or possibly other
things.

Any file descriptor _could_ be attached to any file-like thing, either
by a program that was written to open that thing, or by a shell doing
"standard I/O" setup before running a child process.  The standard I/O
mechanism is (as I think I said another time) a completely non-binding
but essentially universal convention, where, when a process is
started, if it inherits a file descriptor "0", that descriptor is
supposed to be used by default for any console input operations, if it
inherits a file descriptor "1", that descriptor is supposed to be used
by default for any console output operations, and if it inherits a
file descriptor "2", that descriptor is supposed to be used by default
for any console error reporting operations.  This is the extent of the
stdio descriptor convention, and if you know this much and a few Unix
system calls, you can write useful console utilities even without the
standard C library.

(I wrote a program called "md5tee" which, in an early version, didn't
use the C library at all, for speed reasons.)
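
To illustrate the convention, here is a minimal sketch in which each
of the three descriptors is attached to a different file.  The function
"report" and the file names are hypothetical, made up for this example.

```shell
dir=$(mktemp -d)

# Hypothetical utility that follows the convention.
report() {
    read -r line                 # input comes from descriptor 0
    echo "got: $line"            # normal output goes to descriptor 1
    echo "warning: done" >&2     # error reports go to descriptor 2
}

echo "input text" > "$dir/in"
report < "$dir/in" > "$dir/out" 2> "$dir/err"

cat "$dir/out"   # got: input text
cat "$dir/err"   # warning: done
```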

OK, so shell redirection is a way of setting up a set of descriptors
for a program.  It is _usually_ assumed that the program will follow
the stdio convention (stdin, stdout, stderr), and shell redirection
notation certainly has a built-in bias in favor of this convention,
although you can actually do odd things like close a program's stdin
before it starts (the redirection "<&-" in the Bourne shell) or give a
program an input descriptor 3 which is reading the password file
("3</etc/passwd" in the Bourne shell; there must be no space between
the "3" and the "<", or the "3" is taken as an ordinary argument).
If the program is expecting
it, there are perfectly legitimate applications for these weird uses
of redirection.
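
Both of these odd uses can be tried out in a short sketch.  It reads
from a scratch file rather than the password file so that it behaves
the same everywhere; the function "read_fd3" and the file name are
hypothetical.

```shell
dir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$dir/data"

# Hypothetical program that expects extra input on descriptor 3.
read_fd3() {
    read -r line <&3
    echo "fd 3 gave: $line"
}
result=$(read_fd3 3<"$dir/data")
echo "$result"          # fd 3 gave: first line

# With stdin closed before it starts, cat cannot read anything and fails.
if cat 2>/dev/null <&-; then
    closed_stdin=readable
else
    closed_stdin=unreadable
fi
echo "$closed_stdin"    # unreadable
```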

The typical use of redirection is to "fool" a program into reading from
or writing to (or both) something other than your terminal (or something
other than whatever is your shell's own default location for input or
output -- so if you write a shell script and then you run it and
redirect output to "foo", but the script itself contains commands to
redirect some output to "bar", that output still goes to "bar").
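
The "foo" and "bar" case can be sketched as follows.  The script
contents and its name "script.sh" are hypothetical, chosen only to
show the effect.

```shell
dir=$(mktemp -d)

# Hypothetical script with its own internal redirection to "bar".
cat > "$dir/script.sh" <<'EOF'
echo "this follows the script's own stdout"
echo "this always goes to bar" > bar
EOF

cd "$dir" || exit 1
sh script.sh > foo

cat foo   # this follows the script's own stdout
cat bar   # this always goes to bar
```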

Now the general form of a redirection is either

"program n>b"

or

"program n<b"

(Note that ">b" always means "1>b", and "<b" always means "0<b".
Similarly, ">>b" means "1>>b", "<<b" means "0<<b", ">&n" means
"1>&n", "<&n" means "0<&n".  So when you say "ls >foo", you are
saying "ls 1>foo".)

Here n is the file descriptor and b is the name of a file.  So this
means "attach the file descriptor number n that program will get to
the file b", and the ">" means "opened for writing", the "<" means
"opened for reading", and then ">>" means "opened for writing and
positioned to the end of the file (to append)".
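
As a quick sketch of these forms (the scratch file "b" lives in a
temporary directory, which is an assumption of the example):

```shell
dir=$(mktemp -d)

echo one    > "$dir/b"    # same as 1>  : open b for writing (truncating it)
echo two   >> "$dir/b"    # same as 1>> : open b for appending
echo three 1>> "$dir/b"   # descriptor number written out explicitly

read -r first 0< "$dir/b" # same as <   : open b for reading on descriptor 0
echo "$first"             # one

lines=$(grep -c . "$dir/b")
echo "$lines"             # 3
```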

There is a different form

"program n>&m"

or

"program n<&m"

which has a subtle meaning: "attach the file descriptor n that program
will get to _whatever thing the file descriptor m is currently
attached to_".  It does not mean (as you might think) "attach
descriptor n to descriptor m" or "make descriptor n follow m around" or
"merge descriptor n into descriptor m".  They are still both
completely independent file descriptors; it's just a shorthand that
says "attach n to what m is attached to at this moment" -- "send n
where m is currently going" or "take n from wherever m is currently
being taken from".

This distinction is important because it means that, if you redirect
"a>&b" and then redirect "b>c", descriptor a is going _where
descriptor b was originally going_, and descriptor b is now going to
file c.  There is no way to merge descriptors, but if you do "a>&b"
or "b>&a" _and then don't redirect either descriptor to some other
place_, descriptors a and b will start off going to the same place.
That does not mean that they are the same descriptor, just that both
descriptors are attached to the same file.
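
Here is a minimal sketch of the "a>&b" then "b>c" case, taking a to be
descriptor 3 and b to be descriptor 1.  The file names "original" and
"c" are hypothetical; "original" plays the part of wherever descriptor
1 was going to begin with.

```shell
dir=$(mktemp -d)

{
    # Within the braces descriptor 1 is attached to the file "original".
    # "3>&1" sends 3 to wherever 1 is going at that moment ("original");
    # the later "1>c" then moves descriptor 1 only.
    sh -c 'echo "written to 3" >&3; echo "written to 1"' 3>&1 1>"$dir/c"
} > "$dir/original"

cat "$dir/original"   # written to 3
cat "$dir/c"          # written to 1
```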

One of the most common redirections is

foo 2>&1

which means "send stderr to where stdout is going".  You can _think_
of this as merging the streams, but it doesn't conceptually merge
them; it just sends them to the same place.  That has the same practical
effect as merging them would, but in the implementation they are still
quite separate.  (Two hoses emptying into the same bucket?  They're
still two separate hoses, and theoretically you _could_ still take one
of the hoses back out of the bucket and point it somewhere else.)

Another is

foo 2>/dev/null

which means "send stderr to /dev/null".

OK, so it's important to understand the effect of the order of these
redirections:

foo 2>&1 >/dev/null

[i.e. "foo 2>&1 1>/dev/null"]

and

foo >/dev/null 2>&1

[i.e. "foo 1>/dev/null 2>&1"]

The first command line will preserve the standard error and discard the
standard output.  The second one will send both of them to /dev/null.
Do you see why?
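
One way to check the answer is to capture what survives each command
line.  In this sketch "foo" is a hypothetical function printing one
line on each stream, and the command substitution stands in for the
terminal: whatever would have reached the screen on stdout's original
destination ends up in the variable instead.

```shell
# Hypothetical "foo": one line on stdout, one on stderr.
foo() {
    echo "output"
    echo "error" >&2
}

# Inside $( ), descriptor 1 starts out attached to the capture, which
# plays the role of the terminal here.
first=$(foo 2>&1 >/dev/null)    # 2 -> capture, then 1 -> /dev/null
second=$(foo >/dev/null 2>&1)   # 1 -> /dev/null, then 2 -> /dev/null too

echo "first:  [$first]"     # first:  [error]
echo "second: [$second]"    # second: []
```

The redirections are processed left to right, so in the first command
"2>&1" is evaluated while descriptor 1 still points at its original
destination, and in the second it is evaluated after descriptor 1 has
already been moved to /dev/null.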

The last point is the effect of pipelines.

foo SOME-REDIRECTIONS | bar

works largely like

foo >MAGIC-PIPELINE SOME-REDIRECTIONS &
bar <MAGIC-PIPELINE

in that the redirection into the pipe takes place _before_ any other
redirections and can therefore potentially be used or modified by them.
So the shell first does any pipeline redirections and then does the
descriptor assignments or reassignments specified by any ">", "<",
">>", "<<", ">&", or "<&" redirections, in the order in which such
redirections appear on the command line.
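
The MAGIC-PIPELINE picture can be approximated with a named pipe.
This is only a sketch of the idea, not how the shell really builds
pipelines (it uses an anonymous pipe); "foo" and "bar" are hypothetical
functions, and the FIFO name is made up.

```shell
dir=$(mktemp -d)
mkfifo "$dir/magic-pipeline"

foo() { echo "output"; echo "error" >&2; }
bar() { sed 's/^/bar saw: /'; }

# The spelled-out form: the pipe is attached first, so "2>&1" finds
# descriptor 1 already pointing into it.
foo >"$dir/magic-pipeline" 2>&1 &
bar <"$dir/magic-pipeline" >"$dir/spelled-out"
wait

# The ordinary pipeline form.
foo 2>&1 | bar >"$dir/piped"

cat "$dir/spelled-out"
cmp "$dir/spelled-out" "$dir/piped" && echo "same result"
```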

> I didn't even know 
> that you could do e.g.
>          program >> file |
> and have the program output attach to the pipe -- I would have thought that 
> once the program output gets redirected to a file, it can't be redirected 
> anywhere else at the same time.

Well, "foo >> bar | baz" means "send the output of foo into the
program baz, but then (instead) send the output of foo into the file
bar".  It has the effect of setting up the pipeline but then not using
it, which is a valid thing to do, although almost always useless.

On the other hand,

foo 2>&1 >> bar | baz

means "send the output of foo into the program baz, then send the
errors from foo to where the output of foo is going [i.e., into baz],
then send the output of foo into bar [instead of into baz]", so it is
a useful command.

foo >> bar 2>&1 | baz

means "send the output of foo into the program baz, then send the
output of foo into the file bar [instead of into baz], then send the
errors from foo to where the output of foo is going [i.e., into the
file bar]", so in this case everything -- stdout and stderr -- goes
into bar, and nothing at all goes into baz.
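
All three command lines can be compared side by side in a short
sketch.  Here "foo" and "baz" are hypothetical functions, and separate
files bar1, bar2, and bar3 stand in for "bar" so that the three runs
don't mix.

```shell
dir=$(mktemp -d)

foo() { echo "output"; echo "error" >&2; }
baz() { sed 's/^/baz saw: /'; }

# foo >> bar | baz : the pipe is set up but stdout never uses it.
# (foo's stderr is discarded here only to keep the demonstration quiet.)
a=$( { foo >> "$dir/bar1" | baz; } 2>/dev/null )

# foo 2>&1 >> bar | baz : errors go into baz, output goes into bar.
b=$(foo 2>&1 >> "$dir/bar2" | baz)

# foo >> bar 2>&1 | baz : everything goes into bar, nothing into baz.
c=$(foo >> "$dir/bar3" 2>&1 | baz)

echo "a=[$a] b=[$b] c=[$c]"   # a=[] b=[baz saw: error] c=[]
cat "$dir/bar2"               # output
cat "$dir/bar3"               # output, then error
```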
