This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
From: Seth David Schoen <schoen@loyalty.org>
To: svlug@svlug.org
Subject: [svlug] [schoen@loyalty.org: Re: redirecting STDERR to two places, within a script]

While I'm writing long messages, I thought I'd share this one, an attempt at explaining Bourne shell redirection in its full generality. (This might be done better in the original Bourne shell man page or in a book like _Learning the bash Shell_; I haven't looked at either of those recently.)

----- Forwarded message from Seth David Schoen <schoen@loyalty.org> -----

[My friend wanted to run "someprog" and save its standard output and standard error in such a way that he could tell which was which, and also tell in which order the messages had appeared relative to one another. He decided that his needs would be met if he could get standard output and standard error both redirected to a file called "both-log", while _also_ redirecting (just) standard error to another file called "stderr-only-log". My solution for this was

    someprog 2>&1 >> both-log | tee -a both-log >> stderr-only-log

and he reported that this worked perfectly, but then asked how it worked. I assumed that he and any other reader of this message understand how "tee" works, but may be confused about the "2>&1 >> both-log | tee" part.]

> I'm still confused by the syntax, but I guess I'll figure it out -- e.g. in
> the line:
>
> >someprog 2>&1 >> both-log | tee -a both-log >> stderr-only-log
>
>
> I would have wondered: (1) since both-log is getting written to in two
> places, why doesn't it contain duplicates of all lines that someprog
> printed to stdout, and (2) since, by the time "tee" is invoked, stderr and
> stdout have been bundled into one stream, how does it separate them
> again?

Both of these are based on misconceptions.

(1) both-log is indeed getting written to in two places (by two processes), but the first is writing only someprog's stdout and the second is writing only someprog's stderr.
(2) stdout and stderr were never bundled into one stream; stdout went directly into the file "both-log", and only stderr went into the pipe.

> But I'm probably just reading the line wrong.

Yep. "2>&1 >> both-log" is very different from ">> both-log 2>&1", which is probably closer to what you think I'm doing. The order of redirections in the shell is very significant, because of how redirections are implemented.

First of all, from the point of view of a process, there is _just_ a particular set of file descriptors associated with that process. These descriptors might be closed (technically, a file descriptor that isn't open doesn't exist, or is not valid) or might be attached to something. The things a file descriptor can be attached to are "file-like" things:

  - a file on disk;
  - a "special" file (a device, including a terminal device);
  - a pipe (a way of coupling one descriptor with another descriptor, typically belonging to a different process, so that what's written to one end can be read from the other);
  - a named pipe (like a pipe, but created in a different way, so that it _appears_ to be a file in the filesystem, except that things "stored" in it don't persist on disk but are just used to communicate with another program);
  - a socket (like a pipe, but bidirectional, and often implemented with a network connection);
  - or possibly other things.

Any file descriptor _could_ be attached to any file-like thing, either by a program that was written to open that thing, or by a shell doing "standard I/O" setup before running a child process.
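To make the "named pipe" case concrete, here is a small sketch. It assumes a system with mkfifo(1) and mktemp(1), which is the case on Linux, the BSDs and macOS. The FIFO's name is hypothetical. The FIFO shows up in the filesystem, but whatever is written into it is handed straight to the reader rather than being stored.

```sh
#!/bin/sh
# Sketch: a named pipe (FIFO) is visible in the filesystem, but data written
# to it is passed directly to a reader instead of being stored on disk.
# Assumes mkfifo(1) and mktemp(1) are available; "demo-fifo" is a made-up name.
dir=$(mktemp -d)
fifo="$dir/demo-fifo"
mkfifo "$fifo"

# Writer, in the background: opening a FIFO for writing blocks until some
# process opens it for reading.
echo "hello through the fifo" >"$fifo" &

# Reader: what was written into one end comes out of the other.
received=$(cat "$fifo")
wait

# The FIFO is still a filesystem entry after the data has passed through.
if [ -p "$fifo" ]; then is_fifo=yes; else is_fifo=no; fi

echo "received: $received"
echo "still a FIFO: $is_fifo"
rm -r "$dir"
```

An ordinary pipe ("a | b") works the same way, except that it has no name in the filesystem. The shell creates it and hands its two ends to the two processes.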
The standard I/O mechanism is (as I think I said another time) a completely non-binding but essentially universal convention. When a process is started:

  - if it inherits a file descriptor 0, that descriptor is supposed to be used by default for any console input;
  - if it inherits a file descriptor 1, that descriptor is supposed to be used by default for any console output;
  - if it inherits a file descriptor 2, that descriptor is supposed to be used by default for any console error reporting.

That is the whole extent of the stdio descriptor convention. If you know this much and a few Unix system calls, you can write useful console utilities even without the standard C library. (I wrote a program called "md5tee" which, in an early version, didn't use the C library at all, for speed reasons.)

OK, so shell redirection is a way of setting up a set of descriptors for a program. It is _usually_ assumed that the program will follow the stdio convention (stdin, stdout, stderr), and shell redirection notation certainly has a built-in bias in favor of this convention. However, you can also do odd things, such as:

  - close a program's stdin before it starts (the redirection "<&-" in the Bourne shell);
  - give a program an input descriptor 3 that reads the password file ("3</etc/passwd" in the Bourne shell).

Note that there must be no space between the 3 and the "<"; otherwise the shell treats "3" as an ordinary argument and redirects stdin instead. If the program is expecting it, there are perfectly legitimate applications for these weird uses of redirection.

The typical use of redirection is to "fool" a program into reading from, writing to, or both, something other than your terminal. More precisely, that target is something other than whatever your shell's own default location for input or output is. So if you write a shell script and run it with its output redirected to "foo", but the script itself contains commands that redirect some output to "bar", that output still goes to "bar".
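Here is a sketch of the two "odd" redirections just mentioned. A temporary file stands in for /etc/passwd so that the output is predictable. The file contents are invented, and mktemp(1) is assumed to be available. The cat exit status is expected to be non-zero, because cat has no stdin to read from. The exact error message depends on the cat implementation.

```sh
#!/bin/sh
# Sketch: an extra input descriptor 3, and a command started with stdin closed.
# A temporary file with made-up contents stands in for /etc/passwd.
tmp=$(mktemp)
printf 'first line\nsecond line\n' >"$tmp"

# "3<file" (no space) gives the braced commands a descriptor 3 opened for
# reading; "read ... <&3" then takes its input from descriptor 3.
from_fd3=$( { read -r line <&3; echo "$line"; } 3<"$tmp" )
echo "read via fd 3: $from_fd3"

# "<&-" closes cat's stdin before cat starts, so cat should fail.
# 2>/dev/null comes first only to hide cat's error message.
if cat 2>/dev/null <&-; then closed_status=0; else closed_status=$?; fi
echo "cat exit status with stdin closed: $closed_status"

rm -f "$tmp"
```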
Now the general form of a redirection is either "program n>b" or "program n<b". Some forms have shorthands:

  - ">b" always means "1>b", and "<b" always means "0<b";
  - ">>b" means "1>>b", and "<<b" means "0<<b";
  - ">&n" means "1>&n", and "<&n" means "0<&n".

So when you say "ls >foo", you are saying "ls 1>foo".

Here n is the file descriptor number and b is the name of a file. (With "<<", b is a here-document delimiter rather than a file name.) So this means "attach the file descriptor number n that program will get to the file b". The operators mean:

  - ">" means "opened for writing" (creating or truncating the file);
  - "<" means "opened for reading";
  - ">>" means "opened for writing and positioned at the end of the file (to append)".

There is a different form, "program n>&m" or "program n<&m", which has a subtle meaning: "attach the file descriptor n that program will get to _whatever thing file descriptor m is currently attached to_". It does not mean (as you might think) "attach descriptor n to descriptor m", "make descriptor n follow m from now on", or "merge descriptor n into descriptor m". They are still two completely independent file descriptors. It's just a shorthand that says "attach n to what m is attached to at this moment" -- "send n where m is currently going" or "take n from wherever m is currently being taken from". (Under the hood, the shell does this with the dup2() system call, which makes descriptor n refer to the same open file that m refers to at that moment. Later changes to m don't affect n.)

This distinction is important. If you redirect "a>&b" and then redirect "b>c", descriptor a is going _where descriptor b was originally going_, and descriptor b is now going to file c. There is no way to merge descriptors. However, if you do "a>&b" or "b>&a" _and then don't redirect either descriptor to some other place_, descriptors a and b will start off going to the same place. That does not mean that they are the same descriptor, just that both descriptors are attached to the same open file. They even share a file position, which is why lines written through both of them end up one after another rather than overwriting each other.
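The "a>&b, then b>c" case can be checked directly. In this sketch, the roles of a and b are played by descriptors 4 and 3. The file names "original" and "new" are made up, and mktemp(1) is assumed. Redirections are processed left to right, so descriptor 4 keeps pointing at the file that descriptor 3 had *before* it was re-pointed.

```sh
#!/bin/sh
# Sketch: "n>&m" attaches n to whatever m is attached to at that moment;
# it does not tie n to m permanently. File names are hypothetical.
dir=$(mktemp -d)

# Give the script itself a descriptor 3 going to "original".
exec 3>"$dir/original"

# Processed left to right:
#   4>&3          -> fd 4 goes where fd 3 goes right now ("original")
#   3>"$dir/new"  -> fd 3 is re-pointed at "new"; fd 4 is unaffected
{ echo "written to fd 4" >&4; echo "written to fd 3" >&3; } 4>&3 3>"$dir/new"

exec 3>&-    # close the script's fd 3 again

original=$(cat "$dir/original")
new=$(cat "$dir/new")
echo "original: $original"
echo "new: $new"
rm -r "$dir"
```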
One of the most common redirections is

    foo 2>&1

which means "send stderr to where stdout is going". You can _think_ of this as merging the streams, but it doesn't conceptually merge them. It just sends them to the same place, which has the same practical effect as merging them would, while in the implementation they are still quite separate. (Two hoses emptying into the same bucket? They're still two separate hoses, and in principle you _could_ still take one of the hoses back out of the bucket and point it somewhere else.)

Another is

    foo 2>/dev/null

which means "send stderr to /dev/null".

OK, so it's important to understand the effect of the order of these redirections:

    foo 2>&1 >/dev/null       [i.e. "foo 2>&1 1>/dev/null"]

and

    foo >/dev/null 2>&1       [i.e. "foo 1>/dev/null 2>&1"]

The first command line will preserve the standard error, sending it wherever stdout was originally going (such as your terminal), and discard the standard output. The second one will send both of them to /dev/null. Do you see why?

The last point is the effect of pipelines.

    foo SOME-REDIRECTIONS | bar

works largely like

    foo >MAGIC-PIPELINE SOME-REDIRECTIONS & bar <MAGIC-PIPELINE

in that the redirection into the pipe takes place _before_ any other redirections and can therefore potentially be used or modified by them. So the shell first does any pipeline redirections. Then it does the descriptor assignments or reassignments specified by any ">", "<", ">>", "<<", ">&", or "<&" redirections, in the order in which those redirections appear on the command line.

> I didn't even know
> that you could do e.g.
> program >> file |
> and have the program output attach to the pipe -- I would have thought that
> once the program output gets redirected to a file, it can't be redirected
> anywhere else at the same time.

Well, "foo >> bar | baz" means "send the output of foo into the program baz, but then (instead) send the output of foo into the file bar". It has the effect of setting up the pipeline but then not using it, so baz just sees end-of-file on its input. That is a valid thing to do, although almost always useless.
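The ordering question, and the "pipe set up but never used" case, can be tried with a throwaway stand-in for foo. In this sketch, foo is a hypothetical shell function that writes one line to each stream. A command substitution $( ... ) plays the role of "| bar". Like a pipe, it captures only what arrives on the command's original stdout.

```sh
#!/bin/sh
# Sketch: the same two redirections in different orders, plus a pipeline
# whose pipe is set up and then immediately replaced by a file.
# foo is a hypothetical stand-in that writes one line to each stream.
foo() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2>&1 first: stderr goes where stdout is going now (the capture);
# then stdout alone is sent to /dev/null.
first=$(foo 2>&1 >/dev/null)

# >/dev/null first: stdout goes to /dev/null; then stderr goes where
# stdout is now going, which is also /dev/null.
second=$(foo >/dev/null 2>&1)

# "foo >> bar | cat": the pipe is attached first, then fd 1 is re-pointed
# at the file, so cat receives nothing. (2>/dev/null keeps the demo quiet.)
dir=$(mktemp -d)
via_pipe=$(foo 2>/dev/null >>"$dir/bar" | cat)
in_file=$(cat "$dir/bar")
rm -r "$dir"

echo "foo 2>&1 >/dev/null  -> '$first'"
echo "foo >/dev/null 2>&1  -> '$second'"
echo "foo >>bar | cat      -> cat got '$via_pipe', bar got '$in_file'"
```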
On the other hand,

    foo 2>&1 >> bar | baz

means "send the output of foo into the program baz, then send the errors from foo to where the output of foo is going [i.e., into baz], then send the output of foo into bar [instead of into baz]", so it is a useful command.

    foo >> bar 2>&1 | baz

means "send the output of foo into the program baz, then send the output of foo into the file bar [instead of into baz], then send the errors from foo to where the output of foo is going [i.e., into the file bar]", so in this case everything -- stdout and stderr -- goes into bar, and nothing at all goes into baz.
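This sketch runs both pipeline variants and then the original both-log / stderr-only-log command. Here, someprog is a hypothetical shell function that interleaves output and error lines, and cat plays the part of baz. The temporary directory and the bar_a/bar_b file names are made up. In both-log, stderr lines take a detour through the pipe and tee, so the sketch does not assume any particular interleaving there. It sorts that file before checking it.

```sh
#!/bin/sh
# Sketch: the two pipeline variants, then the original both-log /
# stderr-only-log command. someprog is a hypothetical stand-in.
someprog() {
    echo "out 1"
    echo "err 1" >&2
    echo "out 2"
    echo "err 2" >&2
}

dir=$(mktemp -d)
cd "$dir" || exit 1

# Variant A: stderr goes into the pipe, stdout into the file.
a_pipe=$(someprog 2>&1 >>bar_a | cat)
a_file=$(cat bar_a)

# Variant B: both streams go into the file, nothing into the pipe.
# fd 2 shares fd 1's open file, so the lines stay in the order written.
b_pipe=$(someprog >>bar_b 2>&1 | cat)
b_file=$(cat bar_b)

# The original solution: stdout goes straight to both-log; stderr goes
# through the pipe to tee, which appends it to both-log and also writes
# it (via tee's own stdout) to stderr-only-log.
someprog 2>&1 >>both-log | tee -a both-log >>stderr-only-log

# Interleaving in both-log isn't assumed; compare sorted contents.
both_sorted=$(sort both-log)
stderr_only=$(cat stderr-only-log)

cd / && rm -r "$dir"

printf 'A pipe:\n%s\nA file:\n%s\n' "$a_pipe" "$a_file"
printf 'B pipe: [%s]\nB file:\n%s\n' "$b_pipe" "$b_file"
printf 'both-log (sorted):\n%s\nstderr-only-log:\n%s\n' "$both_sorted" "$stderr_only"
```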