Commandline Tools Workshop

Course site for PiCSciE/RC bootcamp workshop

Piping

Since the glue of POSIX utilities is their ability to shuffle text around, a basic skill is moving text from a stream to a file and back again.

Basic redirects

The characters >, >>, and < let you redirect a program’s streams: > and >> send a program’s output to a file, and < feeds a file to a program’s stdin.

cat file1 > file2 is a really silly way of writing cp file1 file2. More usefully, you can use > to save the output of commands for later use. (This includes commands written in the Slurm command wrappers on clusters.)

>> does the same thing, except it lets you append rather than overwrite.
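A minimal sketch of the three operators side by side. The file names and the scratch directory are made up for illustration; everything happens under a temporary directory so nothing real is overwritten.

```shell
# Scratch directory (hypothetical; only used for this demonstration).
workdir=$(mktemp -d)

echo "first line"  >  "$workdir/notes.txt"   # > creates or truncates the file
echo "second line" >> "$workdir/notes.txt"   # >> appends to it

echo "draft" > "$workdir/other.txt"
echo "again" > "$workdir/other.txt"          # > again: "draft" is gone

# < connects the file to the command's stdin; wc -l counts what it reads.
line_count=$(wc -l < "$workdir/notes.txt")
echo "notes.txt line count: $line_count"
```

Note that wc never sees a file name here; it only reads whatever arrives on stdin.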

stdout and stderr

Each stream has a number (its file descriptor): stdin is 0, stdout is 1, and stderr is 2. Any redirect can redirect one or both of the output streams.

A very common use of this is when you want to grab both stdout and stderr and send them to a file using 2>&1 at the end of a command:

python myscript.py > output_log 2>&1

would run your Python script and send both stdout and stderr to the same stream, which is then written to the file output_log.

The logic of redirects is fairly complicated, but they are interpreted left to right:

1) Point stdout (1) at output_log.

2) Point stderr (2) at whatever 1 currently points to (output_log).

This pointer logic (as always) causes oddities:

python myscript.py 2>&1 > output_log

You would think this would be functionally equivalent, but it is not. The logic is instead:

1) Point 2 at whatever 1 currently points to (usually the terminal).

2) Point 1 at output_log (but NOT 2, which still points at where 1 used to point!)

So stderr still goes to the terminal, and at an interactive prompt you’ve just written a really complicated python myscript.py > output_log
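The difference between the two orderings is easy to check by hand. The sketch below uses a made-up shell function, noisy, as a stand-in for a script that writes one line to each stream; the log file names are also just placeholders.

```shell
# Hypothetical stand-in for a program that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)

# 1 -> file first, then 2 -> wherever 1 points now (the file).
noisy > "$workdir/both.log" 2>&1

# 2 -> wherever 1 points now (the terminal), then 1 -> file.
# "to stderr" shows up on the terminal instead of in the file.
noisy 2>&1 > "$workdir/stdout_only.log"

cat "$workdir/both.log"          # contains both lines
cat "$workdir/stdout_only.log"   # contains only "to stdout"
```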

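You can also redirect any of the streams to a temporary placeholder descriptor and then redirect that. Descriptors 3 through 9 are safe choices; the Bash manual warns that higher numbers may conflict with descriptors the shell uses internally.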
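One common use of a placeholder descriptor is swapping stdout and stderr, for example to capture only the error stream. This is a sketch rather than the only way to do it; noisy is again a hypothetical function standing in for a real program.

```shell
# Hypothetical stand-in for a program that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Read left to right:
#   3>&1  descriptor 3 becomes a copy of stdout (here, the $(...) capture)
#   1>&2  stdout now points where stderr points (the terminal)
#   2>&3  stderr now points where stdout originally pointed (the capture)
#   3>&-  close the placeholder, since it is no longer needed
captured=$(noisy 3>&1 1>&2 2>&3 3>&-)

echo "captured: $captured"
```

With the streams swapped, the command substitution collects what the function wrote to stderr, while its regular output passes through to the terminal.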

Bash also includes shortcuts for much of this (e.g., python myscript.py &> output_log is equivalent to python myscript.py > output_log 2>&1).

See the [manual](http://www.gnu.org/software/bash/manual/bash.html#Redirections) and also a very illuminating answer (and argument) on [StackOverflow](https://stackoverflow.com/questions/2342826/how-to-pipe-stderr-and-not-stdout).

Piping |

The pipe | operator takes the stdout of one command (|& for both stdout and stderr) and passes it to the stdin of the next. If a program can read from stdin (most can), you can use this to chain inputs and outputs to an absolutely hilarious degree.

Here’s a one-liner I used recently:
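As a small warm-up, here is a sketch of a pipeline that finds the most frequent word in a list. The input words and the noisy function are made up for illustration.

```shell
# Each stage reads the previous stage's stdout on its own stdin.
top_word=$(printf 'pear\napple\npear\nfig\napple\npear\n' \
    | sort \
    | uniq -c \
    | sort -nr \
    | head -n 1 \
    | awk '{print $2}')
echo "most frequent: $top_word"

# Hypothetical function that writes one line to each stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2>&1 | sends stderr down the pipe as well; Bash's |& is shorthand for it.
both_count=$(noisy 2>&1 | wc -l | tr -d ' ')
echo "lines through the pipe: $both_count"
```

The pipe is set up before the 2>&1 is applied, which is why stderr ends up in the pipe rather than on the terminal.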

find . -mindepth 1 -maxdepth 1 \
       -type d ! -name media ! -name "log*" ! -name "font*" \
       -printf "%T@\t%Tc\t%p\n" \
       | sort -nr | sed -e "1,5d" \
       | awk -F $'\t' '{print $3}' \
       | xargs rm -rf

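This uses find to look exactly one level below the current directory . (-mindepth 1, -maxdepth 1), keeping only directories (-type d) that are not named media and do not match log* or font*, and prints each one’s last-modified timestamp and path (-printf is a GNU find extension). sort -nr orders them newest first, sed drops the five most recent from the list, awk strips each line down to the path, and xargs finally hands those newline-separated paths to rm -rf as an argument list. The net effect is to delete everything except the five most recently modified directories. Note that xargs splits on any whitespace by default, so this assumes no directory name contains spaces.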

This is a bit ridiculous, but it gives you some idea of what you can really do in Bash with the built-in utilities.
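Before pointing a pipeline like this at rm -rf, it can help to run it as a dry run that only prints what would be deleted. The sketch below does that in a throwaway sandbox. The directory names and timestamps are invented, it assumes GNU find (for -printf), and it simplifies the original slightly by dropping the human-readable date column and using cut instead of awk.

```shell
# Build a sandbox with seven dated directories plus two excluded ones.
sandbox=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do
    mkdir "$sandbox/build$i"
    # Set the modification time to 2020-01-01 00:0<i>, so build7 is newest.
    touch -t "20200101000$i" "$sandbox/build$i"
done
mkdir "$sandbox/media" "$sandbox/logs"

# Same selection logic as the one-liner, but ending in a list, not rm -rf.
doomed=$(find "$sandbox" -mindepth 1 -maxdepth 1 \
       -type d ! -name media ! -name "log*" ! -name "font*" \
       -printf "%T@\t%p\n" \
       | sort -nr | sed -e "1,5d" \
       | cut -f 2)

echo "would delete:"
echo "$doomed"
```

Only build2 and build1, the two oldest directories, should be listed; the five newest and the excluded names are left alone.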