The Command-Line Really Does Matter, Part 1

As a computer-something student, I felt that one under-emphasized part of my college education was the importance of the command-line in “real life.” Fire up an IDE with auto-completion, code away, and deploy the source to production (erm, the professor) for evaluation on paper. Pray for low concentrations of red ink in the result set. That blasphemous blob of junk you delivered ran at least once (your classmate swears she saw it run, too); whether it runs again or not doesn’t matter so much. What matters more, in this setting, is the student’s interpretation of a narrowly defined problem and the demonstrated application of theory to solve it.

This method of learning might be considered harmful. We jump to the meat of the “problem” before we get our feet wet in the “meta-basics.”

Why is it that, when I paired with a fellow classmate, we would often have ideas for how to approach a problem, but lacked a sense for where to begin?

Why is it that, more often than not, there is a feeling in software engineering shops that students fresh out of school can’t actually produce “good code”—or, produce anything, for that matter?

I would argue that it’s because not enough time is spent up front in the world of the command-line, particularly its “unixy” variants. A shell has a built-in programming language; some shells can be used to produce pretty sophisticated stuff. Having a solid familiarity with the environment that will most likely be used to “run” whatever it is that you’re producing gives you context for best practices, whether it’s consciously absorbed or not.

For example, the “pipe” mechanism for chaining output to input is the most basic demonstration of how modularity can be used to keep concerns separated in a practical way. Larger workflows can be constructed from smaller building blocks; the system itself is a role model for “good design.” The instantaneous feedback of a shell’s interpreter provides opportunity for learning and experimentation, and guides you towards reusable scripts that will “just run,” and “just run” in the same way, every time. The organization of files actually matters because you’re in the thick of it. There’s nothing mysterious and magical hidden behind the scenes that discourages attention to detail. Everything matters.
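As a small illustration of that building-block idea, here is a sketch of a pipeline that finds the most frequent word in a short list. The sample words and the variable name top_word are made up for this example; each stage does one job and knows nothing about its neighbours.

```shell
# Hypothetical input: six words, one per line.
# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the biggest count first, head keeps that line,
# and awk prints only the word (the second field).
top_word=$(printf 'foo\nbar\nfoo\nbaz\nfoo\nbar\n' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{print $2}')

echo "$top_word"
```

Any stage can be swapped out or inspected on its own by cutting the pipeline short, which is a large part of what makes the arrangement easy to experiment with.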

I’m not an educator and I can’t prove my argument. From personal experience, I know that had I not developed strong familiarity with the Unix command-line environment early on, I would have found course material more challenging and would have felt lost even after all that was over with.

With that, I’d like to take the time to thank the command-line by writing about it. We’ll focus on modern Bourne shell variants, as every Unix-like system features a Bourne shell of some sort; examples here were made on OS X 10.9 with the system’s GNU Bash 3.2.

Evil Characters

Consider the following situation:

~/stuff$ /bin/ls -1
fobar
foo?bar
foobar
fooxbar

Weird. There’s a question mark in the middle of one of your foos. If you remove that file using rm foo?bar, you’ll also remove fooxbar, because the question mark is a metacharacter in the Bourne shell: left unquoted, it matches any single character.
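To see the metacharacter at work without risking real files, the situation can be rebuilt in a throwaway directory. This is only a sketch: the scratch directory and the variable names are assumptions for illustration, and the four file names are the ones from this example.

```shell
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# printf turns \b into a real backspace byte (0x08) inside the file name.
touch fobar foobar fooxbar "$(printf 'foo\bbar')"

# Unquoted, ? matches exactly one character of any kind, so the pattern
# expands to both the backspace file and fooxbar.
set -- foo?bar
matches=$#
echo "foo?bar expands to $matches names"

# Quoted, the pattern is taken literally, and no file has that name.
if [ -e 'foo?bar' ]; then literal=yes; else literal=no; fi
echo "a file literally named foo?bar exists: $literal"
```

Running echo with a pattern before handing the same pattern to rm is a cheap habit that shows exactly which names the shell will pass along.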

Is there anything we can glean from listing our directory in a different way?

~/stuff$ /bin/ls | tee
fobar
fobar
foobar
fooxbar

Fascinating. Now we have two fobar files in the same directory, and our foo?bar has completely disappeared!

This smells like a control character got into our file name.

(Aside: The -1 argument to ls, not to be confused with the -l argument, lists one file per line. When ls writes to a terminal, it substitutes a question mark for each non-printable character; when its output goes to a pipe, it writes the raw bytes instead. By piping output to tee, we pass the control character through as-is and let the terminal act on it. For those who fear tee, the same could be accomplished by substituting cat for tee.)
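The difference between the two listings can also be checked without a terminal in the loop. The sketch below relies on the -q option, which POSIX specifies for ls as "write non-printable characters as question marks"; the scratch directory and variable names are assumptions for illustration.

```shell
# Rebuild the four example files in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch fobar foobar fooxbar "$(printf 'foo\bbar')"

# With -q, ls masks the backspace as ? even when writing to a pipe.
# (In grep's basic regular expressions a bare ? is an ordinary character.)
masked=$(ls -q | grep -c '^foo?bar$')

# Without -q, the raw byte goes down the pipe, so no line contains a ?.
raw=$(ls | grep -c '?')

echo "lines containing ? with -q: $masked, without -q: $raw"
```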

We can prove that it’s a control character using octal dump:

~/stuff$ /bin/ls | od -c
0000000   f   o   b   a   r  \n   f   o   o  \b   b   a   r  \n   f   o
0000020   o   b   a   r  \n   f   o   o   x   b   a   r  \n

Soooo … someone appears to have placed a backspace in the middle of our funny filename. The \n characters indicate new lines, i.e., separating each file in the list. But that \b character, there’s something odd about that.

Let’s use od to show us the hex values of the bytes alongside the logical representations of the characters:

~/stuff$ /bin/ls | od -xc
0000000    6f66    6162    0a72    6f66    086f    6162    0a72    6f66
          f   o   b   a   r  \n   f   o   o  \b   b   a   r  \n   f   o
0000020    626f    7261    660a    6f6f    6278    7261    000a
          o   b   a   r  \n   f   o   o   x   b   a   r  \n

In ASCII and UTF-8, the hexadecimal value for the backspace character is 0x08. Sure enough, there it is, taunting us. Note that the newline character is 0x0a, and that the two bytes in each group appear “backwards.” That is because od -x prints two-byte words in the host’s byte order, and I’m using a little-endian microprocessor.
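If the swapped pairs are distracting, od can be asked for one byte at a time, which sidesteps byte order entirely. A minimal sketch, using printf to produce the same seven bytes as the odd file name (the variable name bytes is just for illustration):

```shell
# -An drops the offset column; -tx1 prints each byte as its own hex pair.
# xargs only tidies the whitespace, which differs between od versions.
bytes=$(printf 'foo\bbar' | od -An -tx1 | xargs)

echo "$bytes"
# prints: 66 6f 6f 08 62 61 72
```

The 08 sits in fourth position, right where the backspace falls in the name.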

Now that we know what character is stuck in the middle of our naughty filename, we can be very direct in how we remove it. In Bash, you can enter a literal control character by pressing Control-V, followed by the key sequence for that character. Have you ever typed Control-H in your shell, following other characters? It’s a backspace! Combined (Control-V, then Control-H), we may explicitly remove our file:

~/stuff$ rm foo^Hbar
~/stuff$ /bin/ls
fobar foobar  fooxbar

Success! Note that simply typing a caret and a capital H won’t get you very far.
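The Control-V trick needs a keyboard, so it does not carry over to scripts. One alternative, not part of the session above, is to let printf build the name. The sketch below does that in a scratch directory; the directory and variable names are assumptions for illustration.

```shell
# Rebuild the four example files in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch fobar foobar fooxbar "$(printf 'foo\bbar')"

# printf expands \b to the backspace byte; the double quotes keep the
# result as a single argument, so only that one file is removed.
rm "$(printf 'foo\bbar')"

# List what is left using the shell's own glob expansion.
set -- *
remaining="$*"
echo "$remaining"
# prints: fobar foobar fooxbar
```

Bash also accepts the shorter spelling rm foo$'\b'bar, which uses its ANSI-C quoting, though that form is not available in every Bourne-style shell.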

Check out Part 2 for a study on loops.


Ian Melnick
Senior Software Engineer