Shell wc padding

Apologies in advance for the noise. I don't know why this seems to bother me so much lately; it's just *one of those things*. I am by no means an expert programmer, just someone who has used the wc command for the better part of 20+ years on various unix/unix-like operating systems. It's one of those utilities that has been in unix almost from the beginning. Can someone please explain the reasoning behind the space padding in wc results?

We've all done it one way or another, wc -l {file} | cut/sed/awk. Heck, we can even utilize libxo output (though _still_ retaining space padding).

Have I been using wc wrong all these years? If not, is it worth submitting a patch so that passing, say "-P", suppresses padding?
 
A: Look up the POSIX standard for wc. See whether it specifies the situation with space padding.

B: Pull up the source code from a repository, and look for changes in the wc utility (use the history function of the source control system of your choice). When was this introduced?
 
As far as I know, space padding has been there from the beginning (regardless of unix flavor). Like I said before, it's just one of those things that make you go hmmm.

For grins, I looked at the wc code on 11.3-p1 and can see how the padding is produced; I'm just curious about the design choice behind it. Changing the output format from "%7ju" to "%-7ju" is a step in the right direction and a simple enough fix, but a leading space still seems to be output.
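To illustrate what the width and the "-" flag do, here is a small sketch using the shell's printf, with %d standing in for C's %ju. The literal space before the conversion is an assumption about what the wc source prints (it would explain the leftover leading space), and the value 130 is just a sample number.

```shell
#!/bin/sh
# Sample value; any small line count shows the effect.
n=130

# Assumed shape of the current format: a literal space, then a
# right-justified field of width 7.
right=$(printf '[ %7d]' "$n")

# With the "-" flag the field is left-justified, but the literal
# space in front of it is still printed.
left=$(printf '[ %-7d]' "$n")

echo "$right"
echo "$left"
```

The brackets are only there to make the padding visible. Under that assumption, removing the padding entirely would mean dropping both the width and the literal space.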
 
Ha, so we have a liar right here, wc(1), saying the following:
STANDARDS
The wc utility conforms to IEEE Std 1003.1-2001 (“POSIX.1”).
...while the actual format specified by POSIX is:
STDOUT

By default, the standard output shall contain an entry for each input file of the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
...and what we have currently seems to be that of "System V version of wc ".
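As a rough illustration of the difference, the POSIX format quoted above can be fed straight to the shell's printf and compared with a fixed-width variant. The padded format string here is an assumption chosen for the comparison, not a quote from any wc source, and the counts and file name are made up.

```shell
#!/bin/sh
# POSIX-specified form: single blanks between fields, no padding.
posix_row=$(printf '%d %d %d %s' 1 2 3 short.txt)

# Assumed column-style form: each count right-justified in a
# 7-character field with a blank in front of it.
padded_row=$(printf ' %7d %7d %7d %s' 1 2 3 short.txt)

echo "$posix_row"
echo "$padded_row"
```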
 
Strange. Just tested it on one Linux machine. If you just run wc (without any of the -l, -w or -c switches), it prints each of the three numbers in a 6-digit field, with a blank in between. But if you use the -l, -w or -c switches, it outputs only the digits, with no space padding.

The POSIX specification that moridin quoted is pretty unfriendly; it would imply that the output is not aligned in nice columns. If someone has access to a really old BSD system (of the 4.3 or 4.4 era) or a similarly old SysV system, it would be fun to see what they do.

By the way, in the grand scheme of things: We're splitting hairs here, lengthwise!
 

Yes, the purpose of the post was not an attempt "to change the world"; it's just one of those habits that we have all accepted as "the way it is". How wc responded for you on Linux is exactly what I think *should* be correct: piped input with a single output flag results in a number with no padding; otherwise, the result is in the typical columned format.

I still have a SCO UnixWare 7 package on my shelf collecting dust, maybe I'll fire it up on a VM and see how wc responds.
 
What is the actual problem with the space padding? I mean, why is it bad? It never bothered me.
 
If used in a shell script, you have to cut the spaces to assign as a numeric variable, like:
Code:
#!/bin/tcsh
@ COUNT = `grep "relay=" /var/log/maillog | wc -l | awk '{print $1}'`   ## or sed/cut ##
@ ONEPLUSCOUNT = $COUNT + 1
echo $ONEPLUSCOUNT

It just seems (to me) that having to parse out spaces is an unnecessary step. I realize that wc is occasionally used standalone, but I'd think that most uses take some form like the above.
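For what it's worth, here is a minimal sketch in plain /bin/sh of two ways to get a clean number without an extra awk/sed/cut process. The temporary file is a made-up stand-in for the log, and whether wc pads its output at all depends on the system.

```shell
#!/bin/sh
# Hypothetical three-line sample file, created only for this sketch.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# Reading from stdin keeps the file name out of wc's output.
raw=$(wc -l < "$tmp")

# Option 1: parameter expansion removes everything up to the last
# blank, so any leading padding disappears.
count=${raw##* }

# Option 2: arithmetic expansion ignores blanks around the number.
next=$(( $(wc -l < "$tmp") + 1 ))

echo "$count"
echo "$next"
rm -f "$tmp"
```

Both forms behave the same whether or not the wc at hand pads, so the script does not need to care.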
 
Well, what do you know...
UnixWare7_wc.png


It's been a long time (1997ish?), but I would have sworn it was space padded.
 
What it really comes down to is that wc is used in two radically different contexts. First, directly by a human, where it needs to have output that is easy for humans to see and understand:
Code:
> wc *.txt
     1        2          3 short.txt
   456    34567     234567 medium.txt
987654 98765432 9876543210 oh_my_god.txt

Second, as a part of a particularly perverse and broken programming language, namely shell scripts, which uses bizarre syntax and semantics:
Code:
av_lines=`expr \( \`wc -l < short.txt\` + \`wc -l < long.txt\` \) / 2`
That garbage is nearly unreadable, hard to debug, and relies on arcane rules: how does expr parse its arguments? What is the difference between strings and numbers? When are multiple blanks removed and when aren't they? How does quoting work? An expert can answer all these questions (and I can be that expert if necessary), but it should not require an expert to write a simple expression that just averages two numbers. And this is why people should not use shell scripts for programs. Shells should be used for interactively running programs from the command line; programs should be written in first-class programming languages. Take the above example and rewrite it in a programming language of your choice (I would use Python, but tastes differ), and suddenly it's easy and clear.
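For comparison only, the same average can be sketched with POSIX arithmetic expansion instead of expr and nested backticks. This stays in sh, so it is not the rewrite in another language suggested above; the two files and their contents are made up for the example.

```shell
#!/bin/sh
# Hypothetical input files, created only for this sketch.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/short.txt"
printf '1\n2\n3\n4\n5\n6\n' > "$dir/long.txt"

# wc reads from stdin, so it prints only the count; the arithmetic
# expansion ignores any padding around it.
short_lines=$(( $(wc -l < "$dir/short.txt") ))
long_lines=$(( $(wc -l < "$dir/long.txt") ))

# Integer average of the two counts.
av_lines=$(( (short_lines + long_lines) / 2 ))

echo "$av_lines"
rm -r "$dir"
```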
 
I'm sorry, but I can't reproduce the problem described for that tcsh script, either with the standard shell /bin/sh or with tcsh (which should not be used for scripting anyway).
Code:
/bin/tcsh
> @ COUNT = `wc -l < /etc/rc.conf`
> echo $COUNT
130
> @ ONEPLUSCOUNT = $COUNT + 1
> echo $ONEPLUSCOUNT
131

/bin/sh
$ COUNT=`wc -l < /etc/rc.conf`
$ echo $COUNT
130
$ ONEPLUSCOUNT=$(( COUNT + 1 ))
$ echo $ONEPLUSCOUNT
131
This is FreeBSD/amd64 stable/12 r349942 (2019-07-12).
 