"For my ally is the Shell, and a powerful ally it is. Jobs creates it, makes them log. Its commands surround us and guide us. Scripting beings are we! Not those crude mouse clickers! You must use the Shell around you.. here, between mouse and keyboard. Even between... the browser and the forum.."
uhm.. I'm in that mood again
I've messed with shell scripts a lot over the years and I'd even go as far as to state that the shells are the glue which keeps our systems together. Alas, even though I've done plenty of scripting I'm also lazy at times. So when I had to process a list like this:
Code:
$ ls | sed -E 's/-[0-9]+\..*\.txz//g' | uniq -cd
2 boost-libs
5 ffmpeg
2 glib
3 harfbuzz
3 harfbuzz-icu
... I quickly resorted to /bin/csh because it could somehow grok the list much better. This is what /bin/sh made of it:
Code:
$ for a in `cat list`; do echo $a | cut -w -f2; done
2
boost-libs
5
ffmpeg
2
glib
3
harfbuzz
3
harfbuzz-icu
If you try this out for yourself you'll notice that the cut command doesn't seem to do anything at all: every number and every name ends up on a line of its own. Even using quotes around $a won't make a difference. The C shell on the other hand...
Code:
% foreach a ("`cat list`")
foreach? echo $a | cut -w -f2
foreach? end
boost-libs
ffmpeg
glib
harfbuzz
harfbuzz-icu
Change -f2 into -f1 and you get the number of occurrences.
Now, I did Google this a few times, and even though I saw IFS mentioned several times I never gave it too much thought, partly because csh(1) never mentions it and partly because the given explanation was often just plain poor.
Well, today I finally dove into sh(1) and I figured it out.
IFS, or the Input Field Separators, determines which characters are used for "field splitting":
Code:
IFS Input Field Separators. The default value is <space>,
<tab>, and <newline> in that order. This default also
applies if IFS is unset, but not if it is set to the empty
string.
<CUT>
Embedded newlines before the end of the output are not
removed; however, during field splitting, they may be translated into
spaces depending on the value of IFS and the quoting that is in effect.
<CUT>
Subsequently, a field is delimited by either
1. a non-whitespace character in IFS with any whitespace in IFS
surrounding it, or
2. one or more whitespace characters in IFS.
If a word ends with a non-whitespace character in IFS, there is no empty
field after this character.
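To make those two rules a little less abstract, here is a minimal sketch that should run in any POSIX sh. The helper show_fields and the sample strings are made up by me for illustration; they are not from the manual page. It prints how many fields the shell produces for a given IFS value and a given piece of text.

```shell
#!/bin/sh
# Sketch only: show_fields and the sample strings are hypothetical.
# $1 = value to use for IFS, $2 = text to split.
show_fields() {
    (
        # The subshell keeps the IFS change from leaking out.
        IFS=$1
        # Unquoted $2 undergoes field splitting with the new IFS.
        set -- $2
        echo $#
    )
}

show_fields ' '  'a   b'    # 2: a run of IFS whitespace is one delimiter (rule 2)
show_fields ':'  'a::b'     # 3: each non-whitespace IFS character delimits, so there is an empty field in the middle
show_fields ': ' 'a : b'    # 2: IFS whitespace around the colon belongs to the same delimiter (rule 1)
show_fields ':'  'a:b:'     # 2: no empty field after a trailing delimiter
```

The subshell is only there so that the caller's IFS stays untouched; setting IFS directly in a script, as done further down, changes it for everything that follows.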
This all seems very vague and theoretical, I know, but stay with me for now. Also very important to know is this:
Code:
Dollar-Single Quotes
Enclosing characters between $' and ' preserves the literal
meaning of all characters except backslashes and single quotes.
A backslash introduces a C-style escape sequence:
\n Newline
It got me thinking... SO:
Code:
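#!/bin/sh
# Split the output of `cat list` on newlines only, not on spaces or tabs.
IFS=$'\n'
for a in `cat list`; do
    # $a now holds a whole line of the list instead of a single word.
    echo $a | cut -w -f2
done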
And here is the list again which I used:
Code:
2 boost-libs
5 ffmpeg
2 glib
3 harfbuzz
3 harfbuzz-icu
Try commenting out IFS and see what happens.
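If you don't have a cut(1) with -w at hand (as far as I know that flag is a FreeBSD thing), the same effect can be shown with the shell alone. This is only a sketch under my own assumptions: count_fields and the inline sample data are illustrative names, not part of the script above. It counts the fields the shell makes of the same text, once with the default IFS and once with IFS set to a newline only. The newline is written literally between the quotes so that it also works in shells without $'...'.

```shell
#!/bin/sh
# Sketch only: count_fields and the sample data are hypothetical.
list='2 boost-libs
5 ffmpeg
2 glib'

# Print the number of fields the shell makes of the unquoted $list.
count_fields() {
    set -- $list
    echo $#
}

# Default IFS (space, tab, newline): every word becomes its own field.
default_count=$(count_fields)

# Newline-only IFS: every line stays in one piece.
newline_count=$(IFS='
'; count_fields)

echo "default IFS: $default_count fields"   # default IFS: 6 fields
echo "newline IFS: $newline_count fields"   # newline IFS: 3 fields
```

Six fields for three lines is exactly what the for loop ran into: it was handed the numbers and the names one at a time, so cut never saw a second field.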
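(edit): The key to this mystery is that, if you read carefully, you'll notice that <space> is listed in the default IFS too, ahead of <newline>. With the default value every space already splits off a new field, long before the end of a line is reached. So, uhm, what do you think caused those indents before and after the numbers?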
As mentioned: I know that IFS gets mentioned several times on the Net already, it's also why I got pointed towards it. But most of the given examples never bother to explain why it does what it does, and that is simply not good enough for me.
I don't care about working solutions until I know & understand what makes them tick. And now I do.
Hope this could be useful for some of you!