In 2025, do you still need to follow the POSIX standard in shell scripts? / What is your shell for scripting?

Based on an O'Reilly book, I thought it was only for quick filters alongside sed.
*THE* single best book on AWK to this day (and IMHO still the best book on any programming language, closely followed by K&R) is "The AWK Programming Language" by the language's authors Aho, Weinberger, and Kernighan.
They manage to tell you everything there is to know about awk in a tad over 200 pages (actually under, if you omit the example programs at the end) without bullshitting around like many of those 600+ page tomes nowadays, where you have to wade through endless history lessons or stories about Alice, Bob, and their weird uncle...
It's pure and simple: "This is command X, it has options A and B; to achieve this or that, use it in the following way. Done, next."
This book still sits near my desk at home within arm's reach, although I also got hold of an ebook edition in PDF format several years ago for easy grep'ing, so the dead-tree edition could be spared a bit...
 
(In addition to sko's remarks.)
It's always seemed like a kind of expert language to me...
That is likely a misconception. It may appear so because, just as with sed, awk is often used, especially on the command line, highly decorated with regular expressions. That is one of its strengths.

Both sed(1) & awk(1) use:
  1. regular expressions
  2. automatic processing on a line-per-line basis
For sed(1), #1 and #2 are its main strengths. Then there is sed's brevity, which for most (myself included) translates to cryptic coding and reading when not practiced often enough. There isn't much more: it's limited. It is, after all, a stream editor.

For awk(1), #1 is just one of its strong points. It can easily, and effectively, be used on the command line. There, awk(1) shows that it is a balanced command-line scripting language; in my view without contest. awk(1) is far less cryptic than sed(1). Don't forget that (extended) regular expressions are cryptic by nature, in any language!
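A throwaway illustration (my own example, nothing from the thread): swapping the first two fields of space-separated input in both tools shows the difference in readability:

  # sed: a back-reference dance
  sed 's/^\([^ ]*\) \([^ ]*\)/\2 \1/' file.txt

  # awk: plain assignments
  awk '{ tmp = $1; $1 = $2; $2 = tmp; print }' file.txt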

However, besides #1 and #2, awk(1) has more to offer. If you can forgo the use of libraries, such as those used by Python, it is a very well-balanced programming language, devoid of any excess features. You need more in awk(1)? Make use of all the standard UNIX utilities outside it, either by means of pre- or post-processing, or by calling them from within awk(1). There are no extensive sets of data structures, but associative arrays are very versatile. It is not the fastest, but, in general, if you need more speed, you should be thinking about optimizing the complete scripting pipeline with externally sourced functionality. Then it may also be time to consider switching to another programming language altogether.

Perhaps #2 feels like a burden; however, one does not need to use its automatic line-by-line processing, or at least not all of the time. You can do all your programming without it: use BEGIN {<your program>}. Of course, when you make more extended use of awk(1), you have probably left the command line and are using a script file.

In my view, Brian Kernighan, as a co-author of the 'C book' (he also co-authored a Go book), is one of the best writers when it comes to a clear and concise exposition of programming in general or a programming language in particular. If you're at all interested in what's on offer, I suggest you have a look at bwk's home page and the second edition of the awk book. Just for starters, though, I've found Awk by Bruce Barnett invaluable to have at a screen's reach.
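Two minimal sketches of those points (file.txt is just a placeholder name):

  # BEGIN only: no input processing at all, awk as a small scripting language
  awk 'BEGIN { for (i = 1; i <= 5; i++) s += i * i; print "sum of squares:", s }'

  # associative arrays: the classic word-frequency counter
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print count[w], w }' file.txt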
 
I agree, regular expressions are often cryptic and often necessary.

Regarding libraries, I've read comments explaining that they are generally misunderstood and are therefore a source of problems, due to a lack of mastery. I don't mind not using them, especially since some have a learning curve almost as steep as the language itself. Okay, maybe I'm exaggerating a bit there...

By the way, Python, despite its qualities, is an impossible language for me because of the mandatory indentation.
I'll look into that. Thanks for the advice and the links!
 
*THE* single best book on AWK to this day (and IMHO still the best book on any programming language, closely followed by K&R) is "The AWK Programming Language" by the language's authors Aho, Weinberger, and Kernighan.
I miss the days of the "Precise and Concise" collection (I don't know the original name in English) at O'Reilly, where everything was clearly summarized. What a time saver!
 
Going back to the title: "Do you..." is very different from "Should you...".
If one is writing a script that WILL be distributed to other systems and other customers, then yes, one "should".
If one is writing a script for sole personal use, who cares? Write it to support your needs.
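To make the difference concrete (my illustration, not the poster's): the bash-only [[ ]] test silently assumes bash, while the POSIX form runs under any /bin/sh:

  # bashism: breaks under a strict POSIX shell such as dash
  if [[ $answer == y* ]]; then echo yes; fi

  # POSIX equivalent: runs everywhere
  case $answer in y*) echo yes ;; esac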
 
awk & sed are fine for handling CSV-like data in tabular formats, but they're not for XML, JSON, or YAML, where you need a more powerful scripting language like Python or Perl.
Well, you could also parse them in awk (I'd strip all the human-hostile formatting and markup first). You *could* even do things like multi-dimensional arrays in awk, but I'm not sure one *should*... (it's fascinating to look at that code though)
But yes, for such horrible formats you're usually better off using perl (or if you must, python), especially because someone else already suffered through the pain of making a library to handle those "languages" sanely.
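As for the multi-dimensional arrays mentioned above: in POSIX awk they are really flat associative arrays with compound keys, which is part of why that code is so fascinating to look at. A small sketch:

  # a 3x3 "matrix"; the key i,j is really the string i SUBSEP j
  awk 'BEGIN {
      for (i = 1; i <= 3; i++)
          for (j = 1; j <= 3; j++)
              m[i, j] = i * j
      for (k in m) {
          split(k, idx, SUBSEP)    # recover the two indices
          print idx[1], idx[2], m[k]
      }
  }'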
 
There are so many things that can and will go wrong with that approach that it's simply not worth it.
 
True. But "it depends"™
E.g., I had a use case where I had to extract only a handful of fields from quite large XML files (parts/price data with >500k entries) and could throw away everything else, especially all the XML markup, and then had to mangle that remaining textual data. For this I still went with awk, because the input data always had the exact same, predictable format, and the major heavy lifting was the data mangling after extracting the fields - and I had most of that code from mangling other datasets (i.e. the same parts/price data from other suppliers) which were provided as CSV.
So I guess there are always enough edge cases that might justify mangling those formats manually in awk.
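A toy version of that kind of extraction (the real format was different; this assumes one <part .../> element per line with a fixed attribute order):

  # input assumed to look like: <part number="A123" price="19.95"/>
  # splitting on the quote character puts the values in $2 and $4
  awk -F'"' '/<part / { print $2 "," $4 }' parts.xml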

Of course, if you find yourself constructing output in large blocks of some horrible markup format in awk, you are very likely on the wrong path...


Edit: regarding doing unspeakable things with awk, I just had a horrible flashback: I once had to extract data from files for some proprietary Java database that had been abandoned by the vendor >10 years prior, but which some other vendor used anyway until (finally) going out of business... Awk did the job, and it did it fast, but it was an absolute mess getting there...
 
For XML/HTML, xmllint --xpath works best, although it kind of sucks with XML namespaces. xmlstarlet works better, but xmllint has a greater chance of already being installed.
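For instance (parts.xml and the namespace URI are made-up names):

  # xmllint: fine as long as no namespaces are involved
  xmllint --xpath '//part/price/text()' parts.xml

  # xmlstarlet: -N binds a prefix, which makes namespaced documents bearable
  xmlstarlet sel -t -v '//part/price' -n parts.xml
  xmlstarlet sel -N p='http://example.com/parts' -t -v '//p:part/p:price' -n parts.xml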
 
With JUnit XML log files, if you have a set of expected failures, you still have to iterate over the testsuites and testcases to transform <failure> into <xfailure> and update the failures counter.

This is not something that can be done with xmlstarlet in O(n).
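Which is one reason people fall back to awk for this. A rough single-pass sketch of my own (expected.txt holds one test name per line; it assumes the usual one-tag-per-line JUnit layout, and fixing the failures="..." counters on <testsuite> would still need a second pass or buffering):

  awk 'NR == FNR { xfail[$0]; next }    # first file: the expected failures
       /<testcase / {
           match($0, /name="[^"]*"/)
           name = substr($0, RSTART + 6, RLENGTH - 7)
           expected = (name in xfail)
       }
       expected { sub(/<failure/, "<xfailure"); sub(/<\/failure>/, "</xfailure>") }
       { print }' expected.txt results.xml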
 
awk & sed are fine for handling CSV-like data in tabular formats, but they're not for XML, JSON, or YAML, where you need a more powerful scripting language like Python or Perl.

Well, the question is whether we are comparing simple table-like data (rows and columns). Of course, more complicated data structures need a real parser.

If, on the other hand, you have a two-dimensional table in XML, you can use XSLT to convert it to a line-based format and use awk again.

(Don't get me wrong, XSLT is masochism, but a simple transformation like that can be straightforward.)
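Something like this, say, where table.xml with <row><a>1</a><b>2</b></row> elements is an invented example; xsltproc flattens it to TSV and awk takes it from there:

  # table2tsv.xsl -- emit one tab-separated line per <row>
  cat > table2tsv.xsl <<'EOF'
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
      <xsl:for-each select="//row">
        <xsl:value-of select="a"/><xsl:text>&#9;</xsl:text>
        <xsl:value-of select="b"/><xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>
  </xsl:stylesheet>
  EOF
  xsltproc table2tsv.xsl table.xml | awk -F'\t' '{ print "a=" $1, "b=" $2 }'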
 