In 2025, do you still need to follow the POSIX standard in shell scripts? / What is your shell for scripting?

Based on an O'Reilly book, I thought it was only for quick filters alongside sed.
*THE* single best book on AWK to this day (and IMHO still the best book on any programming language, closely followed by K&R) is "The AWK Programming Language" by the language's authors Aho, Weinberger, and Kernighan.
They manage to tell you everything there is to know about awk in a tad over 200 pages (actually under, if you omit the example programs at the end) without bullshitting around like many of those 600+ page tomes nowadays, where you have to wade through endless history lessons or stories about Alice, Bob, and their weird uncle...
It's pure and simple: "This is command X, it has options A and B; to achieve this or that, use it in the following way. Done, next."
This book still sits near my desk at home within arm's reach, although I also got hold of an ebook edition in PDF format several years ago for easy grep'ing, so the dead-tree edition could be spared a bit...
 
(In addition to sko's remarks.)
It's always seemed like a kind of expert language to me...
That is likely a misconception. It may appear so because, just as with sed, awk is often used, especially on the command line, highly decorated with regular expressions. That is one of its strengths.

Both sed(1) & awk(1) use:
  1. regular expressions
  2. automatic processing on a line-per-line basis
For sed(1), #1 and #2 are its main strengths. Then there is sed's brevity, which for most (myself included) translates to cryptic coding and reading when not practiced often enough. There isn't much more: it's limited. It is, after all, a stream editor.

For awk(1), #1 is just one of its strong points. It can easily, and effectively, be used on the command line. There, awk(1) shows that it is a balanced command-line scripting language; in my view without contest. awk(1) is far less cryptic than sed(1). Don't forget that (extended) regular expressions are cryptic by nature, in any language!
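A throwaway illustration (my own example, nothing from the thread): swapping the first two fields of space-separated input in both tools shows the difference in readability:

  # sed: a back-reference dance
  sed 's/^\([^ ]*\) \([^ ]*\)/\2 \1/' file.txt

  # awk: plain assignments
  awk '{ tmp = $1; $1 = $2; $2 = tmp; print }' file.txt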

However, besides #1 and #2, awk(1) has more to offer. If you can forgo the use of libraries, such as those used by Python, it is a very well-balanced programming language, devoid of any excess features. You need more in awk(1)? Make use of all the standard UNIX utilities outside it, either by means of pre- or post-processing, or by calling them from within awk(1). There are no extensive sets of data structures, but associative arrays are very versatile. It is not the fastest, but, in general, if you need more speed, you should be thinking about optimizing the complete scripting pipeline with externally sourced functionality. Then it may also be time to consider switching to another programming language altogether.

Perhaps #2 feels like a burden; however, one does not need to use its automatic line-by-line processing, or at least not all of the time. You can do all your programming without it: use BEGIN {<your program>}. Of course, when you make more extended use of awk(1), you have probably left the command line and are using a script file.

In my view, Brian Kernighan, as a co-author of the 'C book' (he also co-authored a Go book), is one of the best writers when it comes to a clear and concise exposition of programming in general or a programming language in particular. If you're at all interested in what's on offer, I suggest you have a look at bwk's home page and the second edition of the awk book. Just for starters, though, I've found Awk by Bruce Barnett invaluable to have at a screen's reach.
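Two minimal sketches of those points (file.txt is just a placeholder name):

  # BEGIN only: no input processing at all, awk as a small scripting language
  awk 'BEGIN { for (i = 1; i <= 5; i++) s += i * i; print "sum of squares:", s }'

  # associative arrays: the classic word-frequency counter
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print count[w], w }' file.txt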
 
I agree, regular expressions are often cryptic and often necessary.

Regarding libraries, I've read comments explaining that they are generally misunderstood and are therefore a source of problems, due to a lack of mastery. I don't mind not using them, especially since some have a learning curve almost as steep as the language itself. Okay, maybe I'm exaggerating a bit there...

By the way, Python, despite its qualities, is an impossible language for me because of the mandatory indentation.
I'll look into that. Thanks for the advice and the links!
 
*THE* single best book on AWK to this day (and IMHO still the best book on any programming language, closely followed by K&R) is "The AWK Programming Language" by the language's authors Aho, Weinberger, and Kernighan.
I miss the days of the "Precise and Concise" collection (I don't know the original name in English) at O'Reilly, where everything was clearly summarized. What a time saver!
 
Going back to the title: "Do you..." is very different from "Should you...".
If one is writing a script that WILL be distributed to other systems and other customers, then yes, one "should".
If one is writing a script for sole personal use, who cares? Write it to support your needs.
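To make the difference concrete (my illustration, not the poster's): the bash-only [[ ]] test silently assumes bash, while the POSIX form runs under any /bin/sh:

  # bashism: breaks under a strict POSIX shell such as dash
  if [[ $answer == y* ]]; then echo yes; fi

  # POSIX equivalent: runs everywhere
  case $answer in y*) echo yes ;; esac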
 
awk & sed are fine for handling CSV-like data in tabular formats, but they're not for XML, JSON, or YAML, where you need a more powerful scripting language like Python or Perl.
Well, you could also parse them in awk (I'd strip all the human-hostile formatting and markup first). You *could* even do things like multi-dimensional arrays in awk, but I'm not sure one *should*... (it's fascinating to look at that code though)
But yes, for such horrible formats you're usually better off using perl (or if you must, python), especially because someone else already suffered through the pain of making a library to handle those "languages" sanely.
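As for the multi-dimensional arrays mentioned above: in POSIX awk they are really flat associative arrays with compound keys, which is part of why that code is so fascinating to look at. A small sketch:

  # a 3x3 "matrix"; the key i,j is really the string i SUBSEP j
  awk 'BEGIN {
      for (i = 1; i <= 3; i++)
          for (j = 1; j <= 3; j++)
              m[i, j] = i * j
      for (k in m) {
          split(k, idx, SUBSEP)    # recover the two indices
          print idx[1], idx[2], m[k]
      }
  }'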
 
There are so many things that can and will go wrong with that approach that it's simply not worth it.
 
True. But "it depends"™
E.g., I had a use case where I had to extract only a handful of fields from quite large XML files (parts/price data with >500k entries) and could throw away everything else, especially all the XML markup, and then had to mangle that remaining textual data. For this I still went with awk, because the input data always had the exact same, predictable format, and the major heavy lifting was the data mangling after extracting the fields - and I had most of that code from mangling other datasets (i.e. the same parts/price data from other suppliers) which were provided as CSV.
So I guess there are always enough edge cases that might justify mangling those formats manually in awk.
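A toy version of that kind of extraction (the real format was different; this assumes one <part .../> element per line with a fixed attribute order):

  # input assumed to look like: <part number="A123" price="19.95"/>
  # splitting on the quote character puts the values in $2 and $4
  awk -F'"' '/<part / { print $2 "," $4 }' parts.xml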

Of course, if you find yourself constructing output in large blocks of some horrible markup format in awk, you are very likely on the wrong path...


Edit: regarding doing unspeakable things with awk, I just had a horrible flashback: I once had to extract data from files for some proprietary Java database that had been abandoned by the vendor >10 years prior, but which some other vendor used anyway until (finally) going out of business... Awk did the job, and it did it fast, but it was an absolute mess getting there...
 
For XML/HTML, xmllint --xpath works best, although it kind of sucks with XML namespaces. xmlstarlet works better, but xmllint has a greater chance of already being installed.
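For instance (parts.xml and the namespace URI are made-up names):

  # xmllint: fine as long as no namespaces are involved
  xmllint --xpath '//part/price/text()' parts.xml

  # xmlstarlet: -N binds a prefix, which makes namespaced documents bearable
  xmlstarlet sel -t -v '//part/price' -n parts.xml
  xmlstarlet sel -N p='http://example.com/parts' -t -v '//p:part/p:price' -n parts.xml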
 
With JUnit XML log files, if you have a set of expected failures, you still have to iterate over the testsuites and testcases to transform <failure> into <xfailure> and update the failures counter.

This is not something that can be done with xmlstarlet in O(n).
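Which is one reason people fall back to awk for this. A rough single-pass sketch of my own (expected.txt holds one test name per line; it assumes the usual one-tag-per-line JUnit layout, and fixing the failures="..." counters on <testsuite> would still need a second pass or buffering):

  awk 'NR == FNR { xfail[$0]; next }    # first file: the expected failures
       /<testcase / {
           match($0, /name="[^"]*"/)
           name = substr($0, RSTART + 6, RLENGTH - 7)
           expected = (name in xfail)
       }
       expected { sub(/<failure/, "<xfailure"); sub(/<\/failure>/, "</xfailure>") }
       { print }' expected.txt results.xml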
 
awk & sed are fine for handling CSV-like data in tabular formats, but they're not for XML, JSON, or YAML, where you need a more powerful scripting language like Python or Perl.

Well, the question is whether we are comparing simple table-like data (rows and columns). Of course, more complicated data structures need a real parser.

If, on the other hand, you have a two-dimensional table in XML, you can use XSLT to convert it to a line-based format and use awk again.

(Don't get me wrong, XSLT is masochism, but a simple transformation like that can be straightforward.)
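Something like this, say, where table.xml with <row><a>1</a><b>2</b></row> elements is an invented example; xsltproc flattens it to TSV and awk takes it from there:

  # table2tsv.xsl -- emit one tab-separated line per <row>
  cat > table2tsv.xsl <<'EOF'
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
      <xsl:for-each select="//row">
        <xsl:value-of select="a"/><xsl:text>&#9;</xsl:text>
        <xsl:value-of select="b"/><xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>
  </xsl:stylesheet>
  EOF
  xsltproc table2tsv.xsl table.xml | awk -F'\t' '{ print "a=" $1, "b=" $2 }'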
 