Shell Help with Script - that I do not understand

NugentS · Aug 18, 2021

Hi,
Looking for some help understanding a script that was written by someone else. I have edited it and added some stuff, but have now run into a brick wall with what the script is doing (more how than what).

The script is looking through the output of smartctl -a /dev/nvme0 which produces the following input to the script

Code:

smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21D280GA
Serial Number:                      **************************
Firmware Version:                   E2010325
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Aug 18 15:33:37 2021 BST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    752,735 [385 GB]                                           ***** These are what I am trying to pick up 
Data Units Written:                 51,418,383 [26.3 TB]                                     ***** but I guess the space is interfering.
Host Read Commands:                 26,493,506
Host Write Commands:                425,473,938
Controller Busy Time:               227
Power Cycles:                       75
Power On Hours:                     10,643
Unsafe Shutdowns:                   41
Media and Data Integrity Errors:    0
Error Information Log Entries:      1

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          1    10       -  0xc00c      -            0     -     -

This section of the script somehow extracts some of this info using awk and then prints it as a formatted table row.
Script:

Code:

for drive in $NVME_list; do
    (
    devid=$(basename "$drive")
    "$smartctl" -a "$drive" | \
    awk -v device="$devid" '
    /Serial Number:/{serial=$3}
    /Temperature:/{temp=$2} \
    /Available Spare:/{avail_spare=$3} \
    /Percentage Used:/{perc_used=$3} \
    /Data Units Read:/{data_read=$5} \
    /Data Units Read:/{data_read_unit=$6} \
    /Data Units Written:/{data_written=$5} \
    /Data Units Written:/{data_written_unit=$6} \
    END {
    printf "|%-6s|%-24s|%-4s|%-9s|%-4s|%-11s|%-12s|\n",device,serial,temp,avail_spare,perc_used,data_read,data_written;
    }'
    ) >> "$logfile"
  done
  (
   echo "+------+------------------------+----+---------+----+-----------+------------+"
  ) >> "$logfile"
fi

This sort of works but produces the following output:

|nvme0 |PHM274360038280AGN |38 |100% |0% |[385 |[26.3 |
The first five fields are correct, the last two are not

Which is sort of correct - BUT:

1. I seem unable to get rid of the leftmost square brackets in the fields with 385 and 26.3.
2. I am also trying to pick up the GB and TB from the Data Units Read and Data Units Written lines and add these to the 385 and 26.3 in the output.
3. I seem unable to manipulate the data_read and data_written variables. The _unit variables do hold the correct data (albeit with a square bracket to the right), but I assume a solution similar to issue 1 would apply here.
4. I really don't understand how the awk command is doing what it's doing.

I suppose I could use grep and sed to mine the info, which I would have to write from scratch - but the code above is quite close - I just have no idea how it works.
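For readers with the same question, here is a minimal sketch of how awk sees one of those lines. It assumes the default field separator (runs of blanks) and uses a line copied from the smartctl output above; the variable names line and fields are only for illustration. Each /regex/{...} rule in the script runs for every input line that matches the regex, and the END block runs once after the last line.

```shell
# One line copied from the smartctl output quoted above.
line='Data Units Read:                    752,735 [385 GB]'

# With the default separator, awk splits each line on runs of blanks
# and numbers the pieces $1, $2, ... (NF is the number of fields).
fields=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }')
printf '%s\n' "$fields"
```

The brackets are glued to their neighbours because there is no blank between "[" and "385" or between "GB" and "]", which is why $5 and $6 carry them along.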

NugentS · Aug 19, 2021

It occurs to me that I haven't said what I am trying to achieve
data_read contains a numerical value and a [ as such: [385
data_read_unit contains GB]

I need to remove the square brackets from each variable and then combine them to 385 GB - so I might need to add a space in the middle as well

I have tried inserting commands in lines before the printf and after the END { but nothing I have tried works and I just end up with the commands themselves in the variables

It's probably obvious, and I am likely being dumb - but I don't see it

BTW - if it isn't obvious this sort of thing is NOT my strong point

ralphbsz · Aug 19, 2021

You need to learn a little bit about awk. There is a short and very clear book available from O'Reilly; if I remember right, it also covers sed, and has a cute animal on the cover. (Book = pieces of paper, glued together on the left side, think of it as a permanent version of a web page, with page breaks). Read it cover to cover, and make sure you understand the chapter on how awk works (ignore fine details at that point).

Your problem is that awk is not separating things the way you want them to be separated. Here are my suggestions: When parsing the line that starts with "Data Units", you don't need two separate rules; you can make it easier by having one pattern whose action contains both assignments:

Code:

/Data Units.../ {
    data_read = $5
    data_read_units = $6
}

Next problem: Extra square brackets. Just remove them by hand. For example awk has a function called gsub(), which replaces things in strings. You can just add that:

Code:

/Data Units.../ {
    data_read = $5
    gsub('[', '', data_read)

Please read either the man page for awk or a book to see how gsub really works; the above is from memory. There is also another solution: Before even going into awk, you could use sed to remove all square brackets. Or you could tell awk that square brackets are separators (in addition to whitespace), and it would then eat them away for you (but perhaps the numbering of fields would change).
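A rough sketch of the three options just mentioned, tried against a single line copied from the smartctl output earlier in the thread. This is not the original script: the variable names with_gsub, with_fs and with_sed are made up for the comparison. In awk the pattern argument of gsub() is a regular expression, written here as /[][]/, a bracket expression that matches either "[" or "]".

```shell
line='Data Units Read:                    752,735 [385 GB]'

# Option 1: copy the fields, then strip brackets from the copies with gsub().
with_gsub=$(printf '%s\n' "$line" | awk '
    /Data Units Read:/ {
        data_read = $5; data_read_unit = $6
        gsub(/[][]/, "", data_read)
        gsub(/[][]/, "", data_read_unit)
    }
    END { print data_read, data_read_unit }')

# Option 2: treat blanks and brackets as field separators, so the
# brackets never reach the fields.
with_fs=$(printf '%s\n' "$line" | awk -F'[][ \t]+' '
    /Data Units Read:/ { data_read = $5; data_read_unit = $6 }
    END { print data_read, data_read_unit }')

# Option 3: delete the brackets with sed before awk sees the line.
with_sed=$(printf '%s\n' "$line" | sed 's/[][]//g' |
    awk '/Data Units Read:/ { print $5, $6 }')

printf '%s\n%s\n%s\n' "$with_gsub" "$with_fs" "$with_sed"
```

For this particular line the field numbers happen to stay at $5 and $6 under option 2, but that is worth re-checking for every other line the script parses, since a regex separator does not skip leading blanks the way the default separator does.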

Final problem: You want to print the units. Just add them to the print statement:

Code:

END {
    printf "...\n", ..., data_read, data_read_units
}
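As an alternative to adding two more columns, the value and its unit can be joined into one string first, so that a single %-11s pads the combined text. A small sketch under that assumption, using the two lines from the smartctl output above and the last two column widths from the original printf; the bracket-aware field separator is one possible choice, not something the original script does.

```shell
sample='Data Units Read:                    752,735 [385 GB]
Data Units Written:                 51,418,383 [26.3 TB]'

# Brackets and blanks both act as separators, so $5 is the number and
# $6 the unit. Putting " " between two strings concatenates them in awk.
row=$(printf '%s\n' "$sample" | awk -F'[][ \t]+' '
    /Data Units Read:/    { data_read    = $5 " " $6 }
    /Data Units Written:/ { data_written = $5 " " $6 }
    END { printf "|%-11s|%-12s|\n", data_read, data_written }')
printf '%s\n' "$row"
```

Because the padding is applied after the concatenation, the column borders stay aligned whatever the length of the number.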

Hope this is a starting point.

astyle · Aug 19, 2021

This is frankly better than USE (Unix.StackExchange)!

a6h · Aug 19, 2021

* awk tutorial by grymoire.com: The Grymoire's tutorial on AWK (www.grymoire.com)

* sed tutorial by grymoire.com: The Grymoire's tutorial on SED (www.grymoire.com)

* sed & awk, 2nd Edition (www.oreilly.com) -- it's mentioned earlier by ralphbsz. sed & awk describes two text processing programs that are mainstays of the UNIX programmer's toolbox.

* There's also an electronic version of The AWK Programming Language by Aho, Kernighan & Weinberger on archive.org.
I'm not sure about the licence, so do your own research. I do know that a copyright holder can ask archive.org to remove copyrighted material that is not in the public domain.
EDIT: [I've removed the "archive.org" link to the book. The licence conditions are not clear.]

ralphbsz · Aug 19, 2021

I've never seen Aho, Weinberger and Kernighan's book in the flesh. I think it has been out of print for decades. Judging by the extremely high prices for used copies (Amazon has one for US-$ 224), I suspect it has become a collectible.

kpedersen · Aug 19, 2021

ralphbsz said:
I've never seen Aho, Weinberger and Kernighan's book in the flesh.

I managed to track down a soft back copy. It is much thinner than the C Programming Language and UNIX Programming Environment books (though they are hard backs). It also isn't white but an off shade of cream. Possibly its age.

Comes with a nifty insert so that I can request more information about an MS-DOS build of Awk. I am still very tempted to fill it out and send it off.

astyle · Aug 19, 2021

This is why I download PDFs, even if it takes looking on torrent sites. Some books are just worth the read.

ralphbsz · Aug 20, 2021

kpedersen said:
It also isn't white but an off shade of cream. Possibly its age.

Wonderful old joke by the Argentinian musical comedy group "Les Luthiers". They are talking about writing a waltz in the style of the great composers of the past, and they say: "Los compositores consultaron viejas partituras de la Belle Epoque y descubrieron con sorpresa que la tonalidad era la misma en todas: blanco amarillento". In English: the composers consulted old music scores from the Belle Epoque and were surprised to learn that the tonality was the same in all of them: yellowing white. The joke works better in Spanish, because there the words for "pitch" or "key" can also mean "color".

kpedersen · Aug 20, 2021

ralphbsz said:
The joke works better in Spanish, because there the words for "pitch" or "key" can also mean "color".

Haha. No, I just about get it (I *think*)

eternal_noob · Aug 20, 2021

ralphbsz said:
Book = pieces of paper, glued together on the left side, think of it as a permanent version of a web page, with page breaks

You forgot to mention that books smell better than web pages.

kpedersen · Aug 20, 2021

eternal_noob said:
You forgot to mention that books smell better than web pages.

Depends how old the book is vs how much Javascript the web page has I imagine!

sko · Aug 20, 2021

ralphbsz said:
I've never seen Aho, Weinberger and Kernighan's book in the flesh. I think it has been out of print for decades. Judging by the extremely high prices for used copies (Amazon has one for US-$ 224), I suspect it has become a collectible.

I found a softcover version in very good condition (with a few interesting pencil notes from the previous owner) for a few bucks several years ago and it is the first thing I look at when I'm stuck with an awk script... I also have the sed & awk book, but this original book by the authors of AWK is IMHO by far _THE_ best and quickest, yet most complete (i.e. it covers 100% of it) introduction to awk.
It very methodically and efficiently tells you everything there is and how it works, from the people who actually implemented it - that's it. No esoteric boilerplate about history, huge chapters of examples that only teach you 3 new commands, or other time-consuming stuff you usually find in "third party" books.

The only other book I could name off the top of my head which is that efficient and condensed to the mere stuff you really need and want to know is "The C Programming Language" by Kernighan & Ritchie. Again - written by the actual authors of the language, so they just tell you what is there, how it works and how you use it. Short, precise, efficient. Absolutely no tome double or triple the size will get you up to speed that quickly and thoroughly. (and FTR: ~1/3 of that already modestly sized book is the command reference!)

Oh, and kpedersen - The AWK book isn't cream coloured, the cover is in highly fashionable "light grey". I'd say it's a tad lighter than the grey keys on the keyboards of that time.

BOT:
As ralphbsz already showed, 'gsub' is the way to go if you need to get rid of some parts of a string. You can condense this into a single line like this:

Code:

/Data Units Read:/{ gsub(/\[/, ""); data_read=$5} \

this replaces every occurrence of "[" in $0 from the line that matches "Data Units Read:" with 'nothing' ("") and then sets the variable data_read to $5 (now with the '[' stripped)
(I'm not 100% sure about the escaping of [, but as it is a regexp metacharacter I think it has to be escaped...)
There are some nuances to the syntax of gsub (and others) between the various "modern" implementations of awk. IIRC not all variants support the gsub(a, b) variant, but only gsub(a, b, c), so you might have to specify $0 or e.g. $5
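To make the two calling forms concrete, a short sketch on one line copied from the smartctl output. The names two_arg and three_arg are illustrative only, and the pattern is widened to /[][]/ so that both brackets go in one pass. The "[" does need escaping (or has to sit inside a bracket expression like this one), because a bare "[" opens a bracket expression in a regular expression.

```shell
line='Data Units Read:                    752,735 [385 GB]'

# Two-argument form: the target defaults to $0. Changing $0 makes awk
# split the record into fields again, so $5 and $6 come out clean.
two_arg=$(printf '%s\n' "$line" |
    awk '/Data Units Read:/ { gsub(/[][]/, ""); print $5, $6 }')

# Three-argument form: name the target explicitly.
three_arg=$(printf '%s\n' "$line" |
    awk '/Data Units Read:/ { gsub(/[][]/, "", $0); print $5, $6 }')

printf '%s\n%s\n' "$two_arg" "$three_arg"
```

If a particular awk rejects the two-argument call, the explicit $0 form is the one to fall back on.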

NugentS · Aug 21, 2021

I managed to make some sense of this - but awk defeats me
so I used grep and sed instead. Less tidy but works
I gave up trying to get printf formatting "385 GB" properly and just do 385GB instead. It works for the purpose
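The grep and sed version was not posted, so what follows is only a guess at the general shape, for anyone landing here later. The helper name bracket_value and the sample variable are hypothetical; in the real script the input would come from smartctl rather than a fixed string.

```shell
sample='Data Units Read:                    752,735 [385 GB]
Data Units Written:                 51,418,383 [26.3 TB]'

# Hypothetical helper: print the bracketed text of the line whose label
# is given as $1, with the inner space removed ("385 GB" -> "385GB").
bracket_value() {
    printf '%s\n' "$sample" |
        grep "^$1:" |
        sed 's/.*\[\(.*\)\].*/\1/' |
        tr -d ' '
}

data_read=$(bracket_value 'Data Units Read')
data_written=$(bracket_value 'Data Units Written')
printf '%s %s\n' "$data_read" "$data_written"
```

Dropping the tr step keeps the space, which then only needs a wide enough %-Ns column in the printf.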

The initial gsub suggestion by ralphbsz didn't work - due I think to what sko says above - so I may revisit and tidy my code and retry awk again