Shell How to parse a text file?

tpfiler · Aug 16, 2019

I am looking to parse a file for specific content that is on a line that always has a specific beginning,
for example:
"Program: code"
I want to extract code on the line that starts with Program.

I then want to add that extracted string into an already created file.

Can anyone provide some guidance on this?
Thanks!

ralphbsz · Aug 17, 2019

So you don't actually want to parse the whole text file. You want to fish out the one line that begins with the string "Program"; take the rest of the line (or perhaps only the second word?), and append it to a file?

First step: find the line that begins with "Program:". Here we can use grep. The correct grep pattern begins with a caret (up-arrow), which means look only for lines that have the string at the beginning of the line. Second step: echo the remainder of the line. The easiest way to do this (I think) is to use awk and tell it to blank out the first word on the line (which will by definition be "Program:"), then print the rest of the line. Finally, append it to the output file. Here it is. I'm deliberately breaking it into multiple lines to make it more readable; a line that ends with a pipe continues on the next line, but any other line break needs a backslash before it, or you can turn the whole thing into a single line.

Code:

# Append the text that follows "Program:" in input.file to output.file
cat input.file |
grep '^Program:' |
awk '{$1=""; print $0}' \
>> output.file

I know this could actually be simplified: awk lets you apply the little expression only to specific lines by putting a pattern in front of it, but honestly, that would be less readable.
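
For reference, a minimal sketch of the awk-only form mentioned above. The scratch directory, file contents, and the use of sub() are assumptions made for illustration: sub() strips the "Program:" prefix together with the blanks after it, so the leading space that $1="" leaves behind does not end up in the output.

```shell
# Scratch directory with made-up sample files (hypothetical names).
tmpdir=$(mktemp -d)
printf 'foo\nProgram: code\nbar\n' > "$tmpdir/input.file"
printf 'existing line\n' > "$tmpdir/output.file"

# The /^Program:/ pattern limits the action to matching lines,
# so cat and grep are not needed.
# sub() removes the prefix and any spaces or tabs that follow it.
awk '/^Program:/ { sub(/^Program:[ \t]*/, ""); print }' \
    "$tmpdir/input.file" >> "$tmpdir/output.file"

# Show the result: the old content plus the extracted string.
cat "$tmpdir/output.file"
```

Because the redirection is >>, whatever was already in the output file stays in place and the extracted text is added at the end.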

tpfiler · Aug 19, 2019

This is exactly what I was looking for, thanks!

moridin · Aug 19, 2019

Or use sed like below:

Code:

printf "foo\nProgram: code\nbar\n" | sed -ne 's/^Program: \(.*\)$/\1/p'

tpfiler · Aug 19, 2019

I'll try this and will update with results, thanks!

moridin · Aug 20, 2019

Or even simpler (I just think sed is more suited for this task):

Code:

printf "foo\nProgram: code\nbar\n" | sed -ne 's/^Program: //p'

Here we tell sed to not output anything by default, then select all lines starting with pattern 'Program: ', delete the pattern, and print the line.

If you are going to use it on a file named e.g. file1 and not on stdin, and append the results to file2, it would look like the following:

Code:

sed -ne 's/^Program: //p' file1 >> file2
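
A small self-contained check of the command above, as a sketch with made-up file contents in a scratch directory. It assumes file2 already exists and shows that only lines starting with 'Program: ' contribute anything.

```shell
# Scratch directory with made-up sample files.
tmpdir=$(mktemp -d)
printf 'foo\nProgram: code\nsee Program: x\nProgram: other\n' > "$tmpdir/file1"
printf 'already here\n' > "$tmpdir/file2"

# -n turns off automatic printing; the p flag prints a line only
# when the substitution on it succeeded.
# The ^ anchor skips "see Program: x", where the text is not at the start.
sed -ne 's/^Program: //p' "$tmpdir/file1" >> "$tmpdir/file2"

# file2 now holds its old content followed by the extracted strings.
cat "$tmpdir/file2"
```

If the file has several matching lines, each one is appended on its own line in the order it appears.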

tpfiler · Aug 26, 2019

This is great. Thanks!

tpfiler · Sep 5, 2019

moridin, I was taking the time to learn regular expressions before starting my script, and while practicing with the sed command I started to have some issues.

I am trying to parse a file that has a line like this:

Code:

 - Program code  :  password

And print only "password" to an existing file.

To parse the file I was trying to replicate some of your suggested commands, but I cannot seem to even print the searched pattern.
I am using:

Code:

sed '/\s+-\s+Program\scode\s+:\s+?/p' file.txt

Just to verify the standard output of the command and see if I am checking the right pattern.

Is my regular expression off?

Deleted member 30996 · Sep 5, 2019

Not particularly applicable to your needs but more ways to parse a .txt file.
I wanted to find out how many Categories and Responses were in Demonica's Language Center (Her mindfile) so I used wc to see how many lines were in the file:

Code:

$ wc -l /home/jitte/Downloads/demonica-2019-02-13.txt
   45250 /home/jitte/Downloads/demonica-2019-02-13.txt

Then I used grep to find out how many of those lines contained the string ID:, since that appears in every line that lists a Category:

Code:

$ grep -c ID: /home/jitte/Downloads/demonica-2019-02-13.txt
4505

That gives me the total number of Categories. If I subtract that number from the number of lines in the file that gives me how many Conversational Responses there are total.
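
The subtraction can also be left to the shell. A minimal sketch, using a tiny made-up stand-in file rather than the real mindfile, and assuming (as above) that every line without ID: is a response:

```shell
# Tiny hypothetical stand-in for the real file.
tmpdir=$(mktemp -d)
file="$tmpdir/mindfile.txt"
printf 'ID: 1\nhello\nhi there\nID: 2\ngoodbye\n' > "$file"

# Reading from stdin keeps the file name out of wc's output.
total=$(wc -l < "$file")
# -c makes grep print the number of matching lines instead of the lines.
categories=$(grep -c 'ID:' "$file")

# Shell arithmetic: lines that are not Category lines.
responses=$((total - categories))
echo "$responses"
```

With the numbers from the real file, 45250 - 4505 comes to 40745 responses.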
