Shell How to parse a text file?

tpfiler

Member

Reaction score: 4
Messages: 27

I am looking to parse a file for specific content that is on a line that always has a specific beginning,
for example:
"Program: code"
I want to extract code on the line that starts with Program.

I then want to add that extracted string into an already created file.

Can anyone provide some guidance on this?
Thanks!
 

ralphbsz

Son of Beastie

Reaction score: 2,017
Messages: 2,984

So you don't actually want to parse the whole text file. You want to fish out the one line that begins with the string "Program"; take the rest of the line (or perhaps only the second word?), and append it to a file?

First step: Find the line that begins with "Program:". Here we can use grep. The correct grep pattern begins with a caret (up-arrow), which means look only for lines that have the string at the beginning of the line. Second step: Echo the remainder of the line. The easiest way to do this (I think) is to use awk, and tell it to blank out the first word on the line (which will by definition be "Program:"), then print the rest of the line. Finally, append it to the output file. Here it is. I'm deliberately breaking it into multiple lines to make it more readable; a line that ends in a pipe continues onto the next one, so only the redirect has to stay on the same line as the awk command. Note that blanking the first field leaves a leading space in the output.
Code:
cat input.file |
grep '^Program:' |
awk '{$1=""; print $0}' >> output.file
I know this could actually be simplified: awk lets you apply the little expression to just the matching lines by putting a pattern in front of the action, but honestly, that would be less readable.
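For comparison, here is a minimal sketch of the awk-only variant mentioned above. The temporary directory and sample contents are made up for illustration; only the file names input.file and output.file come from the post. It uses sub() instead of blanking the first field, which is a swapped-in technique that avoids the leading space.

```shell
# Hypothetical sample data standing in for the real input.file
workdir=$(mktemp -d)
printf 'foo\nProgram: code\nbar\n' > "$workdir/input.file"
: > "$workdir/output.file"

# The /^Program:/ pattern restricts the action to matching lines;
# sub() strips the label and any blanks after it, so no leading space remains.
awk '/^Program:/ { sub(/^Program:[ \t]*/, ""); print }' \
    "$workdir/input.file" >> "$workdir/output.file"

cat "$workdir/output.file"
```

Whether this reads better than the grep-plus-awk pipeline is a matter of taste; it does save two processes.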
 

tpfiler

Member

Reaction score: 4
Messages: 27

This is exactly what I was looking for, thanks!
 

moridin

Member

Reaction score: 27
Messages: 34

Or use sed like below:
Code:
printf "foo\nProgram: code\nbar\n" | sed -ne 's/^Program: \(.*\)$/\1/p'
 

moridin

Member

Reaction score: 27
Messages: 34

Or even simpler (I just think sed is more suited for this task):
Code:
printf "foo\nProgram: code\nbar\n" | sed -ne 's/^Program: //p'

Here we tell sed to not output anything by default, then select all lines starting with pattern 'Program: ', delete the pattern, and print the line.

If you are going to use it on a file named e.g. file1 and not on stdin, and append the results to file2, it would look like the following:
Code:
sed -ne 's/^Program: //p' file1 >> file2
 

tpfiler

Member

Reaction score: 4
Messages: 27

This is great. Thanks!
 

tpfiler

Member

Reaction score: 4
Messages: 27

moridin
I was taking the time to learn regular expressions before I started my script, and as I was practicing with the sed command I started to have some issues:

I am trying to parse a file that has a line, as:
Code:
 - Program code  :  password
And print "password" only to an existing file.

To parse the file I was trying to replicate some of your suggested commands, but I cannot seem to even print the searched pattern.
I am using:
Code:
sed '/\s+-\s+Program\scode\s+:\s+?/p' file.txt

Just to verify the standard output of the command and see if I am checking the right pattern.

Is my regular expression off?
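
A likely cause, offered as a sketch rather than a tested answer for any particular system: in sed's default basic regular expressions, + and ? are ordinary characters rather than quantifiers, \s is a GNU extension that other sed implementations may not accept, and without -n the p command prints matching lines in addition to sed's normal output of every line. The version below assumes a sed that supports -E for extended regular expressions (both GNU and FreeBSD sed do) and uses the portable [[:space:]] class. The temporary directory and the header/footer lines are made up for illustration.

```shell
# Hypothetical sample data standing in for the real file.txt
workdir=$(mktemp -d)
printf 'header\n - Program code  :  password\nfooter\n' > "$workdir/file.txt"

# -n suppresses default printing; -E enables extended regex so + is a quantifier.
# The substitution deletes everything up to and including the blanks after the colon.
result=$(sed -nE \
    's/^[[:space:]]*-[[:space:]]+Program[[:space:]]+code[[:space:]]*:[[:space:]]*//p' \
    "$workdir/file.txt")

printf '%s\n' "$result"
```

To append to an existing file instead of capturing the result in a variable, the same sed command can be followed by >> and the file name.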
 

Trihexagonal

Daemon

Reaction score: 1,691
Messages: 2,261

Not particularly applicable to your needs but more ways to parse a .txt file.
I wanted to find out how many Categories and Responses were in Demonica's Language Center (Her mindfile) so I used wc to see how many lines were in the file:

Code:
$ wc -l /home/jitte/Downloads/demonica-2019-02-13.txt
   45250 /home/jitte/Downloads/demonica-2019-02-13.txt

Then I used grep to find out how many of those lines contained the string ID:, since that appears in every line that lists a Category:

Code:
$ grep -c ID: /home/jitte/Downloads/demonica-2019-02-13.txt
4505

That gives me the total number of Categories. If I subtract that number from the number of lines in the file that gives me how many Conversational Responses there are total.
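
With the numbers above that works out to 45250 - 4505 = 40745 responses. The subtraction can also be scripted; the sketch below runs on a tiny made-up sample file (the path and contents are hypothetical, not the real mindfile) and assumes every Category line contains ID: and no response line does.

```shell
# Hypothetical sample: 2 category lines (with ID:) and 3 response lines
workdir=$(mktemp -d)
mindfile="$workdir/sample.txt"
printf 'ID: 1 greeting\nhello\nhi there\nID: 2 farewell\nbye\n' > "$mindfile"

# Reading from stdin makes wc print only the count, without the file name
total=$(wc -l < "$mindfile")
# grep -c counts matching lines, not individual matches
categories=$(grep -c 'ID:' "$mindfile")

# Everything that is not a category line is counted as a response
responses=$((total - categories))
echo "$responses"
```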
 