Blank lines in AWK programming

kwa · Mar 17, 2009

Hi everybody! I hope this is a simple question: in awk programming, if I want to set a blank line as the field separator (just like this one:
" "
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?

My idea is to split a file into "fields separated by blank lines", so that a line is "everything between one blank line and the next". To do that, it might be better to set RS="BLANKLINE" instead of setting FS to a blank line.

If this is hard to follow, I can post an example later.

Thanks!!

DutchDaemon · Mar 17, 2009

Both FS and RS need/supply a delimiting character (a newline is a character, and it's the default RS), not the absence thereof, so I doubt that what you want is feasible.

anomie · Mar 17, 2009

kwa said:
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?

I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?

fonz · Mar 17, 2009

anomie said:
I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?

Between my own programming work I took some time to fiddle with this a bit. My understanding was that a blank line would be something like "\n[ \t]*\n" (a newline, maybe some whitespace and another newline).

But when writing a file as follows

Code:

line 1
line 2

line 3

line 4
line 5

and running something like cat file.txt | awk '{FS="\n[ \t]*\n"}{print $2}', I only get a bunch of empty lines instead of the line 3 I expected.

Perhaps someone else can take it from here?

Alphons
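Why the command above prints empty lines: RS is still the default newline, so every record awk reads is a single line, and a field separator containing \n can never match inside it. (Assigning FS in the main action rather than in a BEGIN block also means POSIX awk only applies it from the next record on.) A minimal sketch of what was presumably intended, assuming a POSIX awk: switch to paragraph mode with RS="" so each blank-line-separated block becomes one record, and "line 3" is then simply record 2. The function name second_block is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the second blank-line-separated block.
# RS="" (paragraph mode) makes each block of lines a single record,
# so NR == 2 selects the second block.
second_block() {
  awk 'BEGIN { RS = "" } NR == 2' "$@"
}

printf 'line 1\nline 2\n\nline 3\n\nline 4\nline 5\n' | second_block
```

With no file arguments, awk reads standard input, so the helper works both in a pipe and as second_block file.txt.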

vermaden · Mar 17, 2009

fonz said:
But when writing a file as follows

Code:

line 1
line 2

line 3

line 4
line 5

Do you need to make it look like this?

Code:

line 1
line 2
line 3
line 4
line 5

Code:

box$ cat test.txt
line 1
line 2

line 3

line 4
line 5
box$ cat test.txt | grep -v '^[[:space:]]*$'
line 1
line 2
line 3
line 4
line 5
box$
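For comparison, a sketch of the same filter in awk (not from the thread; drop_blank is a hypothetical name). With the default FS, a line that is empty or holds only blanks/tabs has NF == 0, so the bare pattern NF skips it. Like the grep above, this removes the blank lines rather than using them as separators.

```shell
#!/bin/sh
# Hypothetical helper: drop empty and whitespace-only lines.
# The pattern NF is false (0) for such lines, and a pattern with
# no action prints the lines for which it is true.
drop_blank() {
  awk 'NF' "$@"
}

printf 'line 1\nline 2\n\nline 3\n  \nline 4\n' | drop_blank
```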

ephemera · Mar 17, 2009

printf "ln1\n\nln2\nln3\n\nln4" | awk 'BEGIN{RS=""}{print}'
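ephemera's one-liner relies on awk's paragraph mode: when RS is the empty string, records are separated by one or more empty lines, so each block becomes one record. A small sketch extending it, assuming a POSIX awk (show_blocks is a hypothetical name): also setting FS="\n" makes every line of a block a separate field, so NF counts the lines in each block.

```shell
#!/bin/sh
# Hypothetical helper: for each block, print its number, its line
# count and its first line. RS="" = paragraph mode; FS="\n" = one
# field per line inside the block.
show_blocks() {
  awk 'BEGIN { RS = ""; FS = "\n" } { print NR, NF, $1 }' "$@"
}

printf 'ln1\n\nln2\nln3\n\nln4' | show_blocks
# prints:
# 1 1 ln1
# 2 2 ln2
# 3 1 ln4
```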

kwa · Mar 18, 2009

First of all, thanks to everybody! It is refreshing and encouraging to receive so many answers to this.

anomie: maybe the other replies make it clearer; what fonz posted is exactly what I am trying to do:

line1
" "
line2

Between the quotes you can see the blank line.

I will try hex 20, "\n[ \t]*\n", and what vermaden and ephemera posted, and let you know later.

kwa · Mar 18, 2009

I see what the forum's text editor does: my "blank line" now looks like just a "blank space"...

What I mean is actually what's between line 2 and line 3 in fonz's post.

kwa · Mar 18, 2009

I don't have much time now, so I'll explain this better later if it's confusing:

What I want to do is, I think, the same thing fonz tried: if I have a file like this:

line1
line2

line3

line4

I want the separator to "go beyond newlines", i.e. beyond the lines the file already has. The file should be seen like this:

line1line2   # seen as the 1st field, but also as a single line
line3        # seen as the 2nd field
line4        # seen as the 3rd field

I tried all the characters/strings you suggested, and line1 and line2 were not seen as a single line. But I think the answer is in there somewhere... I'll be back in a few hours.
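A sketch of exactly that output, assuming a POSIX awk (join_blocks is a hypothetical name): in paragraph mode (RS="") $0 holds the whole block including its internal newlines, and gsub() deletes them, so each block is printed as one line.

```shell
#!/bin/sh
# Hypothetical helper: print each blank-line-separated block as one line.
# RS="" puts the whole block in $0; gsub(/\n/, "") removes the newlines
# between its lines (use " " as the replacement to keep a space instead).
join_blocks() {
  awk 'BEGIN { RS = "" } { gsub(/\n/, ""); print }' "$@"
}

printf 'line1\nline2\n\nline3\n\nline4\n' | join_blocks
# prints:
# line1line2
# line3
# line4
```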

kwa · Mar 18, 2009

Hi again! The question was: can I really parse a file with awk "as I want"? I mean, instead of awk seeing a file's lines as they "naturally" are, can awk see a file as lines delimited by a separator I choose (i.e., RS="BLANKLINE")? I'm afraid that's not possible...

I have the results I got from the separators you suggested at home, so I'll show them to you as soon as I get there, but more or less all attempts, when printing the first field ($1), displayed:

xxxx

if the file's structure was like this:

xxx edrwelrkwej
erèwrmgerltrmñ
srtñerlterte

retertkerltñkerñt
ert`perte`tke
erwerwerwtert3434534'0
928¨ç´ç´{ç

See you soon...

fonz · Mar 19, 2009

kwa said:
Hi again! The question was: can I really parse a file with awk "as I want"?

I started wondering about that, and the answer may be in the first paragraph of the awk(1) manual page: apparently awk is line-based no matter how you look at it, so it's probably not the right tool for what you want.

However, perhaps the following small script will be of help?

Code:

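#!/bin/sh

# Collect non-blank lines in $foo; when a blank line (or EOF) is reached,
# print the collected block. read without IFS= strips leading/trailing
# whitespace, so a line holding only spaces/tabs also counts as blank.
foo=""
while read -r line || [ -n "$line" ]   # the second test keeps a last line that lacks a newline
do
  if [ -z "$line" ]; then
    if [ -n "$foo" ]; then             # skip empty blocks from repeated blank lines
      printf '***BEGIN***\n%s\n***END***\n' "$foo" # replace this to do something useful with $foo
    fi
    foo=""
  elif [ -z "$foo" ]; then
    foo=$line
  else
    # Append with a literal newline (portable, unlike echo -e)
    foo="$foo
$line"
  fi
done
if [ -n "$foo" ]; then
  printf '***BEGIN***\n%s\n***END***\n' "$foo" # and replace this too
fi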

Now, assuming you named the script, say, blanklineseperator.sh, and saved the same example as foo.txt:

Code:

line 1
line 2

line 3

line 4
line 5

running % cat foo.txt | ./blanklineseperator.sh gives

Code:

***BEGIN***
line 1
line 2
***END***
***BEGIN***
line 3
***END***
***BEGIN***
line 4
line 5
***END***

which, if I'm not mistaken, is what you're looking for. Just replace both output lines (the ones marked "replace this") with whatever it is you want to do with the "blocks of lines", and you should be good to go.

It may not be the most elegant solution, but at least it works, unless you don't consider a line containing only whitespace to be a blank line. In that case the above still isn't (entirely) correct, and you're probably better off using C, Perl or some other programming language. Doing this in C is easy, and for somebody better at Perl than me it should be easy in Perl too, and more elegant to boot. But I'll leave that as an exercise for the Perl experts.

Hope this helps,

Alphons
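The whitespace caveat applies to awk's paragraph mode too: in common implementations such as gawk, RS="" splits only on truly empty lines, so a line holding just spaces or tabs stays inside a block. A hedged workaround, not from the thread: blank such lines out with sed before awk sees them. split_blocks is a hypothetical name, and the ***BEGIN***/***END*** markers only mirror the script's output above.

```shell
#!/bin/sh
# Hypothetical helper: split input into blocks, treating whitespace-only
# lines as blank. sed empties those lines first; awk's paragraph mode
# (RS="") then makes each block one record, printed between markers.
split_blocks() {
  sed 's/^[[:space:]]*$//' "$@" |
    awk 'BEGIN { RS = "" } { print "***BEGIN***"; print; print "***END***" }'
}

printf 'line 1\nline 2\n \t\nline 3\n\nline 4\nline 5\n' | split_blocks
```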

kwa · Mar 20, 2009

Hey! As you've probably realized, I managed to do what I was trying to do; that's why I vanished.

I'll tell you later how; it's a bit difficult to explain.

Anyway, thanks fonz. I thought the same as you at first, but then I "saw the light": by setting RS and FS to the same separator, awk treats many lines as one when parsing. The catch is that those records also include the newline characters (and the famous "blank line"), but that was all right because I redirected standard output... Well, it worked, so thank you for helping!

Anyway, your script is very helpful.
