Blank lines in AWK programming

kwa

New Member


Messages: 6

Hi everybody! Hope this is a simple question: in awk programming, if I want to set a blank line as the field separator (just like this one:
" "
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?

My idea is to split a file into "fields separated by blanklines", so that lines = "everything between one blankline and the next". To do that it might be better to set RS="BLANKLINE" instead of setting FS as a blankline.

If it's difficult to understand, I can put an example afterwards.

Thanks!!
 

DutchDaemon

Administrator
Staff member
Moderator
Developer

Reaction score: 2,811
Messages: 11,288

Both FS and RS need/supply a delimiting character (a newline is a character, and it's the default RS), not the absence thereof, so I doubt that what you want is feasible.
 

anomie

Aspiring Daemon

Reaction score: 120
Messages: 781

kwa said:
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?
I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?
 

fonz

Son of Beastie

Reaction score: 369
Messages: 2,560

anomie said:
I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?
Between my own programming work I took some time to fiddle with this a bit. My understanding was that a blank line would be something like "\n[ \t]*\n" (a newline, maybe some whitespace and another newline).

But when writing a file as follows
Code:
line 1
line 2

line 3

line 4
line 5
and using something like cat file.txt|awk '{FS="\n[ \t]*\n"}{print $2}' I only get a bunch of empty lines instead of line 3 as I would expect.

Perhaps someone else can take it from here?

Alphons
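
A note on the attempt above, offered as a sketch rather than a definitive answer: awk splits the input into records first and only then splits each record into fields. With the default RS every record is a single line, so a field separator containing "\n" never has anything to match, and assigning FS inside the main action only takes effect from the next record on. POSIX awk has a "paragraph mode" for this case: when RS is the empty string, records are separated by one or more blank lines. The file name file.txt is taken from the post; the printf line only recreates the sample so the snippet is self-contained.

```shell
# Recreate the sample file from the post above.
printf 'line 1\nline 2\n\nline 3\n\nline 4\nline 5\n' > file.txt

# RS = "" switches awk to paragraph mode: blank lines separate records.
# FS = "\n" makes every line inside a paragraph one field.
awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 }' file.txt
```

This prints "1: line 1", "2: line 3" and "3: line 4", i.e. the first line of each blank-line-separated block; $2 would be the second line of a block where one exists.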
 

vermaden

Son of Beastie

Reaction score: 1,162
Messages: 2,755

fonz said:
But when writing a file as follows
Code:
line 1
line 2

line 3

line 4
line 5
Do you need it to look like this?
Code:
line 1
line 2
line 3
line 4
line 5
Code:
box$ cat test.txt
line 1
line 2

line 3

line 4
line 5
box$ cat test.txt | grep -v '^[[:space:]]*$'
line 1
line 2
line 3
line 4
line 5
box$
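
Since the thread is about awk, the same filtering can be sketched in awk itself. This assumes the default field separator; test.txt is the file name from the transcript above, and the printf line only recreates it.

```shell
# Recreate the sample file from the transcript above.
printf 'line 1\nline 2\n\nline 3\n\nline 4\nline 5\n' > test.txt

# NF is the number of fields in the current line. It is 0 for empty and
# whitespace-only lines, so the bare pattern NF prints only the other lines.
awk 'NF' test.txt
```

This prints the five "line N" lines without the blank lines in between.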
 

kwa

New Member


Messages: 6

First of all, thanks to everybody! It is refreshing and encouraging to receive so many answers to this. :)

Anomie: maybe the other replies make it clearer. What fonz posted is exactly what I am trying to do:

line1
" "
line2

The blankline is what you see between the quotes (" ").

I will try hex 20, "\n[ \t]*\n", and what vermaden and ephemera posted, and report back later.
 

kwa

New Member


Messages: 6

I see what the forum's text editor does: my "blankline" now looks like a single blank space...

Well, it's actually what sits between line 2 and line 3 in fonz's post.
 

kwa

New Member


Messages: 6

I don't have much time now, so I'll explain this better later if it is confusing:

What I want to do is, I think, the same thing fonz tried. That is, if I have a file like this:

line1
line2

line3

line4

I want the separator to "go beyond newlines", i.e. beyond the lines preestablished by the file. The file should be seen like this:

line1line2 #seen as the 1st field, but also as a single line
line3 #seen as the 2nd field
line4 #seen as the 3rd field

I tried all the characters/strings you suggested, and line1 and line2 were not seen as a single line. But I think the answer is in there somewhere... I'll be back in a few hours.
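
One way to get the view described above, as a minimal sketch: it assumes blocks are separated by truly empty lines and uses awk's paragraph mode (RS set to the empty string). The file name file.txt is a placeholder, and the printf line recreates the example from this post.

```shell
# Recreate the example file from this post.
printf 'line1\nline2\n\nline3\n\nline4\n' > file.txt

# Paragraph mode (RS = "") makes each blank-line-separated block one record,
# FS = "\n" makes each line of the block one field. Assigning $1 = $1 rebuilds
# the record with OFS, and an empty OFS glues the lines of a block together.
awk 'BEGIN { RS = ""; FS = "\n"; OFS = "" } { $1 = $1; print }' file.txt
```

This prints "line1line2", "line3" and "line4", one per output line. Inside the action, NR is the block number and $1, $2, ... are the original lines of that block.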
 

kwa

New Member


Messages: 6

Hi again! The question was: can I really parse a file with awk "as I want"? I mean, instead of making awk see a file's lines as they "naturally" are, can awk see a file as lines delimited by a separator I choose (i.e., RS="BLANKLINE")? I am afraid that's not possible.

Results from using the separators you suggested: I have them at home :), so I'll show them to you as soon as I get there, but more or less every attempt, when printing the first field ($1), displayed:

xxxx

if the file's structure was like this:

xxx edrwelrkwej
erèwrmgerltrmñ
srtñerlterte

retertkerltñkerñt
ert`perte`tke
erwerwerwtert3434534'0
928¨ç´ç´{ç

See you soon...
 

fonz

Son of Beastie

Reaction score: 369
Messages: 2,560

kwa said:
Hi again! The question was: can I really parse a file with awk "as I want"?
I started wondering about that, and the answer may be in the first paragraph of the awk(1) manual: apparently awk is line-based no matter how you look at it, so it's probably not the right tool for what you want.

However, perhaps the following small script will be of help?
Code:
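#!/bin/sh
# Reads stdin and groups consecutive non-blank lines into blocks.

foo=""
# Plain "read" trims leading/trailing whitespace, so a whitespace-only line counts as blank.
while read line
do
  if [ -z "$line" ]; then
    # Blank line: emit the block collected so far, skipping empty blocks.
    if [ -n "$foo" ]; then
      echo -e "***BEGIN***\n$foo\n***END***" # replace this to do something useful with $foo
    fi
    foo=""
  else
    if [ -z "$foo" ]; then
      foo=$line
    else
      foo=$(echo -e "$foo\n$line")
    fi
  fi
done
# Emit the last block when the input does not end with a blank line.
if [ -n "$foo" ]; then
  echo -e "***BEGIN***\n$foo\n***END***" # and replace this too
fi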
Now, assuming you called the script, say, blanklineseperator.sh, taking the same file foo.txt
Code:
line 1
line 2

line 3

line 4
line 5
doing % cat foo.txt|./blanklineseperator.sh results in
Code:
***BEGIN***
line 1
line 2
***END***
***BEGIN***
line 3
***END***
***BEGIN***
line 4
line 5
***END***
which if I'm not mistaken is what you're looking for. Just replace (both instances of!) echo -e "***BEGIN***\n$foo\n***END***" with whatever it is you want to do with the "blocks of lines" and you should be good to go.

It may not be the most elegant solution, but at least it works. Unless you don't consider a line with only whitespace a blank line. If that's the case, the above still isn't (entirely) correct and you're probably better off using C, Perl or some other programming language. Doing this in C is easy, and for somebody better at Perl than me it should be easy in Perl too, and more elegant to boot. But I'll leave that as an exercise for the Perl experts.

Hope this helps,

Alphons
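
The whitespace question raised above can also be handled in awk without leaving the shell. The following is only a sketch: the function names blocks and emit are made up for illustration, foo.txt is the file name from the post, and the recreated sample deliberately uses a line containing a single space as the second separator.

```shell
# Recreate the sample; the separator after "line 3" holds a single space.
printf 'line 1\nline 2\n\nline 3\n \nline 4\nline 5\n' > foo.txt

# blocks: print one BEGIN/END block per run of non-blank lines.
# blank = (NF == 0) treats whitespace-only lines as blank as well;
# change it to blank = ($0 == "") to count only truly empty lines.
blocks() {
  awk '
    function emit() {
      if (buf != "") printf "***BEGIN***\n%s\n***END***\n", buf
      buf = ""
    }
    { blank = (NF == 0) }
    blank { emit(); next }
    { buf = (buf == "" ? $0 : buf "\n" $0) }
    END { emit() }
  ' "$@"
}

blocks foo.txt
```

With the NF test this produces the same three blocks as the script output shown above. Replace the printf inside emit with whatever should happen to each block.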
 

kwa

New Member


Messages: 6

Hey! As you've probably realized, I managed to do what I was trying to do, which is why I vanished :)... I'll tell you how later, it's a bit difficult to explain.

Anyway, thanks fonz. I thought the same as you did at first, but then I "saw the light": with RS and FS set to the same separator, awk recognizes many lines as one when parsing. The catch is that the newline characters (and the famous "blankline") are also included in those lines, but that was all right because I redirected standard output... Well, it worked, so thank you for helping!!

Anyway, your script is a great help.
 