
View Full Version : Blank lines in AWK programming


kwa
March 17th, 2009, 17:29
Hi everybody! I hope this is a simple question: in awk programming, if I want to set a blank line as the field separator (just like this one:
" "
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?

My idea is to split a file into "fields separated by blanklines", so that lines = "everything between one blankline and the next". To do that it might be better to set RS="BLANKLINE" instead of setting FS as a blankline.

If it's difficult to understand, I can put an example afterwards.

Thanks!!

DutchDaemon
March 17th, 2009, 18:04
Both FS and RS need/supply a delimiting character (a newline is a character, and it's the default RS), not the absence thereof, so I doubt that what you want is feasible.

anomie
March 17th, 2009, 18:51
It is not a NEWLINE, it's a "BLANKLINE"), what do I have to use as character/string/expression?

I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?

fonz
March 17th, 2009, 19:52
I don't follow. A so-called "blankline" is just a blank (i.e. hex 20). Right?

Or do you mean a blank + a newline?
Between my own programming work I took some time to fiddle with this a bit. My understanding was that a blank line would be something like "\n[ \t]*\n" (a newline, maybe some whitespace and another newline).

But when writing a file as follows
line 1
line 2

line 3

line 4
line 5

and using something like cat file.txt|awk '{FS="\n[ \t]*\n"}{print $2}' I only get a bunch of empty lines instead of line 3 as I would expect.

Perhaps someone else can take it from here?

Alphons

vermaden
March 17th, 2009, 20:17
But when writing a file as follows
line 1
line 2

line 3

line 4
line 5

Do you need it to look like this?
line 1
line 2
line 3
line 4
line 5

box$ cat test.txt
line 1
line 2

line 3

line 4
line 5
box$ cat test.txt | grep -v '^[[:space:]]*$'
line 1
line 2
line 3
line 4
line 5
box$

ephemera
March 17th, 2009, 22:06
printf "ln1\n\nln2\nln3\n\nln4" | awk 'BEGIN{RS=""}{print}'
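A slightly expanded sketch of the one-liner above, assuming a POSIX awk: an empty RS puts awk in paragraph mode, so blank lines separate records, and FS="\n" then makes each line of a block one field. The function name show_blocks and the sample input are made up for illustration.

```shell
#!/bin/sh
# show_blocks is a hypothetical helper name, not something from the thread.
show_blocks() {
    # RS=""  : paragraph mode, records are separated by blank lines.
    # FS="\n": inside a record, every line becomes one field.
    awk 'BEGIN { RS = ""; FS = "\n" }
         { printf "record %d: %d line(s), first line: %s\n", NR, NF, $1 }'
}

printf 'line 1\nline 2\n\nline 3\n\nline 4\nline 5\n' | show_blocks
```

With that input this should report three records of 2, 1 and 2 lines. As far as I know, paragraph mode only treats completely empty lines as separators, so a line holding just spaces would not end a record.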

kwa
March 18th, 2009, 16:43
First of all, thanks to everybody!! It is refreshing and encouraging to receive so many answers for this.. :)

anomie: maybe the other replies make it clearer; what fonz posted is exactly what I am trying to do:

line1
" "
line2

Between the quotes you can see the blank line.

I will try with hex 20, "\n[ \t]*\n", and what vermaden and ephemera posted; I'll tell you later.

kwa
March 18th, 2009, 16:48
I see what the text editor in this forum does: my "blankline" now looks like just a "blank space"...

Well, it's actually what's between line 2 and line 3 in fonz's post..

kwa
March 18th, 2009, 17:33
Don't have much time now, so I'll explain this better later if it is confusing to understand:

What I want to do is, I think, the same thing fonz tried to do. That is, if I have a file like this:

line1
line2

line3

line4

I want the separator to "go beyond newlines" or "go beyond lines preestablished by the file". The file should be seen like this:

line1line2 #seen as the 1st field but also as a single line
line3 #seen as the 2nd field
line4 #seen as the 3rd field

I tried all the characters/strings you suggested, and line1 and line2 were not seen as a single line. But I think the answer is out there somewhere... I'll be back in a few hours...

kwa
March 19th, 2009, 01:07
Hi again! The question was: can I really parse a file with awk "as I want"? I mean, instead of making awk see a file's lines as they "should naturally be", can awk see a file with lines delimited by the separator I choose (i.e., RS="BLANKLINE")? I am afraid that's not possible..

Results I got from using the separators you suggested: I have them at home :), so I'll show them to you as soon as I get there, but more or less all attempts, when printing the first field ($1), displayed:

xxxx

if the file's structure was like this:

xxx edrwelrkwej
erèwrmgerltrmñ
srtñerlterte

retertkerltñkerñt
ert`perte`tke
erwerwerwtert3434534'0
928¨ç´ç´{ç

See you soon...

fonz
March 19th, 2009, 03:30
Hi again! The question was: can I really parse a file with awk "as I want"?

I started wondering about that and the answer may be in the first paragraph of the awk manual: apparently, awk is line-based no matter how you look at it, so it's probably not the right tool for doing what you want.

However, perhaps the following small script will be of help?
#!/bin/sh

foo=""
while read line
do
    if [ -z "$line" ]; then
        echo -e "***BEGIN***\n$foo\n***END***" # replace this to do something useful with $foo
        foo=""
    else
        if [ -z "$foo" ]; then
            foo=$line
        else
            foo=`echo -e "$foo\n$line"`
        fi
    fi
done
if [ -n "$foo" ]; then
    echo -e "***BEGIN***\n$foo\n***END***" # and replace this too
fi

Now, assuming you called the script, say, blanklineseperator.sh, taking the same file foo.txt
line 1
line 2

line 3

line 4
line 5
doing cat foo.txt|./blanklineseperator.sh results in
***BEGIN***
line 1
line 2
***END***
***BEGIN***
line 3
***END***
***BEGIN***
line 4
line 5
***END***
which if I'm not mistaken is what you're looking for. Just replace (both instances of!) echo -e "***BEGIN***\n$foo\n***END***" with whatever it is you want to do with the "blocks of lines" and you should be good to go.

It may not be the most elegant solution, but at least it works. Unless you don't consider a line with only whitespace a blank line. If that's the case, the above still isn't (entirely) correct and you're probably better off using C, Perl or some other programming language. Doing this in C is easy, and for somebody better at Perl than me it should be easy in Perl too, and more elegant to boot. But I'll leave that as an exercise for the Perl experts.

Hope this helps,

Alphons
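For comparison, the same line-by-line idea can be sketched in awk itself, with lines that are empty or hold only spaces and tabs explicitly treated as block separators. This is only an illustration under that assumption; the function name split_blocks is made up, and the BEGIN/END markers are simply copied from the script above.

```shell
#!/bin/sh
# split_blocks is a hypothetical name; the markers mirror the script above.
split_blocks() {
    awk '
        # Print the collected block, if any, between the two markers.
        function emit() {
            if (n > 0) printf "***BEGIN***\n%s***END***\n", block
            block = ""; n = 0
        }
        /^[ \t]*$/ { emit(); next }        # empty or whitespace-only line ends a block
        { block = block $0 "\n"; n++ }     # any other line is appended to the block
        END { emit() }                     # output the last block at end of input
    '
}

printf 'line 1\nline 2\n \t\nline 3\n\n\nline 4\nline 5\n' | split_blocks
```

Because nothing is printed for an empty block, several separator lines in a row count as a single separator here.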

kwa
March 20th, 2009, 16:08
Hey! As you've probably realized, I managed to do what I was trying to do, that's why I vanished :)... I'll tell you later how, it's a bit difficult to explain.

Anyway, thanks fonz. I thought the same as you did at first, but then I "saw the light": with RS and FS set to the same separator, awk recognizes many lines as one when "parsing". The thing is that the newline characters (and also the famous "blankline"..) are still included in those lines, but that was all right 'cause I redirected standard output... Well, it worked, so thank you for helping!!

Anyway, your script is of much help.
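kwa did not post the final command, so the following is only a guess at the kind of result described earlier in the thread (line1line2 seen as one line). It assumes a POSIX awk in paragraph mode and joins the lines of each block with an empty OFS; join_blocks is a made-up name.

```shell
#!/bin/sh
# join_blocks is a hypothetical name; the actual command was never posted.
join_blocks() {
    # RS=""  : records are blocks separated by blank lines.
    # FS="\n": each line of a block is one field.
    # OFS="" : fields are glued together with nothing in between.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "" }
         { $1 = $1; print }'    # assigning to a field rebuilds $0 using OFS
}

printf 'line1\nline2\n\nline3\n\nline4\n' | join_blocks
```

For that input the output should be the three lines line1line2, line3 and line4. Setting OFS to a space or any other string instead would keep the original lines distinguishable.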