Strange grep greedy behaviour

Hi All,

How would you explain this:

Code:
 # echo aabb1ccdg1hsfsdf | grep -o "^[^1]*1"
aabb1
ccdg1

Also:
Code:
 # echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
aabb1ccdg1hsfsdf  (grep highlights "aabb1ccdg1" in red)

It seems wrong to me: the highlighted match should stop at the first "1".

What do you think? Is it a bug?
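For comparison, sed can print just the part of the line that a single anchored match covers, without going through grep's -o or --color handling. This is a minimal sketch using a POSIX basic regular expression; the helper name first_anchored_match is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the text matched by ^[^1]*1 at the start of
# the line, or nothing if the line does not match.
first_anchored_match() {
    # s/// rewrites only the leftmost match; the ^ anchor ties that match
    # to the start of the line, and \(...\) captures the matched text.
    printf '%s\n' "$1" | sed -n 's/^\([^1]*1\).*/\1/p'
}

first_anchored_match aabb1ccdg1hsfsdf    # prints: aabb1
```

Since the s command only rewrites the leftmost match on the line, the ^ anchor here can only ever match at the very start of the line.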

--fedya
 
The * is always 'greedy'; it will parse the string from the back to the front.
 
Look, awk gives a different result, which seems correct to me: the split happens at the first "1":

Code:
 # echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $2}'
ccdg1hsfsdf
 
Oh, doh... Hehe.. It's actually simpler. You are correct, it should match the first 1.

But.. Your example matches twice:
Code:
echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"

It matches both aabb1 and ccdg1, but since they're both on the same line, it looks as if it matched aabb1ccdg1. So it's not one match but two, as shown with the -o option.
 
Thanks, SirDice, you're absolutely right.

But then... FreeBSD's awk is wrong!

Actually, I discovered this behaviour with gawk on CentOS, when I used a similar regex as a field separator: the results seemed wrong to me, while the classic awk seemed right. Now it looks as if the opposite is true:

FreeBSD awk:

Code:
echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $1 "==" $2 "==" $3}'
==ccdg1hsfsdf==

gawk (on CentOS):
Code:
 # echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $1 "==" $2 "==" $3}'
====hsfsdf

But this probably deserves a separate thread.
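Whichever awk is right about ^ inside a regex FS, a portable way to get the text after the first "1" is to skip FS entirely and strip the anchored prefix with sub(), which replaces at most one match. A minimal sketch, assuming any POSIX awk; the helper name strip_anchored_prefix is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: strip the anchored prefix ^[^1]*1 (everything up to
# and including the first "1") and print the rest of the line. sub()
# replaces at most one match, so the result does not depend on how a given
# awk treats ^ inside a regex FS.
strip_anchored_prefix() {
    printf '%s\n' "$1" | awk '{ sub(/^[^1]*1/, ""); print }'
}

strip_anchored_prefix aabb1ccdg1hsfsdf    # prints: ccdg1hsfsdf
```

A line with no "1" at all is printed unchanged, because sub() then finds nothing to replace.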

--fedya
 
After some consideration, I again think that GNU grep and gawk are wrong in this case and classic awk is right. See:

SirDice said:
Your example matches twice:
Code:
echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
Matches both aabb1 and ccdg1

Yes, the regexp ^[^1]*1 matches "aabb1", but no, it does not match "ccdg1", because there is a ^ beginning-of-line anchor. As I understand it, the anchor makes multiple matches within a single line impossible.
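One possible explanation for where "ccdg1" comes from (a guess, not checked against grep's source): if the -o/--color loop resumes searching right after the previous match and treats that position as the start of a line, ^ matches again. The sketch below imitates that with POSIX expr, whose : operator is always anchored at the start of its string; the helper name anchored_prefix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the leading part of $1 matched by ^[^1]*1.
# expr's ":" operator is anchored at the start of the string, so no ^ is
# written; the "X" prefix guards against arguments expr could misparse,
# and \(...\) makes expr print the captured text.
anchored_prefix() {
    expr "X$1" : 'X\([^1]*1\)'
}

line=aabb1ccdg1hsfsdf
first=$(anchored_prefix "$line")    # aabb1: the one match at the start of the line
rest=${line#"$first"}               # ccdg1hsfsdf: the text after that match
again=$(anchored_prefix "$rest")    # ccdg1: "rest" searched as if it were a new line
echo "first=$first rest=$rest again=$again"
```

The first call is the only match an anchored search over the whole line should report; the second shows what appears once the leftover text is searched as though it were a fresh line.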

So my original question about GNU grep's sanity is still valid.
 
SirDice said:
But.. Your example matches twice:
Code:
echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
Matches both aabb1 and ccdg1 but since they're both on the same line it'll look like it matched aabb1ccdg1. So it's not 1 match but 2. As shown with the -o option.

That's not true, as we can check by adding a "2" right after the first "1" and excluding "2" in the pattern:

Code:
> echo aabb12ccdg1hsfsdf | grep --color "^[^12]*1"
aabb12ccdg1hsfsdf  (only "aabb1" is highlighted in red)

It does not include ccdg1. It looks like a grep bug... or not?
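The same imitation (still a guess about grep's internals) fits this result: with ^[^12]*1 the leftover text after "aabb1" begins with "2", so a search restarted there fails at once and nothing else gets highlighted. For single-line input, keeping only the first line of grep -o output gives "aabb1" whether grep reports one match or two. The helper name anchored_prefix12 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the leading part of $1 matched by ^[^12]*1
# (expr's ":" is anchored at the start of the string).
anchored_prefix12() {
    expr "X$1" : 'X\([^12]*1\)'
}

line=aabb12ccdg1hsfsdf
first=$(anchored_prefix12 "$line")            # aabb1
rest=${line#"$first"}                         # 2ccdg1hsfsdf
# expr exits 1 when nothing matches; "|| true" keeps that harmless under set -e.
again=$(anchored_prefix12 "$rest" || true)    # empty: "2" is excluded and is not "1"
echo "first=$first rest=$rest again=[$again]"

# Workaround for single-line input: keep only the first match grep -o reports.
echo aabb1ccdg1hsfsdf | grep -o '^[^1]*1' | head -n 1    # prints: aabb1
```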
 
Yep, you are both right, it should only match "aabb1".

This stuff is tricky, and I still trip over it after dealing with it for years.
It's no wonder you can write entire books on the subject of regex.
 