echo $a | sed -E 's/[0-9]*/!/g'
, which prints "!a! !k!k!k!".
$ man grep
-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e. force
grep to behave as egrep).
$ grep --version
grep (BSD grep) 2.5.1-FreeBSD
$ egrep --version
egrep (BSD grep) 2.5.1-FreeBSD
$ echo $a | grep -E -o '[0-9]+'
123456
$ echo $a | egrep -o '[0-9]+'
123456
$ echo $SHELL
/usr/local/bin/mksh
$ echo $0
-mksh
ika256 said: So maybe the regexp implementation of FreeBSD has a bug?
[peter@caspar ~]$ uname -a
Linux caspar.xx.xx 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[peter@caspar ~]$ a="a 123456kkk"
[peter@caspar ~]$ echo $a | grep -o "[0-9]*"
[peter@caspar ~]$
ShelLuser said: I'm wondering which Linux distribution you tested this on, and perhaps also which shell you used. Because I can't reproduce your results:
...
So this behaviour isn't limited to FreeBSD, some Linux distributions (CentOS in this case) act exactly the same.
$ a="a 123456kkk"
$ echo $a | grep -o "[0-9]*"
123456
In the event that an RE could match more than one substring of a given
string, the RE matches the one starting earliest in the string. If the
RE could match more than one substring starting at that point, it matches
the longest.
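The leftmost-then-longest rule quoted above can be checked without grep -o at all. This is only a minimal sketch, assuming a POSIX-style sed: without the g flag, sed replaces just the first match it finds. The variable names first_star and first_plus are made up for illustration.

```shell
a="a 123456kkk"

# Without the g flag, sed replaces only the first (leftmost) match.
# '[0-9]*' can match the empty string at offset 0, before the "a".
first_star=$(echo "$a" | sed 's/[0-9]*/!/')
echo "$first_star"    # !a 123456kkk

# Requiring at least one digit moves the leftmost match to "123456",
# and the longest-match rule then consumes all six digits.
first_plus=$(echo "$a" | sed 's/[0-9][0-9]*/!/')
echo "$first_plus"    # a !kkk
```

So the earliest match of "[0-9]*" in this string is empty, which fits the rule as documented; whether grep -o should print or skip that empty match is a separate question.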
kpa said: So clearly the FreeBSD behaviour is correct in relation to the documentation and at least to me makes much more sense. I would really like to hear why the other behaviour would be correct?
I think there's more to this than merely regular expressions, because if you apply a different pattern the same issue occurs:
smtp2:/home $ a="a 123456kkk"
smtp2:/home $ echo $a | grep -o "[0-9]+"
smtp2:/home $
smtp2:/home $ a="a 123456kkk"
smtp2:/home $ echo $a | grep -oE "[0-9]+"
123456
smtp2:/home $
grep understands two different versions of regular expression syntax:
"basic" and "extended." In GNU grep, there is no difference in
available functionality using either syntax.
In basic regular expressions the metacharacters ?, +, {, |, (, and )
lose their special meaning; instead use the backslashed versions \?,
\+, \{, \|, \(, and \).
smtp2:/home $ echo $a | grep -o "[0-9]\+"
123456
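As far as I know, the backslashed \+ is a GNU-style extension rather than something POSIX requires in basic regular expressions, so it may be worth having spellings that do not depend on it. A small sketch, assuming a grep that supports -o (GNU and BSD grep both do); the variable names are only for illustration:

```shell
a="a 123456kkk"

# Extended syntax: + is a repetition operator.
ere=$(echo "$a" | grep -oE '[0-9]+')

# Basic syntax with a POSIX interval expression: one or more digits.
bre_interval=$(echo "$a" | grep -o '[0-9]\{1,\}')

# Basic syntax with no repetition extension: one digit, then zero or more.
bre_star=$(echo "$a" | grep -o '[0-9][0-9]*')

echo "$ere" "$bre_interval" "$bre_star"    # 123456 123456 123456
```

None of these three patterns can match the empty string, so they should sidestep the difference between the implementations discussed in this thread.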
kpa said: I'm pretty sure that non-greediness of the closure (the star) operator has always been the default regardless of the options like -o used. The reason is that the greedy behaviour is an extension to the basic regular expressions. I have a little difficulty understanding why the greedy behaviour should suddenly be the default with the -o option. Non-greedy behaviour causes some surprises but you just have to know them.
I don't think it has to do with greediness. grep on FreeBSD apparently doesn't discard a trivial match ("" at the beginning, which can't be made longer) in hope of finding a "more expected" match ("123456"), whereas other versions of grep do. When not using -o, FreeBSD grep will output every line because every line matches, but what would you think if you got this as output with -o?
$ a="a 123456kkk"
$ echo $a | grep -o "[0-9]*"
123456
$
$ echo ' 123' | grep -o '[0-9]'
1
2
3
$ echo ' 123' | grep -o '[0-9]*'
grep -o with a regular expression that can match an empty string is not a special case, and an empty match should always be treated as a valid result.
-o, --only-matching
Prints only the matching part of the lines.
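The point that a pattern which can match the empty string matches every line is easy to verify with grep -c, which counts matching lines instead of printing matched parts. A rough sketch; the two-line sample input and the variable names are made up for illustration:

```shell
# Hypothetical sample input: two lines, only one of which contains digits.
sample='a 123456kkk
no digits here'

# '[0-9]*' can match the empty string, so every line counts as a match.
count_star=$(printf '%s\n' "$sample" | grep -c '[0-9]*')
echo "$count_star"    # 2

# '[0-9][0-9]*' needs at least one digit, so only the first line matches.
count_plus=$(printf '%s\n' "$sample" | grep -c '[0-9][0-9]*')
echo "$count_plus"    # 1
```

The counts should be the same on any grep. What differs between implementations is only what -o does with the empty matches.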
kpa said: I read that exactly as: "Print the first matching part, not every part of the string that match".
Obviously that's not always true, since
echo 123 | grep -o '[0-9]'
prints three matches. I only used the word "trivial" out of habit from using it in mathematical contexts, although I do realize it isn't a part of the grep vernacular. Also, remember that a manpage isn't a standard; it's a description, and descriptions can be inaccurate. I provided my logic from a computational perspective based on the observed behavior of the program.
ShelLuser said: I'm wondering which Linux distribution you tested this on, and perhaps also which shell you used. Because I can't reproduce your results:
Linux debian 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux
irakli@debian:~$ grep --version
grep (GNU grep) 2.12
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
irakli@debian:~$ echo $SHELL
/bin/bash
irakli@debian:~$ a="a 123456kkk"
irakli@debian:~$ echo $a | grep -o "[0-9]*"
123456
irakli@debian:~$
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# echo $SHELL
/bin/bash
[root@localhost ~]# grep --version
GNU grep 2.6.3
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[root@localhost ~]#
[root@localhost ~]# a="a 123456kkk"
[root@localhost ~]# echo $a | grep -o "[0-9]*"
123456
[root@localhost ~]#