sed | regex with tabulator problem

Hello,

I am trying to manipulate a text file. In this case, I want to comment out the third word of each line.


structure of my text file

Code:
# line like this
text       tab or blank      text again
# should look like this
word      word2 after tab or blank      #word3 again after tab or blanks

I tried several regex variations, but none of them work. I used [:space:], [ \t], and other expressions.

Code:
sed -e 's/^\([a-z]+\)[[:space:]]+\([a-z]+\)[ \t]+/\1\2#\3/g' sedfile

what am I doing wrong?
 
Please state, in words, what you are trying to do with the regex. Do you mean "put a comment mark at the start of any line that starts with a lower-case word, followed by whitespace, followed by another lowercase word"?
 
BeaSDBoy said:
what am I doing wrong?

We are in the 21st century; use extended regex, which is more common and more powerful.

Code:
echo 'word  word      word  after tab or blanks' | \
     sed -E -e 's/^([a-z]+[[:space:]]+[a-z]+[[:space:]]+)([a-z]+)/\1\#\2/g'
This will do what you want for the first string of your example, but it won't catch the second one because word2 does not match [a-z]+.

The regex below should catch "word1 word2 word3":

Code:
echo 'word1  word2      word3  after4 tab5 or blanks' | \
    sed -E -e 's/^([[:alnum:]]+[[:space:]]+[[:alnum:]]+[[:space:]]+)([[:alnum:]]+)/\1\#\2/g'
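
Since the original question was about tabs, here is a rough check that [[:space:]] covers them as well as blanks. The sample line is made up for illustration, and the variable names sample and result are only placeholders. The # in the replacement is written without a backslash, which should behave the same here.

```shell
# Hypothetical sample line: a tab after word1, then blank, tab, blank after word2.
sample=$(printf 'word1\tword2 \t word3 again')

# Same expression as above, minus the g flag: the pattern is anchored at ^,
# so it can match at most once per line anyway.
result=$(printf '%s\n' "$sample" |
    sed -E 's/^([[:alnum:]]+[[:space:]]+[[:alnum:]]+[[:space:]]+)([[:alnum:]]+)/\1#\2/')

printf '%s\n' "$result"
```

The whitespace between the words is captured in the first group, so the tabs and blanks are kept as they were and only the # is added in front of the third word.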
 
Hi,

thanks for your answers.

@wblock@
The line starts with a lowercase word, followed by tabs or blanks, then another lowercase word, then blanks or tabs again, and then the last word.
The hash mark should be placed before the last (third) word.

@AlexJ
I will try this. I think -E could be the solution ;-), and then my expression should work too.
 
BeaSDBoy said:
I think -E could be the solution ;-), and then my expression should work too.

-E isn't POSIX, if you're looking for strict portability. You could probably do what you want simply with awk(1). Out of curiosity, does your filter have to be a single program or one-liner for one reason or another?
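
If strict portability matters, the same idea can be sketched in basic regular expressions, without -E. In basic syntax an unescaped + is a literal plus sign, so the sketch uses \{1,\} for "one or more", and groups are written \( \). Note also that the original attempt referred to \3 while defining only two groups. The function name comment_third_word is made up for this example.

```shell
# Basic-regex sketch: put a # in front of the third word of a line.
# \{1,\} means "one or more"; \( \) delimit the capture groups.
comment_third_word() {
    sed 's/^\([[:alnum:]]\{1,\}[[:space:]]\{1,\}[[:alnum:]]\{1,\}[[:space:]]\{1,\}\)\([[:alnum:]]\)/\1#\2/'
}

# Hypothetical sample input with a tab between the first two words.
printf 'word1\tword2    word3 again\n' | comment_third_word
```

Lines that do not start with two words followed by a third one are passed through unchanged.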
 
UNIXgod said:
-E isn't POSIX, if you're looking for strict portability. You could probably do what you want simply with awk(1). Out of curiosity, does your filter have to be a single program or one-liner for one reason or another?

Is -E a FreeBSD-only thing? Would it be portable like this?

Code:
echo 'word1  word2      word3  after4 tab5 or blanks' | \
    sed -r -e 's/^([[:alnum:]]+[[:space:]]+[[:alnum:]]+[[:space:]]+)([[:alnum:]]+)/\1\#\2/g'
 
UNIXgod said:

The funny part is that when I posted previously, I had just reached the one-third mark of my first sed tutorial and saw this thread, so I decided to try out the command in Cygwin on Windows for practice. Both -E and -r worked, and I did not know which version of sed Cygwin used. After I got home, I checked the FreeBSD man page and noticed that it says -r is the GNU equivalent of the BSD -E. The link you posted did not mention either -E or -r, but maybe I just couldn't find it when I searched the page in the web browser, and I could not find any note on -E vs -r in the tutorial I was reading.
 
jwele said:
The funny part is that when I posted previously, I had just reached the one-third mark of my first sed tutorial and saw this thread, so I decided to try out the command in Cygwin on Windows for practice. Both -E and -r worked, and I did not know which version of sed Cygwin used. After I got home, I checked the FreeBSD man page and noticed that it says -r is the GNU equivalent of the BSD -E. The link you posted did not mention either -E or -r, but maybe I just couldn't find it when I searched the page in the web browser, and I could not find any note on -E vs -r in the tutorial I was reading.

Both -r and -E are extensions. Look at the BUGS section of re_format(7) for further advice, as well as the STANDARDS section of the FreeBSD man page for sed(1). In this case they are specifically GNU extensions, which are considered a superset of POSIX, and FreeBSD's sed currently provides compatibility with them. The old saying was to "filter early and filter often". Personally, I'd find another way of dealing with the filtering rather than expecting sed to be the only tool used. Sed is not awk. Even where awk is used, it's preferable to chain it together with other utilities via pipes.
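
For a script that has to run on more than one system, one possible approach is to probe the local sed before relying on either flag. This is only a sketch: the variable name ere_flag is made up, and the probe assumes that a sed without the flag exits with an error.

```shell
# Probe the local sed: prefer -E, fall back to -r, otherwise leave the flag empty.
if printf 'a\n' | sed -E 's/(a)/\1/' >/dev/null 2>&1; then
    ere_flag=-E
elif printf 'a\n' | sed -r 's/(a)/\1/' >/dev/null 2>&1; then
    ere_flag=-r
else
    ere_flag=
fi

if [ -n "$ere_flag" ]; then
    # Extended syntax is available, so the expression from this thread can be used.
    printf 'word1  word2      word3  after4\n' |
        sed "$ere_flag" 's/^([[:alnum:]]+[[:space:]]+[[:alnum:]]+[[:space:]]+)([[:alnum:]]+)/\1#\2/'
else
    echo 'this sed has no extended-regex flag; use basic regex syntax instead' >&2
fi
```

Writing the expression in basic regex syntax from the start avoids the probe altogether.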
 