BSD sed command shows unexpected behavior

richmikan

New Member


Messages: 7

I found some unexpected behavior in the BSD sed command.

Try the following command on your FreeBSD host; you will probably see the following output.
Code:
$ seq 1 10 | awk '{print $1 ",A"}' | sed '3,4N; s/\n/-/g'"]
1,A
2,A
3,A-4,A
5,A-6,A
7,A-8,A
9,A
10,A
$
However, GNU sed, which is available from textproc/gsed (ports) or
on a Linux host, returns a different result:
Code:
$ seq 1 10 | awk '{print $1 ",A"}' | gsed '3,4N; s/\n/-/g'"]
1,A
2,A
3,A-4,A
5,A
6,A
7,A
8,A
9,A
10,A
$
I don't know why the two versions of sed return different output. I suspect that GNU sed is the one working correctly, because "3,4N" tells sed to join a line with the next one only for lines 3 through 4, and NOT for line 5.

Is something wrong with BSD sed? Or is this just my misunderstanding?
 

graudeejs

Son of Beastie

Reaction score: 692
Messages: 4,615

Just to note, you have two extra characters at the end of each command line (this is not the problem with sed, just a typo).
 
richmikan

New Member


Messages: 7

Oh, sorry! Those ("]) are typos.
Thank you for pointing that out.
You are certainly right.


But the unexpected behavior does not go away when the typo characters are removed.
Do you know the reason?
 

graudeejs

Son of Beastie

Reaction score: 692
Messages: 4,615

Personally I don't know, but different implementations might behave differently.
I think you should ask on the @stable mailing list.

Currently it looks like a bug, but I'm not that much into sed.
 

J65nko

Well-Known Member

Reaction score: 127
Messages: 453

On OpenBSD the output is as follows:
Code:
$ jot 10 1 | awk '{print $1 ",A"}' | sed '3,4N; s/\n/-/g'
1,A
2,A
3,A-4,A
5,A-6,A
7,A-8,A
9,A-10,A
 
richmikan

New Member


Messages: 7

Thank you for your advice.
You also think it looks like a bug, don't you?

I will ask on the mailing list.
Thanks again.
 
richmikan

New Member


Messages: 7

J65nko said:
On OpenBSD the output is as follows:
Code:
$ jot 10 1 | awk '{print $1 ",A"}' | sed '3,4N; s/\n/-/g'
1,A
2,A
3,A-4,A
5,A-6,A
7,A-8,A
9,A-10,A
That confuses me even more!
Hmm, isn't this a complex problem? :(
 

_martin

Aspiring Daemon

Reaction score: 156
Messages: 766

Hm, I can't comment on the sed part, but for the sake of comparison I'm attaching results from HP-UX (all 11i versions: 11.11/11.23/11.31):

# printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n" | awk '{print $1 ",A"}' | sed '3,4N; s/\n/-/g'
Code:
1,A
2,A
3,A-4,A
5,A
6,A
7,A
8,A
9,A
10,A
HP-UX has neither seq nor jot, so I had to improvise.
 

fonz

Son of Beastie

Reaction score: 369
Messages: 2,560

Just to pitch in: it could be a bug, or it could be a difference in semantics between GNU sed and BSD sed. Does the man page provide any hints as to what is expected BSD sed behaviour in this case?
 
richmikan

New Member


Messages: 7

Thank you for the reports, everyone.

I can also report on some other implementations of sed.
  1. sed on AIX 6.1.0.0
  2. sed on HP-UX B.11.23
  3. sed on SunOS 5.9 (Solaris 9)
They all produce the same output as GNU sed, even though those implementations are probably not GNU's.

> matoatlantis

How about the following command for OSs with neither seq nor jot?
$ yes A | head -n 10 | awk '{print NR "," $1}' | sed '3,4N; s/\n/-/g'
 

throAU

Aspiring Daemon

Reaction score: 147
Messages: 910

Being different from GNU is not a bug.

What does the FreeBSD manpage say the FreeBSD behaviour should be?
 

graudeejs

Son of Beastie

Reaction score: 692
Messages: 4,615

Isn't sed in Solaris 9 the same as GNU sed? I ask because I know there is GNU stuff on Solaris (in newer versions). I don't know about other Unixes.
 

_martin

Aspiring Daemon

Reaction score: 156
Messages: 766

richmikan said:
How about the following command for OSs with neither seq nor jot?
$ yes A | head -n 10 | awk '{print NR "," $1}' | sed '3,4N; s/\n/-/g'
# yes A | head -n 10 | awk '{print NR "," $1}' | sed '3,4N; s/\n/-/g'
Code:
1,A
2,A
3,A-4,A
5,A
6,A
7,A
8,A
9,A
10,A
The output is from 11.31; the other releases gave the same output.
I highly doubt that sed in HP-UX is GNU sed. According to the docs, it conforms to the following standards:

Code:
 STANDARDS CONFORMANCE
      sed: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
On Solaris 10 you can choose a different sed depending on the standard:

# printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n" | awk '{print $1 ",A"}' | /usr/bin/sed '3,4N; s/\n/-/g'
Code:
1,A
2,A
3,A-4,A
5,A
6,A
7,A
8,A
9,A
10,A
The output is the same when using /usr/xpg4/bin/sed instead.
 

J65nko

Well-Known Member

Reaction score: 127
Messages: 453

The gsed man page at http://www.freebsd.org/cgi/man.cgi?query=gsed&apropos=0&sektion=0&manpath=FreeBSD+9.0-RELEASE+and+Ports&arch=default&format=html describes the N command as follows:
Code:
n N    Read/append the next line of input into the pattern space.
The FreeBSD description from sed(1):
Code:
     [2addr]N
	     Append the next line of input to the pattern space, using an
	     embedded newline character to separate the appended material from
	     the original contents.  Note that the current line number
	     changes.
The OpenBSD man page also has this note.
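The note about the line number can be checked with the = command, which prints the current line number. A minimal sketch, assuming any POSIX-style sed; the function name show_linenum is made up for illustration:

```shell
# Hypothetical helper: print the line number sed sees in each cycle.
show_linenum() {
    # 3N appends line 4 to line 3; "=" then prints the current line number.
    # -n suppresses the normal output so only the numbers are shown.
    printf '%s\n' 1 2 3 4 5 6 | sed -n -e '3N' -e '='
}
show_linenum
```

With GNU sed this prints 1, 2, 4, 5 and 6: line 3 is never reported on its own, because N has already moved the line number on to 4 by the time = runs.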
 

fonz

Son of Beastie

Reaction score: 369
Messages: 2,560

So, apparently what we have here is a documented semantic difference between GNU sed and BSD sed.
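To make the difference concrete, here is a toy model in awk of two ways a sed could decide when the range 3,4 is over. It is only a sketch: the mode names past and exact and the function simulate are invented here, and it is an assumption that real implementations work like either mode.

```shell
# Toy model of "3,4N; s/\n/-/g". This is NOT taken from any real sed source.
#   past  : the range closes once the line number has passed the 2nd address
#   exact : the range closes only when a cycle starts exactly on the 2nd address
simulate() {
    printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
    awk -v mode="$1" -v first=3 -v last=4 '
        { line[NR] = $1 ",A" }            # same input as in the thread
        END {
            inrange = 0
            for (n = 1; n <= NR; n++) {   # one iteration = one sed cycle
                ps = line[n]              # pattern space for this cycle
                apply = 0
                if (inrange) {
                    if (n == last) { apply = 1; inrange = 0 }
                    else if (mode == "past" && n > last) { inrange = 0 }
                    else { apply = 1 }
                } else if (n == first) {
                    apply = 1
                    inrange = (n < last)
                }
                if (apply && n < NR) {
                    n++                   # N: the line number advances
                    ps = ps "-" line[n]   # N followed by s/\n/-/g
                }
                print ps
            }
        }'
}
simulate past
simulate exact
```

The past mode gives the GNU output from the first post, and the exact mode gives the OpenBSD output posted by J65nko, where the range never closes because no cycle ever starts on line 4. Neither mode reproduces the last two lines of the FreeBSD output, so the model is incomplete at best.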
 

mvatten

New Member

Reaction score: 5
Messages: 11

But the man page of plan9port sed states the same, while not giving the same result as FreeBSD sed:

Code:
`N    Append the next line of input to the pattern
      space with an embedded newline.  (The current
      line number changes.)'
Mark.
 
richmikan

New Member


Messages: 7

Thanks to everyone, again.

I suppose that...
Even if the behavior of BSD sed is not a bug but a semantic difference,
I can't concretely understand or explain the reason for the behavior.

I don't know why
"3,4N" produces "3,A-4,A", "5,A-6,A" and "7,A-8,A"
while "3,5N" produces only "3,A-4,A"
on FreeBSD sed. :(
 

J65nko

Well-Known Member

Reaction score: 127
Messages: 453

sed(1) works on lines. And a line is a sequence of non-linefeed characters followed by a linefeed character ("\n"). The example code that we have been looking at, "messes around" with that critical linefeed. We replace it with a "-":

Code:
3,4 {
N
s/\n/-/
}
So we change the marker that defines the chunk of data sed(1) is working with.

In the following attempts I use this text file:
Code:
$ cat 1-10a.txt
1,A
2,A
3,A
4,A
5,A
6,A
7,A
8,A
9,A
10,A
The sed(1) command file:
Code:
$ cat cmd4.sed
3 {
h
}

4 {
H
}

5 {
x
s/\n/-/g
}
Line 3 is copied to the hold space with h (H would store a leading newline first), and line 4 is appended with H. At line 5 we swap hold space and pattern space, and substitute the newline with the hyphen.

Code:
$ sed -f cmd4.sed 1-10a.txt
1,A
2,A
3,A
4,A
3,A-4,A
6,A
7,A
8,A
9,A
10,A
Now lines 3 and 4 are still displayed and line 5 is missing.
An ugly hack is to tell sed(1) not to echo the lines, using the -n option, and to print explicitly with p:

Code:
# cat cmd5.sed
1,2 {
p
}

# h on line 3 avoids the leading newline that H adds to an empty hold space
3 {
h
}

4 {
H
}

5 {
x
s/\n/-/g
p
x
p
}

6,10 {
p
}
Yes, it is ugly, but it produces the wanted output:

Code:
$ sed -nf cmd5.sed 1-10a.txt
1,A
2,A
3,A-4,A
5,A
6,A
7,A
8,A
9,A
10,A
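If the goal is only to join lines 3 and 4, a shorter variant might be to address line 3 alone, so that there is no range end for N to skip. A sketch, with join_3_4 as a made-up function name:

```shell
# Hypothetical helper: join only lines 3 and 4 of the sample input.
join_3_4() {
    printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
    awk '{print $1 ",A"}' |
    # Single address 3: N pulls in line 4, then the newline becomes a hyphen.
    sed -e '3{' -e 'N' -e 's/\n/-/' -e '}'
}
join_3_4
```

This should give the same output as above on both GNU and BSD sed, since a single line address matches at most once.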
 

throAU

Aspiring Daemon

Reaction score: 147
Messages: 910

richmikan said:
Thanks to everyone, again.

I suppose that...
Even if the behavior of BSD sed is not a bug but a semantic difference,
I can't concretely understand or explain the reason for the behavior.
Because GNU wrote gsed afterwards, and changed the behaviour.
 