Basic sed question

dburkland · May 18, 2009

Hey guys, I recently got into shell scripting and have a question about what I think is a sed issue. I do a lot of HTML editing, sometimes having to apply the same change to hundreds of files. I was wondering if it is possible to replace multiple lines of text using awk or sed?

Thank you so much!

danger@ · May 18, 2009

pamdirac · May 18, 2009

echo ciao | sed 's/ciao/hello/g'

trev · May 18, 2009

dburkland said:
I do a lot of HTML editing, sometimes having to apply the same change to hundreds of files. I was wondering if it is possible to replace multiple lines of text using awk or sed?

Code:

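#!/bin/sh
# Convert <b>/<B> tags to <strong> in every HTML file in the current directory.
for file in *.html
  do
     [ -f "${file}" ] || continue   # nothing to do if no .html files match
     echo "processing ${file}"
     mv "${file}" "${file}.bak"     # keep the original as a backup
     sed 's/\(<[\/]*\)[Bb]>/\1strong>/g' < "${file}.bak" > "${file}"
  done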

Change any <b>, </b>, <B>, </B> tags in every HTML file in the current directory into the corresponding <strong> and </strong> tags, thus properly separating presentation and structure. Save the original files as backups in case your script runs amok, which it will one day.

DutchDaemon · May 18, 2009

sed -I .bak will save some lines there, and using "|" as a sed separator instead of "/" will deobfuscate slightly.
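A minimal sketch of the separator suggestion, run on a made-up input line rather than a real file. With "|" as the delimiter, the slash in the pattern no longer needs escaping:

```shell
# Same substitution as trev's script, using "|" instead of "/" as the delimiter.
# The input line is invented for illustration.
result=$(printf '%s\n' '<b>bold</b> and <B>BOLD</B>' |
    sed 's|\(</*\)[Bb]>|\1strong>|g')
echo "$result"
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter.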

trev · May 18, 2009

DutchDaemon said:
sed -I .bak will save some lines there ..

Ah yes, but that can lead to unexpected behaviour, since -I treats all the files as a single continuous line address space. The -i switch, which edits each file separately, would be preferred in that case.

However, I like to keep my scripts platform independent, and the -i and -I sed command line switches are not available on every platform (e.g. SunOS 5.10, aka Solaris 10).

dburkland · May 18, 2009

Thanks for the replies guys, but I still don't understand how I can apply trev's example to what I want to do with sed. For example, say I have a file named test.html and it contains the following code:

Code:

<li>
<a href="//testing.org/">Test Link</a>
</li>

My goal is to search for the entire chunk of code and replace all of it with the following code:

Code:

<img>
<src>
<a href="foo.bar">foo</a>
</src>

I have discovered that in order to output the code with newlines I need to follow each line with a '\' but the same doesn't apply to input.
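For reference, the backslash-newline form in a replacement looks like this. It is only a sketch with made-up input; the space between the two words is replaced by a line break:

```shell
# In the replacement part of s///, a backslash followed by a literal
# newline inserts a line break. The input text is invented for illustration.
result=$(printf '%s\n' 'one two' | sed 's/ /\
/')
echo "$result"
```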

Thanks again and sorry if I am confusing anyone.

DutchDaemon · May 18, 2009

I'm not sure if the entire command set of ed(1) is in sed, but ed can do that: 'find that expression, go back one line, set a marker, go forward two lines, set another marker, replace anything between markers with something else'. It's been too long for me, though

trev · May 19, 2009

dburkland said:
My goal is to search for the entire chunk of code and replace all of it with the following code:

Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:

Code:

perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html

should do the trick.

dburkland · May 19, 2009

trev said:
Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:

Code:

perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html

should do the trick.

I created a test html file called index.html with the following

Code:

<li>
abc
</li>

and executed the command provided and the file was not changed in any way by it. I am very unfamiliar with Perl so I may just be missing a tweak in the command. Thanks again for your help!

kamikaze · May 19, 2009

With sed you cannot have newlines in the expression. You can only add them in the replacement string.

There is a workaround if you are certain of a certain character not being used:

Code:

#     Replace newlines with ~      Replace 3 lines         Replace ~ with newlines
... | rs -TeC~                   | sed 's/a~b~c/d~e~f/g' | rs -c~

Replace ~ with whatever character suits you.
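On systems without rs(1), the same idea can be sketched with tr(1). This is an assumed equivalent, not kamikaze's command, and it likewise relies on "~" never appearing in the input:

```shell
# Join all lines with "~", substitute across the former line breaks,
# then turn "~" back into newlines. The sample input is made up.
result=$(printf 'x\na\nb\nc\ny\n' |
    tr '\n' '~' |
    sed 's/a~b~c/d~e~f/g' |
    tr '~' '\n')
echo "$result"
```

Note that the whole input becomes one long line for sed, so the pattern is not anchored to line starts unless you include the surrounding "~" characters in it.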

trev · May 20, 2009

dburkland said:
I created a test html file called index.html with the following

Code:

<li>
abc
</li>

and executed the command provided and the file was not changed in any way by it. I am very unfamiliar with Perl so I may just be missing a tweak in the command. Thanks again for your help!

My script does not permit the first line of the file to be part of the target pattern. Best to experiment on one of your real files, rather than on a test file which simply contains the target pattern.

trev · May 20, 2009

kamikaze said:
With sed you cannot have newlines in the expression. You can only add them in the replacement string.

Not true. I do it all the time. Consider:

Code:

sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'

The target pattern contains a newline

dburkland · May 20, 2009

trev said:
Not true. I do it all the time. Consider:

Code:

sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'

The target pattern contains a newline

I have a few questions regarding this snippet of code. 1) What does the /-<LF>/ N; do? 2) Would it be possible to give an example of how I could replace the following code:

Code:

<LF>
abc
</LF>

I again thank you for your help, you guys are awesome!

trev · May 21, 2009

dburkland said:
I have a few questions regarding this snippet of code. 1) What does the /-<LF>/ N; do?

/-<LF>/ move to the line address matching "-<LF>"
N; appends the next line to the contents of the pattern space

All of this is detailed in the man page for sed.
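To see the address and N working together, here is the same command run on made-up input. The sample lines are assumptions for illustration, with "<LF>" being literal text in the data:

```shell
# Line 1 matches /-<LF>/, so N appends line 2 to the pattern space.
# The s command then removes the "<LF>" marker and the embedded newline.
result=$(printf '%s\n' 'para one-<LF>' ' SECT 12' 'other line' |
    sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/')
echo "$result"
```

Lines that do not match the address pass through unchanged.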

2) Would it be possible to give an example of how I could replace the following code:

No. As already stated, you cannot match a pattern which extends over more than two lines with sed.

You have been given two alternative solutions already. The newline replacement "work around" and the Perl code to achieve your stated aim. Pick one or explain why neither suits your requirements.

dburkland · May 21, 2009

trev said:
No. As already stated, you cannot match a pattern which extends over more than two lines with sed.

My bad, I apologize for my lack of understanding. The two snippets you guys provided will suffice. Thanks again for your help and patience!

beyert · Apr 14, 2012

I'd recommend using awk for this purpose. You can use it as a more powerful sed by issuing a command such as:

Code:

diff -u files ... | awk '{gsub("^--- ", "Index: Name of file\n=====\n--- "); print $0}'

In this case I am searching for a line starting with "--- " in the diff output, then replacing that prefix with "Index: Name of file", followed by a newline, followed by "=====", then a newline, then the original "--- ".
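To try the gsub call without running diff, the same awk program can be fed a couple of hand-written header lines. The file names here are invented for illustration:

```shell
# Two fake diff header lines stand in for real "diff -u" output.
result=$(printf '%s\n' '--- a/file.txt' '+++ b/file.txt' |
    awk '{ gsub("^--- ", "Index: Name of file\n=====\n--- "); print $0 }')
echo "$result"
```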

The command awk can do a lot more, but in this case, it works well as a more flexible sed, without using something as complex as perl. It is probably slower than sed, but it can do a lot more as well, such as the following:

Code:

diff -u files ... | awk '{split($0, arr, "^--- "); where = match($0, "^--- "); 
val = $0; gsub("^--- ", "Index: ", val); if (where) print (val "\n====="); print $0 }'

(Note that the above should be a one-liner, it is merely broken into multiple lines for readability)

You would have to try it on the output of diff to see exactly how it works, but as you can see, it is capable of some very sophisticated header formatting for patches, which I found very useful.

(Note that I am mentioning the use of awk here because I couldn't find many good tutorials on the program, and I thought that others might benefit from some examples when used with pipes. I assume that the OP solved their problem long ago...)
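For the multi-line replacement asked about at the start of the thread, awk can buffer the lines of each block and compare the whole block at once. What follows is only a sketch: the function name replace_block is hypothetical, and it assumes the opening and closing tags each sit alone on a line, as in the OP's example.

```shell
# Buffer each <li>...</li> block; print the replacement when the block
# equals the target exactly, otherwise print the block unchanged.
replace_block() {
    awk -v target='<li>\n<a href="//testing.org/">Test Link</a>\n</li>' \
        -v replacement='<img>\n<src>\n<a href="foo.bar">foo</a>\n</src>' '
    !inblock && $0 == "<li>" { buf = $0; inblock = 1; next }
    inblock {
        buf = buf "\n" $0
        if ($0 == "</li>") {
            inblock = 0
            print (buf == target ? replacement : buf)
        }
        next
    }
    { print }
    END { if (inblock) print buf }   # flush an unterminated block
    '
}

result=$(printf '%s\n' '<p>keep</p>' '<li>' \
    '<a href="//testing.org/">Test Link</a>' '</li>' \
    '<li>' '<a href="different">Test Other</a>' '</li>' | replace_block)
echo "$result"
```

Because the comparison is a plain string match rather than a regex, any difference in whitespace inside the block means it is left untouched.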

anon12b · Apr 17, 2012

Dirty Sed

Okay, this appears to work, though it is an example of how ugly sed is in the wrong hands. I do not think it is particularly resilient. Especially, it will only work for documents the way you described, with no variation. That said, I used the following as a test:

Code:

<html><head><title>ya</title></head>
<body>
<h1>Ignore this!</h1>
<li>
<a href="//testing.org/">Test Link</a>
</li>
<li>
<a href="different">Test Other</a>
</li>
<p>
What is happening?
</p>
<li>
<a href="//testing.org/">Test Link</a>
</li>
</body>
</html>

Then, the following is a sed script:

Code:

#n
/<li>/b append
p
b
:append
N
/<\/li>/!b append
/<li>\n<a href="\/\/testing.org\/">Test Link<\/a>\n<\/li>/c\
<img>\
<src>\
<a href="foo.bar">foo<\/a>\
<\/src>
p

I am sure there is probably a better way to do it, but once I start thinking of hold space my head hurts. This kind of does do what you want, though. I think it should be apparent why it is not very resilient.
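A hold-space variant is sketched below. It collects the whole input in the hold space and does a single substitution on the last line, so it assumes files small enough to hold in memory. The function name replace_li is hypothetical, and it is just as sensitive to variations in the markup as the script above.

```shell
replace_li() {
    sed -n '
# Append every input line to the hold space.
H
# On the last line, fetch the collected text and edit it in one go.
$ {
    x
    s/^\n//
    s|<li>\n<a href="//testing\.org/">Test Link</a>\n</li>|<img>\
<src>\
<a href="foo.bar">foo</a>\
</src>|g
    p
}
'
}

result=$(printf '%s\n' '<h1>Ignore this!</h1>' '<li>' \
    '<a href="//testing.org/">Test Link</a>' '</li>' '<p>end</p>' | replace_li)
echo "$result"
```

The "|" delimiter avoids escaping the slashes, and each line break in the replacement is written as a backslash followed by a newline.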
