Basic sed question

Hey guys, I recently got into shell scripting and have a question about what I think is a sed issue. I do a lot of HTML editing, sometimes having to apply the same change to hundreds of files. I was wondering if it is possible to replace multiple lines of text using awk or sed?

Thank you so much!
 
dburkland said:
I do a lot of HTML editing, sometimes having to apply the same change to hundreds of files. I was wondering if it is possible to replace multiple lines of text using awk or sed?

Code:
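#!/bin/sh
# Replace <b>, <B>, </b> and </B> with <strong>/</strong> in every .html file
# in the current directory, keeping each original as FILE.html.bak.
for file in *.html
  do
     [ -f "${file}" ] || continue      # no .html files: the glob stays literal
     echo "processing ${file}"
     mv "${file}" "${file}.bak"
     sed 's/\(<[\/]*\)[Bb]>/\1strong>/g' "${file}.bak" > "${file}"
  done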

This changes any <b>, <B>, </b> or </B> tags in every HTML file in the current directory into the corresponding <strong> and </strong> tags, thus properly separating presentation and structure. It saves the original files as backups in case your script runs amok, which it will one day :)
 
sed -I .bak will save some lines there..., and using "|" as a sed separator instead of "/" will deobfuscate slightly ;)
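A minimal sketch of what that suggestion looks like, assuming a sed that accepts the backup suffix attached to -i (GNU sed and FreeBSD sed both accept -i.bak; see the portability caveat further down the thread). With "|" as the separator, the slash in the closing tag no longer needs escaping. The helper name to_strong is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: rewrite <b>/<B> and </b>/</B> as <strong>/</strong>
# in place, keeping each original as FILE.bak. Needs a sed with -i.
to_strong() {
    # \(</*\) captures "<" or "</"; "|" delimiters avoid escaping "/"
    sed -i.bak 's|\(</*\)[Bb]>|\1strong>|g' "$@"
}

# Usage: to_strong *.html
```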
 
DutchDaemon said:
sed -I .bak will save some lines there...

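Ah yes, but that can lead to unexpected behaviour: with -I, sed treats all the input files as one continuous stream, so addresses such as $ or line ranges can span file boundaries. The -i switch, which handles each file independently, would be preferred in that case.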

However, I like to keep my scripts platform independent, and the -i and -I sed command line switches are not available on every platform (e.g. SunOS 5.10, aka Solaris 10).
 
Thanks for the replies guys, but I still don't understand how I can apply trev's example to what I want to do with sed. For example, say I have a file named test.html that contains the following code:
Code:
<li>
<a href="//testing.org/">Test Link</a>
</li>

My goal is to search for that entire chunk of code and replace all of it with the following code:
Code:
<img>
<src>
<a href="foo.bar">foo</a>
</src>

I have discovered that to output the code with newlines I need to end each line of the replacement text with a '\', but the same trick doesn't work for the text I am searching for.
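A small sketch of that asymmetry, using a made-up one-line marker PLACEHOLDER (hypothetical, not from the thread): in the replacement half of s, each line break is written as a backslash followed by a real newline, whereas the pattern half only ever sees one input line at a time unless a command such as N pulls in more. The function name expand_placeholder is also hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: replace a line consisting only of PLACEHOLDER with the
# four-line block. Each embedded newline in the replacement is written as a
# backslash followed by an actual line break (POSIX sed syntax).
expand_placeholder() {
    sed 's|^PLACEHOLDER$|<img>\
<src>\
<a href="foo.bar">foo</a>\
</src>|' "$@"
}
```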

Thanks again and sorry if I am confusing anyone.
 
I'm not sure if the entire command set of ed(1) is in sed, but ed can do that: 'find that expression, go back one line, set a marker, go forward two lines, set another marker, replace anything between markers with something else'. It's been too long for me, though ;)
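A sketch of that ed recipe, assuming the block is laid out exactly as in the post and that only its first occurrence needs changing: search for the link line, step back one line to <li>, cover the two lines from there down to </li>, and change (c) that range. The function name ed_replace_block is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper. /re/-1 is the line before the link line (<li>);
# ";" makes that the current line, so +2 is the </li> line. c replaces the
# whole range with the text up to ".", then w writes and q quits.
ed_replace_block() {
    ed -s "$1" <<'EOF'
/^<a href="\/\/testing\.org\/">Test Link<\/a>$/-1;+2c
<img>
<src>
<a href="foo.bar">foo</a>
</src>
.
w
q
EOF
}

# Usage: ed_replace_block test.html
```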
 
dburkland said:
My goal is to search for that entire chunk of code and replace all of it with the following code:

Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:

Code:
perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html

should do the trick.
 
trev said:
Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:

Code:
perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html

should do the trick.
I created a test html file called index.html with the following
Code:
<li>
abc
</li>

and executed the command provided and the file was not changed in any way by it. I am very unfamiliar with Perl so I may just be missing a tweak in the command. Thanks again for your help!
 
With sed you cannot have newlines in the expression. You can only add them in the replacement string.

There is a workaround if you are sure that a certain character never occurs in the input:
Code:
#     Replace newlines with ~      Replace 3 lines         Replace ~ with newlines
... | rs -TeC~                   | sed 's/a~b~c/d~e~f/g' | rs -c~

Replace ~ with whatever character suits you.
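For systems without rs(1), a sketch of the same idea with tr(1), applied to the block from the question. It assumes ~ never occurs in the files. Because the whole input becomes one long line without a trailing newline, some seds may treat that last line slightly differently (a few add a newline, which would leave one extra blank line at the end). The function name replace_joined is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: join all lines with ~, replace the three-line block in
# one substitution, then turn every ~ back into a newline.
replace_joined() {
    tr '\n' '~' |
    sed 's|<li>~<a href="//testing\.org/">Test Link</a>~</li>|<img>~<src>~<a href="foo.bar">foo</a>~</src>|g' |
    tr '~' '\n'
}

# Usage: replace_joined < test.html > test.html.new
```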
 
dburkland said:
I created a test html file called index.html with the following
Code:
<li>
abc
</li>

and executed the command provided and the file was not changed in any way by it. I am very unfamiliar with Perl so I may just be missing a tweak in the command. Thanks again for your help!

My script does not permit the first line of the file to be part of the target pattern: undef $/ only takes effect after perl -p has already read the first line. Better to experiment on one of your real files than on a test file that contains nothing but the target pattern.
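A sketch of the usual fix: Perl's -0777 switch makes it read each file whole from the start, so the pattern may begin on line 1, and the /g modifier (not in the original command) replaces every occurrence instead of only the first. abc and xyz are the placeholders from the original command; the function name replace_li_perl is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: -0777 slurps each file whole, -pi.bak edits in place
# and keeps FILE.bak, /g replaces every match in the file.
replace_li_perl() {
    perl -0777 -pi.bak -e 's/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/g' "$@"
}

# Usage: replace_li_perl *.html
```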
 
kamikaze said:
With sed you cannot have newlines in the expression. You can only add them in the replacement string.

Not true. I do it all the time. Consider:

Code:
sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'

The target pattern contains a newline :)
 
trev said:
Not true. I do it all the time. Consider:

Code:
sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'

The target pattern contains a newline :)

I have a few questions regarding this snippet of code. 1) What does the /-<LF>/ N; do? 2) Would it be possible to give an example of how I could replace the following code:
Code:
<LF>
abc
</LF>

I again thank you for your help, you guys are awesome!
 
dburkland said:
I have a few questions regarding this snippet of code. 1) What does the /-<LF>/ N; do?

/-<LF>/ is an address: the command after it runs only on lines that contain "-<LF>"
N; appends a newline and the next input line to the pattern space

All of this is detailed in the man page for sed.
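A small demonstration on made-up input (the sample text is invented, assuming "<LF>" appears literally in the file as in trev's use case): the address selects the line containing -<LF>, N appends the following line, and the \n in the s pattern then matches the newline between the two. The function name join_sect is hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper around trev's command: join a line ending in "-<LF>"
# with a following " SECT nn" line.
join_sect() {
    sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/' "$@"
}

printf 'Chapter -<LF>\n SECT 12 follows\nunrelated line\n' | join_sect
# prints:
# Chapter - SECT 12 follows
# unrelated line
```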

2) Would it be possible to give an example of how I could replace the following code:

No. As already stated, you cannot match a pattern which extends over more than two lines with sed.

You have been given two alternative solutions already. The newline replacement "work around" and the Perl code to achieve your stated aim. Pick one or explain why neither suits your requirements.
 
trev said:
No. As already stated, you cannot match a pattern which extends over more than two lines with sed.

My bad, I apologize for my lack of understanding. The two snippets you guys provided will suffice. Thanks again for your help and patience!
 
I'd recommend using awk for this purpose. You can use it as a more powerful sed by issuing a command such as:

Code:
diff -u files ... | awk '{gsub("^--- ", "Index: Name of file\n=====\n--- "); print $0}'

In this case I am searching for lines starting with "--- " in the diff output, then replacing "--- " with "Index: Name of file", followed by a newline, followed by "=====", then a newline, then the original "--- ".

awk can do a lot more than this, but here it works well as a more flexible sed without resorting to something as complex as Perl. It is probably slower than sed, but it can also do things like the following:

Code:
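diff -u files ... | awk '{
    where = match($0, "^--- ")          # non-zero if this is a "--- " header
    val = $0
    gsub("^--- ", "Index: ", val)       # copy of the line with "--- " -> "Index: "
    if (where) print (val "\n=====")    # emit the Index line and rule first
    print $0                            # then the original line
}'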

(The line breaks are only for readability; awk accepts a program spread over several lines inside the quotes, so it can be used as is.)

You would have to try it on real diff output to see exactly how it works, but as you can see, it can do fairly sophisticated header formatting for patches, which I found very useful.

(Note that I am mentioning the use of awk here because I couldn't find many good tutorials on the program, and I thought that others might benefit from some examples when used with pipes. I assume that the OP solved their problem long ago...)
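Since awk was recommended for this purpose, here is a rough sketch of the original question (replacing the three-line <li> block) in awk. It assumes the block always appears as exactly these three lines and that each file fits comfortably in memory, since it reads the whole input before printing anything. The function name replace_li_awk is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: store every line, then walk the stored lines and swap
# in the new block wherever the three target lines appear in a row.
replace_li_awk() {
    awk '
    { line[NR] = $0 }
    END {
        for (i = 1; i <= NR; i++) {
            if (i + 2 <= NR && line[i] == "<li>" &&
                line[i+1] == "<a href=\"//testing.org/\">Test Link</a>" &&
                line[i+2] == "</li>") {
                print "<img>"; print "<src>"
                print "<a href=\"foo.bar\">foo</a>"; print "</src>"
                i += 2          # skip the two lines just replaced
            } else {
                print line[i]
            }
        }
    }' "$@"
}

# Usage: replace_li_awk test.html > test.html.new
```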
 
Dirty Sed

Okay, this appears to work, though it is an example of how ugly sed is in the wrong hands. I do not think it is particularly resilient; in particular, it will only work for documents laid out exactly as you described, with no variation. That said, I used the following as a test:

Code:
<html><head><title>ya</title></head>
<body>
<h1>Ignore this!</h1>
<li>
<a href="//testing.org/">Test Link</a>
</li>
<li>
<a href="different">Test Other</a>
</li>
<p>
What is happening?
</p>
<li>
<a href="//testing.org/">Test Link</a>
</li>
</body>
</html>

Then, the following is a sed script:

Code:
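#n
# "#n" on the first line turns off automatic printing; lines are printed with p.
# Lines that do not contain <li> are printed unchanged.
/<li>/b append
p
b
:append
# Keep appending the next line until the pattern space contains </li>.
N
/<\/li>/!b append
# If the collected lines are exactly the target block, replace them.
/<li>\n<a href="\/\/testing.org\/">Test Link<\/a>\n<\/li>/c\
<img>\
<src>\
<a href="foo.bar">foo</a>\
</src>
# Otherwise print the collected lines as they were.
p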

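There is probably a better way to do it, but once I start thinking about hold space my head hurts. It does more or less what you want, though. It should be apparent why it is not very resilient: it only matches the exact three-line layout, so extra whitespace, a different attribute or a different line break defeats it.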
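If the block always starts with a line that is exactly <li> and is always three lines long, a shorter sed sketch avoids the explicit loop: on such a line, N twice pulls in the next two lines and one s command with \n in the pattern does the swap. It shares the fragility described above; in addition, a <li> that lands on the second or third line of a non-matching window is not re-examined, and near the end of input it relies on the common (GNU) behaviour of N printing the pattern space when there is no next line. The function name replace_li_sed is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: on a bare <li> line, read two more lines and replace
# the three-line window if it is exactly the target block.
replace_li_sed() {
    sed '
/^<li>$/{
    N
    N
    s|^<li>\n<a href="//testing\.org/">Test Link</a>\n</li>$|<img>\
<src>\
<a href="foo.bar">foo</a>\
</src>|
}' "$@"
}

# Usage: replace_li_sed test.html > test.html.new
```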
 