Basic sed question


dburkland
May 18th, 2009, 05:28
Hey guys, I recently got into shell scripting and have a question about what I think is a sed issue. I do a lot of HTML editing, sometimes having to apply the same change to hundreds of files. Is it possible to replace multiple lines of text using awk or sed?

Thank you so much!

danger@
May 18th, 2009, 07:42
sed

pamdirac
May 18th, 2009, 14:59
echo ciao | sed 's/ciao/hello/g'

trev
May 18th, 2009, 15:35
I do a lot of html editing sometimes having to apply the same change to hundreds of files. I was wondering if it is possible to replace multiple lines of text using awk or sed?



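#!/bin/sh
# Convert <b>/<B> tags to <strong> in every HTML file in the current directory.
for file in *.html
do
    [ -f "${file}" ] || continue    # skip if no .html files exist
    echo "processing ${file}"
    # Keep the original as a backup, then write the edited copy.
    mv "${file}" "${file}.bak"
    sed 's/\(<\/*\)[Bb]>/\1strong>/g' "${file}.bak" > "${file}"
done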


This changes any <b>, <B>, </b>, </B> tags in every HTML file in the current directory into the corresponding <strong> and </strong> tags, thus properly separating presentation and structure. The original files are saved as backups in case your script runs amok, which it will one day :)
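
To see what the substitution does before letting it loose on real files, it can be tried on a single line of input. A minimal sketch: the sample text is made up for illustration, and the optional slash is written here as \/*.

```shell
# Feed one made-up line of HTML through the substitution.
# \(<\/*\) captures "<" plus an optional "/", so a single expression
# handles both opening and closing tags.
sample='<b>bold</b> and <B>BOLD</B>'
result=$(printf '%s\n' "$sample" | sed 's/\(<\/*\)[Bb]>/\1strong>/g')
echo "$result"
```

Tags such as <br> are left alone, because the expression requires the ">" to follow the letter directly.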

DutchDaemon
May 18th, 2009, 15:42
sed -I .bak will save some lines there .., and using "|" as a sed separator instead of "/" will deobfuscate slightly ;)

trev
May 18th, 2009, 15:48
sed -I .bak will save some lines there ..

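Ah yes, but that can lead to unexpected behaviour: with -I, all the files are edited as one continuous line address space, so line numbers and $ do not restart for each file. The -i switch, which treats each file independently, would be preferred in that case.

However, I like to keep my scripts platform independent, and the -i and -I sed command line switches are not available on every platform (e.g. SunOS 5.10, aka Solaris 10).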
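
Where neither switch is available, much the same effect can be had with a temporary file. A minimal sketch: edit_in_place is a hypothetical helper name, not a standard utility, and the .tmp.$$ and .bak suffixes are arbitrary choices.

```shell
# edit_in_place SCRIPT FILE: hypothetical helper that applies a sed
# script to FILE without relying on the -i or -I switches.
edit_in_place() {
    script=$1
    file=$2
    tmp="${file}.tmp.$$"    # temporary name next to the original
    # Replace the original only if sed succeeded; keep a .bak copy.
    if sed "$script" "$file" > "$tmp"; then
        cp -p "$file" "${file}.bak" && mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as: edit_in_place 's/\(<\/*\)[Bb]>/\1strong>/g' index.html. Note that the rewritten file is a new file, so its ownership and permissions may differ from the original's.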

dburkland
May 18th, 2009, 18:15
Thanks for the replies guys, but I still don't understand how I can apply trev's example to what I want to do with sed. For example, say I have a file named test.html and it contains the following code:

<li>
<a href="//testing.org/">Test Link</a>
</li>

My goal is to search for the entire chunk of code and replace all of it with the following code:

<img>
<src>
<a href="foo.bar">foo</a>
</src>

I have discovered that in order to output the code with newlines I need to end each line with a '\', but the same doesn't apply to the input pattern.

Thanks again and sorry if I am confusing anyone.

DutchDaemon
May 18th, 2009, 18:38
I'm not sure if the entire command set of ed is in sed, but ed can do that: 'find that expression, go back one line, set a marker, go forward two lines, set another marker, replace anything between markers with something else'. It's been too long for me, though ;)

trev
May 19th, 2009, 05:39
My goal is to search for the entire chunk of code and replace all of it with the following code:


Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:


perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html


should do the trick.

dburkland
May 19th, 2009, 16:23
Ah, so. No. sed is not cut out for matching a pattern which extends over more than 2 lines.

In that case, I'd usually write a simple regex(3) filter in C. You could, of course, use perl(1). Something like:


perl -pi.bak -e 'undef $/; s/<li>\nabc\n<\/li>/<strong>\nxyz\n<\/strong>/' *.html


should do the trick.

I created a test HTML file called index.html with the following:

<li>
abc
</li>


and executed the command provided, but the file was not changed in any way. I am very unfamiliar with Perl, so I may just be missing a tweak in the command. Thanks again for your help!

kamikaze
May 19th, 2009, 21:54
With sed you cannot have newlines in the expression. You can only add them in the replacement string.

There is a workaround if you are sure that a particular character is not used anywhere in the input:

# 1. rs -TeC~ : replace newlines with ~
# 2. sed      : replace the 3 lines
# 3. rs -c~   : replace ~ with newlines
... | rs -TeC~ | sed 's/a~b~c/d~e~f/g' | rs -c~


Replace ~ with whatever character suits you.
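
Where rs(1) is not installed, tr(1) can play the same role. A minimal sketch under the same assumption that ~ never occurs in the input; the three-line sample is made up, and very large files may run into sed's line-length limits on some platforms.

```shell
# Join the lines with ~, do the multi-line replacement, then split again.
input=$(printf 'a\nb\nc\n')
output=$(printf '%s\n' "$input" | tr '\n' '~' | sed 's/a~b~c/d~e~f/g' | tr '~' '\n')
echo "$output"
```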

trev
May 20th, 2009, 01:48
I created a test html file called index.html with the following:

<li>
abc
</li>


and executed the command provided and the file was not changed in any way by it. I am very unfamiliar with Perl so I may just be missing a tweak in the command. Thanks again for your help!

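My script does not permit the first line of the file to be part of the target pattern: perl -p has already read the first line by the time "undef $/" runs, so only the rest of the file is read as one chunk. Better to experiment on one of your real files than on a test file which contains nothing but the target pattern.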

trev
May 20th, 2009, 01:53
With sed you cannot have newlines in the expression. You can only add them in the replacement string.

Not true. I do it all the time. Consider:


sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'


The target pattern contains a newline :)

dburkland
May 20th, 2009, 21:33
Not true. I do it all the time. Consider:


sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/'


The target pattern contains a newline :)

I have a few questions regarding this snippet of code. 1) What does the /-<LF>/ N; do? 2) Would it be possible to give an example of how I could replace the following code:

<LF>
abc
</LF>


I again thank you for your help, you guys are awesome!

trev
May 21st, 2009, 02:00
I have a few questions regarding this snippet of code. 1, what does the /-<LF>/ N; do?


/-<LF>/ is an address: the command that follows runs only on lines matching "-<LF>"
N; appends the next line of input to the pattern space, separated from it by a newline

All of this is detailed in the man page for sed.
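
The effect is easy to check on two lines of sample input. A small sketch; the input text is made up to fit the pattern:

```shell
# Line 1 ends in "-<LF>", so N pulls line 2 into the pattern space,
# where the two lines are separated by an embedded newline (\n).
joined=$(printf 'PART 3-<LF>\n SECT 12\n' | sed '/-<LF>/ N; s/-<LF>\n\( SECT [0-9]*\)/-\1/')
echo "$joined"
```

The two input lines come out as the single line "PART 3- SECT 12".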


2) Would it be possible to give an example of how I could replace the following code:


No. As already stated, sed is not cut out for matching a pattern which extends over more than two lines.

You have been given two alternative solutions already. The newline replacement "work around" and the Perl code to achieve your stated aim. Pick one or explain why neither suits your requirements.

dburkland
May 21st, 2009, 19:52
No. As already stated, sed is not cut out for matching a pattern which extends over more than two lines.


My bad, I apologize for my lack of understanding. The two snippets you guys provided will suffice. Thanks again for your help and patience!