need help with sed and regexps

edhunter · Oct 16, 2009

Hello guys
I need help with sed and regular expressions.
I have an input file containing text with html formatting.
I have to import this file into another program that respects only the </br> tag.
I need to clean out all html tags except <br> and its variations before importing this file.
All kinds of br-s have to become </br>.

Something like this:
1. <br>, <br/>, <br />, ... => </br>
2. "<whatever tag without >" => ""

How could I do it using sed?

dennylin93 · Oct 16, 2009

$ sed 's/<br>/<\/br>/g'

should turn <br> into </br>. All the other changes should work with similar variations.
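As a rough sketch of what those variations might look like, one bracketed pattern can cover the common spellings of the tag in a single substitution. This assumes the br tags carry no attributes, and `normalize_br` is a made-up name used only for illustration.

```shell
# Hypothetical helper: rewrite <br>, <br/>, <br />, < br />, <BR> and
# similar spellings as </br>. Assumes br tags have no attributes.
normalize_br() {
    sed 's:<[ /]*[bB][rR][ /]*>:</br>:g' "$@"
}

printf 'a<br>b<BR/>c< br />d\n' | normalize_br
# prints: a</br>b</br>c</br>d
```

Using `:` as the delimiter avoids having to escape the slash in `</br>`.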

Zare · Oct 16, 2009

Code:

sed 's@<\([^<br>][^<>]*\)>\([^<>]*\)</\1>@\2@g'

Pipe the line into this and it should strip off all HTML tags; the content between the tags will remain intact, and <br> tags will remain too.

P.S.
Up The Irons!

edhunter · Oct 19, 2009

10x \m/
but it didn't work.
Here is a sample file:

Code:

line1<tag1>alabala
blabla</tag2>
line2<tag>blabla
<tag3>text<tag4>blabla


</br>
</br>
< br />

Here is the sed output:

Code:

sed 's@<\([^<br>][^<>]*\)>\([^<>]*\)</\1>@\2@g' test.txt
line1<tag1>alabala
blabla</tag2>
line2<tag>blabla
<tag3>text<tag4>blabla


</br>
</br>
< br />

I got what I wanted with 3 seds.

Code:

sed -e "s:<[^<>]*br[^<>]*>:uniqstring123:g" Export.TXT > out1.txt
sed -e "s:<[^<>]*>::g" out1.txt > out2.txt
sed -e "s:uniqstring123:</br>:g" out2.txt > FINAL.TXT

but my way seems very lame... that's why I need another solution.
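For what it's worth, the three passes above can probably be chained in one sed invocation with several -e expressions, which avoids the intermediate files. This is only a sketch: the name `strip_tags_keep_br` is made up, the placeholder uniqstring123 must not occur in the input, and the br pattern is tightened (an assumption about the intent) so that tags that merely contain the letters "br", such as <abbr>, are not treated as line breaks.

```shell
# Hypothetical helper: one sed call, three expressions applied in order.
strip_tags_keep_br() {
    # 1. protect every br variant (no attributes assumed) with a placeholder
    # 2. delete all remaining tags
    # 3. turn the placeholder into </br>
    sed -e 's:<[ /]*[bB][rR][ /]*>:uniqstring123:g' \
        -e 's:<[^<>]*>::g' \
        -e 's:uniqstring123:</br>:g' "$@"
}
```

It could then be called as `strip_tags_keep_br Export.TXT > FINAL.TXT`. Like the original three commands, it works line by line, so a tag split across two lines would not be removed.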
