Get specified section/table/content from file using sed/awk/perl, etc.

Each file has the same structure as follows:

HTML:
<html>
<head></head>
<body><p><table>My table here!</table></p>
</body>
</html>

I'm looking to dump only the section between <table> and </table> (including those tags).
I've spent a while looking for a solution, but it still isn't clear to me what the easiest way to achieve that is.


I tried the following solutions:
http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/
http://www.unix.com/shell-programming-scripting/147347-how-get-one-particular-section-using-awk.html
http://www.unix.com/shell-programming-scripting/66251-remove-html-tags-bash.html
http://www.unix.com/shell-programming-scripting/58479-multiple-line-match-using-sed.html


A good start:
Code:
lynx --base --source http://ai-contest.com/rankings.php | less "+/table"
Code:
sed -n '1h;1!H;${;g;s/<h2.*/No title here/g;p;}' sample.php
Code:
perl -0777 -pe 's/\A[^\{]*\{//s; s/\}.*?\{/\n/sg; s/\}[^\}]*\Z//s'
http://www.grymoire.com/Unix/Sed.html#uh-47

:)
 
A Perl solution might use /usr/ports/www/p5-HTML-TableExtract, or search the ports for the "p5-HTML-Table" keyword.
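If installing a port is not an option and each table fits on a single line, as in the sample from the first post, a plain sed substitution may be enough. This is only a minimal sketch: it assumes at most one table per line, and sample.html is a hypothetical file name used for illustration.

Code:
```shell
# Recreate the sample from the first post (sample.html is a hypothetical name).
cat > sample.html <<'EOF'
<html>
<head></head>
<body><p><table>My table here!</table></p>
</body>
</html>
EOF

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. \( \) captures the table, \1 replaces the whole
# line with just the captured part.
sed -n 's/.*\(<table.*<\/table>\).*/\1/p' sample.html
```

Note that a range such as /<table/,/<\/table>/ would not help here: sed only starts looking for the end of a range on the line after the one that opened it, so it misses a closing tag on the same line. For tables that span several lines, a slurping approach such as perl -0777 is the safer route.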
 
wblock:
Thank you for the great example. It looks very simple, and I like simple solutions, but something is missing.
I tried this command and got an empty result.
Tried:
Code:
perl -0777 -ne 'print $1' *
Empty output.
Code:
> echo test | perl -0777 -ne 'print \$1'
SCALAR(0x80123fde0)>
What am I missing?
 

The entire regex, for a start. $1 is only filled in by a successful regex match that contains a capture group.
% man perlre | less +/Capture
The "if" is also important: it makes print run only when the match succeeds.
 