C Good html2txt ?

What do you need it for? Are you looking for a way to extract the text, possibly including metadata and structure, or to format the page for text browsing (what troff(1) does for man pages)?
 
I like simple C applications: they are clean and they run on energy-efficient hardware, which protects the environment.

To run a browser, you need at least a supercomputer. C# is killing our planet (also Flash, Java, ...).

Please visit:

http://www.greenpeace.org
or
http://www.childrenoftheearth.org/


Heavy software programming is wasting energy and creating pollution. The same applies to mobile phones.

(even lighter would be better: link)
 
Is Perl acceptable? The www/p5-libwww port has a couple of functions that will do what you need in just a few lines of Perl code.
 
You know, if you keep asking for suggestions and then posting links to your own utilities, people might actually notice a pattern there.

Thank you. I just thought it would be good to offer a possible idea.
I deleted the link.

Actually, yeah, w3m is rather good. links too.

However, they cannot produce a good conversion, especially for tables, input textboxes, and so on. gumbo gives a more accurate conversion.

thanks
 
To run a browser, you need at least a supercomputer. C# is killing our planet (also Flash, Java, ...).
Nonsense. I've run a complete GUI (X Window System and the Chromium browser) on a Raspberry Pi, which consumes only a few watts (way less than the monitor it was attached to). Running a browser today requires only a very small, low-powered device.

Heavy software programming is wasting energy and creating pollution.
Nonsense. Some companies care seriously about the energy consumption of their servers. Can you imagine how much power companies like Google, Amazon or Microsoft use? Huge amounts. Recently I heard a statistic that Google is the world's largest user of renewable power (more than aluminum smelters and steel mills). These big computer companies worry a lot about energy usage, because electricity is probably one of their largest costs (probably even more than payroll). Yet they use heavy software programming, and languages such as C++, Java and Python, which you consider to be inefficient.

If you explain what you really need a text-based browser for, we can give you tips. Personally, I've used lynx before. The problem today is that a large fraction of all web pages are so complex (JavaScript and so on; fortunately the Flash-based stuff is mostly gone) that lynx can no longer render them.

The other good option is to use a library which is capable of parsing HTML. For example, Python has a pretty good DOM parser for HTML; with a little bit of scripting skill, you can use this to extract the text parts from known friendly web pages (it won't work in general, in particular not for pages designed for mobile).
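
To make that concrete, here is a minimal sketch of the idea using only Python's standard library. Note that it uses html.parser.HTMLParser, which is an event-based parser rather than a DOM; I picked it only to avoid third-party dependencies. The names TextExtractor and html_to_text, and the choice of which tags count as block-level, are my own assumptions for illustration. It ignores tables, forms and anything generated by JavaScript.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style>."""

    # Tags whose content is never shown to the reader.
    SKIP = {"script", "style"}
    # Tags treated as line breaks (an assumed, incomplete list).
    BLOCK = {"p", "div", "br", "li", "tr", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside script/style.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines.
        raw = "".join(self._parts)
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

You would call html_to_text(page) on a string holding the page source; fetching the page is left out on purpose. For anything beyond simple, well-formed pages, a real DOM library or a browser like w3m will probably do better.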
 
A web browser is a huge waste of memory and resources. Everyone knows that.

How many resources does Windows need ...
 