Script to retrieve data from web page

balanga · Jan 12, 2020

Any suggestions on how to go about retrieving data (specifically, a single string) from a web page?

I'm thinking of using www/lynx to retrieve the page and then parse the result, but I'm not sure whether lynx can be scripted.
Any advice welcome.

kpedersen · Jan 12, 2020

You could consider fetch(1).
It is in the FreeBSD base system and can download a single web page to a file or to stdout. You could then grep/sed the output for the string.
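The "download with fetch(1), then grep/sed for the string" idea carries over directly to a scripting language. Below is a rough Python sketch of the search half, assuming the page has already been saved to a file with something like fetch -o page.html (the file name and the patterns are only examples). Like grep/sed it is a plain regular-expression search and knows nothing about HTML structure.

```python
import re


def find_string(text, pattern):
    """Return the first capture group of `pattern` found in `text`, or None.

    A plain regular-expression search, i.e. what grep/sed do on fetch(1)
    output; it knows nothing about HTML structure.
    """
    match = re.search(pattern, text)
    return match.group(1) if match else None


def find_in_file(path, pattern, encoding="utf-8"):
    """Search a page previously saved to `path` (e.g. by fetch -o)."""
    with open(path, encoding=encoding, errors="replace") as f:
        return find_string(f.read(), pattern)
```

For example, find_in_file("page.html", r"<title>(.*?)</title>") returns the page title. This approach breaks as soon as the site changes its markup, which is the usual argument for a real HTML parser.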

drhowarddrfine · Jan 12, 2020

I see a lot of people using www/py-scrapy for scraping web sites.

msplsh · Jan 21, 2020

Use something with a libxml binding. I use PHP.

SirDice · Jan 21, 2020

Use Perl, Python, Ruby, Lua, whatever. Almost all scripting languages have something for this. My personal favorite is still www/p5-libwww (yes, I'm still a Perl monger).

trev · Jan 22, 2020

Lynx or wget (wget is especially useful for reloading the cookies some websites require) in a Bourne shell script with grep and sed. I've done it hundreds of times.
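The cookie trick also works without wget. As a minimal sketch, Python's standard library can read and write a Netscape-format cookies.txt (the format wget's --load-cookies and --save-cookies use) through http.cookiejar.MozillaCookieJar. The cookie file path and URL passed in are up to you; the helper names here are made up for illustration.

```python
import http.cookiejar
import os
import urllib.request


def make_opener(cookie_file):
    """Return (opener, jar) where the opener sends and stores cookies in jar.

    `cookie_file` is a Netscape-format cookies.txt, the format wget's
    --load-cookies and --save-cookies options use. It is loaded if present.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        # ignore_discard=True also restores session cookies saved last time.
        jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookies(url, cookie_file, timeout=30):
    """Fetch `url` with the saved cookies, then write back any new ones."""
    opener, jar = make_opener(cookie_file)
    with opener.open(url, timeout=timeout) as resp:
        # UTF-8 is an assumption; real pages may declare another charset.
        body = resp.read().decode("utf-8", errors="replace")
    jar.save(ignore_discard=True, ignore_expires=True)
    return body
```

Saving with ignore_discard=True keeps session cookies between runs, which is usually what a login-protected page needs.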

msplsh · Jan 22, 2020

So...

1. Get the HTML using

lynx
wget
perl & p5-libwww
curl
python & scrapy
python & something way simpler like the requests library
php & libcurl
fetch
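For step 1, you don't strictly need any packages. A minimal sketch using only Python's urllib.request is shown below; the User-Agent string is an arbitrary placeholder (some sites reject the default "Python-urllib" agent), and falling back to UTF-8 when the server sends no charset is an assumption.

```python
import urllib.request

# Arbitrary placeholder agent string, not something any particular site requires.
DEFAULT_AGENT = "Mozilla/5.0 (compatible; page-scraper-sketch)"


def get_html(url, user_agent=DEFAULT_AGENT, timeout=30):
    """Step 1: download `url` and return the body as text.

    The charset is taken from the Content-Type header when the server sends
    one; otherwise UTF-8 is assumed.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

urlopen also accepts file:// URLs, so the same function can re-read a page you saved earlier with fetch or wget.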

2. Then parse the HTML for the string using

php & libxml
python & lxml (via libxml output that scrapy vends)
python & beautifulsoup
REXX & whatever it uses
grep (DON'T do this)
sed (DON'T do this)
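For step 2, here is a rough sketch of "pull the text out of one element" that swaps in Python's built-in html.parser instead of libxml or BeautifulSoup, so it runs with no extra packages. It assumes the string you want sits in an element with a known id (the ids used with it are hypothetical) and that the markup nests cleanly; real-world pages with unclosed tags are exactly where lxml or BeautifulSoup earn their keep.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals `target_id`."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while inside the target element
        self.done = False     # set once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1    # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the whitespace-normalised text of the element, or None if absent/empty."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.parts:
        return None
    return " ".join("".join(parser.parts).split())
```

Unlike a grep/sed pattern, this keeps working if attributes are reordered or the text moves onto another line.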

I wrote a Python script for tweets that uses requests and BeautifulSoup. It only needs two packages:

pkg install py36-requests py36-beautifulsoup
