Script to retrieve data from web page

Any suggestions on how to retrieve data (specifically, a single string) from a web page?

I'm thinking of using www/lynx to retrieve a page and then parse the result, but I'm not sure whether lynx can be scripted.
Any advice is welcome.
 
Use Perl, Python, Ruby, Lua, whatever. Almost every scripting language has an HTTP client library for this. My personal favorite is still www/p5-libwww (yes, I'm still a Perl monger).
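To show how little code the fetch step takes, here is a minimal sketch in Python using only the standard library's urllib.request (a swapped-in stand-in, not p5-libwww). The function name fetch_page and the User-Agent string are hypothetical placeholders; it assumes the server declares its charset in the Content-Type header and falls back to UTF-8 otherwise.

```python
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded as text."""
    # Some sites reject Python's default User-Agent; this value is a hypothetical placeholder.
    req = Request(url, headers={"User-Agent": "page-fetch-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, fetch_page("https://www.freebsd.org/") returns the page source as one string, ready for whatever parsing comes next.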
 
Use Lynx or Wget (especially useful when a website needs previously saved cookies reloaded) in a Bourne shell script with grep and sed. (I've done it hundreds of times.)
 
So...

1. Get the HTML using
  • lynx
  • wget
  • perl & p5-libwww
  • curl
  • python & scrapy
  • python & something way simpler like the requests library
  • php & libcurl
  • fetch (FreeBSD's base-system fetch(1))
2. Then parse the HTML for the string using
  • php & libxml
  • python & lxml (the libxml2-based parser that scrapy uses under the hood)
  • python & beautifulsoup
  • REXX & whatever it uses
  • grep (DON'T do this)
  • sed (DON'T do this)
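A minimal sketch of the parsing step, using Python's standard-library html.parser instead of BeautifulSoup or lxml so nothing extra has to be installed. The names extract_text and _FirstMatch and the sample markup are hypothetical; it returns the whitespace-normalized text of the first element whose tag matches and whose attributes match exactly (so a class filter must equal the whole class attribute), or None if nothing matches.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _FirstMatch(HTMLParser):
    """Collects the text inside the first element matching a tag and attribute filter."""

    def __init__(self, tag: str, want: Dict[str, str]) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.tag = tag
        self.want = want
        self.depth = 0        # > 0 while inside the matched element
        self.matched = False  # True once the target element has been found
        self.done = False     # True after the target element's closing tag
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
        elif tag == self.tag and all(dict(attrs).get(k) == v for k, v in self.want.items()):
            self.depth = 1
            self.matched = True

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html: str, tag: str, attrs: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the whitespace-normalized text of the first matching element, or None."""
    parser = _FirstMatch(tag.lower(), attrs or {})  # html.parser lowercases tag names
    parser.feed(html)
    parser.close()
    if not parser.matched:
        return None
    return " ".join("".join(parser.parts).split())
```

For a real page you would pass in the HTML string obtained with whichever fetch method from step 1 you picked, e.g. extract_text(html, "span", {"id": "price"}). Unlike grep or sed, this works on parsed elements, so line breaks and attribute order in the markup don't matter.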
I wrote a Python script for grabbing tweets that uses requests and BeautifulSoup. It needs just two packages:

pkg install py36-requests py36-beautifulsoup
 