Checker for broken links on HTML pages

Can anyone recommend an open source checker to find
stale, broken links on HTML pages?

I've tried:

linkcheck-1.4: Checks a web site for bad links
linkchecker-5.0.1: Check HTML documents for broken links
linklint-2.3.6.d: Perl script that checks links on web sites


and none of them work.
 
It depends on how intelligent it needs to be. Absolute links are easy to find, and you can check them with curl (this prints all the failed links):

Code:
# extract unique absolute URLs, then HEAD-request each one; print the ones that fail
grep -Eo -e 'https?://[^"[:space:]]*' input.html | sort -u |
while read u; do curl -sfI "$u" > /dev/null || echo "$u"; done

Or with csh:

Code:
foreach u (`grep -Eo -e 'https?://[^"[:space:]]*' input.html | sort -u`)
    curl -sfI "$u" > /dev/null || echo "$u"
end

If you want to check relative links as well, you'll need a more sophisticated tool that you can point at a page's URL, so that it can resolve the relative links the way a browser does.
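
If you know the URL the page came from, something like this might work as a rough starting point (the base URL and input file name below are just placeholders; it only looks at href attributes and won't handle <base> tags or ../ paths):

Code:
base='https://www.example.org/docs/'              # placeholder: the page's base URL
host=`echo "$base" | grep -Eo 'https?://[^/]*'`
grep -Eo 'href="[^"#]*"' input.html | sed 's/^href="//; s/"$//' | sort -u |
while read u; do
    case "$u" in
        http://*|https://*) full="$u" ;;          # already absolute
        /*)                 full="$host$u" ;;     # root-relative
        *)                  full="$base$u" ;;     # relative to the page's directory
    esac
    curl -sfI "$full" > /dev/null || echo "$full"
done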
 
Yes, well.
I downloaded a web site using wget, so the links were
rewritten as relative.
But, strangely, wget missed some links; I don't know why, as it looks simple enough.
So I do actually need more of a relative-link checker.

I'm writing one myself now.
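
For reference, a rough sketch of such a check in plain sh might look like this (it only looks at href attributes, ignores anchors and query strings, and assumes it is run from the top of the wget-mirrored tree):

Code:
# verify that each relative href in the mirrored pages points to an existing local file
find . -name '*.html' | while read f; do
    dir=`dirname "$f"`
    grep -Eo 'href="[^"#?]*"' "$f" | sed 's/^href="//; s/"$//' | sort -u |
    while read u; do
        case "$u" in
            ''|http://*|https://*|mailto:*) ;;    # skip empty, absolute, and mailto links
            /*) [ -e ".$u" ]     || echo "$f -> $u" ;;
            *)  [ -e "$dir/$u" ] || echo "$f -> $u" ;;
        esac
    done
done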
 