Are you 100% sure it has to be C++? Doing this in a more modern language is likely to be easier. I have used an (X)HTML parser in Python before, and it was very easy; that one ships with the standard Python distribution. I also happen to know of a package called goquery that people like (it is for Go, a language somewhat similar to C and C++). Finding a parser in straight C++ is likely to be harder, since C++ is not commonly used for text handling and web-facing projects.
Sorry about the non-answer to your question, but if you don't find one, consider switching languages, at least for this part of the project.
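
To give a feel for how little code the Python route takes, here is a minimal sketch. It assumes the parser in question is `html.parser.HTMLParser` from the standard library (I can't say for certain that is the one meant above), and the `LinkCollector` class and `extract_links` function are names made up for this illustration. It just collects the `href` of every link in a page:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Parse html_text and return the list of link targets found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

You would call it as `extract_links(page_source)` on a string holding the page. Note that this parser is event-based rather than building a document tree, so if you need to query the structure (as goquery does), you would probably want a different library.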