I have a collection of HTML documents for which I need to parse the contents of the tags in the
If it suits your application, you can run the documents through Tidy to convert the HTML into well-formed XML (XHTML), and then use as much XPath as you like!
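
A minimal sketch of that Tidy-then-XPath approach in Python, under a few assumptions: the `tidy` command-line tool is installed and on your PATH, the documents are UTF-8, and the helper names `tidy_to_xhtml` and `tag_texts` are made up for illustration. The standard library's `xml.etree.ElementTree` only supports a limited XPath subset, so swap in a full XPath engine (for example lxml) if you need more than simple paths.

```python
import subprocess
import xml.etree.ElementTree as ET

# Tidy's XHTML output puts every element in the XHTML namespace,
# so XPath queries need a prefix mapped to it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def tidy_to_xhtml(html):
    """Convert messy HTML to XHTML by calling the `tidy` CLI (assumed installed)."""
    result = subprocess.run(
        # -asxml: emit well-formed XHTML; -numeric: numeric entities only,
        # so the XML parser never meets an undefined named entity like &nbsp;
        ["tidy", "-asxml", "-quiet", "-numeric", "-utf8", "--show-warnings", "no"],
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # tidy exits with 1 when it only had warnings and 2 when it hit errors
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def tag_texts(xhtml, tag):
    """Return the text content of every <tag> element in an XHTML string."""
    root = ET.fromstring(xhtml)
    return [
        "".join(el.itertext()).strip()
        for el in root.iterfind(f".//x:{tag}", XHTML_NS)
    ]
```

With those two helpers, something like `tag_texts(tidy_to_xhtml(raw_html), "p")` would collect the text of every paragraph in one document; looping over the collection is then just a matter of reading each file and calling the same pair of functions.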