Is it possible, and what tools could be used, to parse an HTML document (from a string or from a file) and then construct a DOM tree so that a developer can walk the tree throu
You can take a look at NekoHTML, a Java library that performs best-effort cleaning and tag balancing on your document. It is an easy way to parse a malformed HTML (or invalid XML) file into a DOM tree.
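Once you have an `org.w3c.dom.Document`, walking the tree is plain DOM API regardless of which parser built it. Below is a minimal sketch of that walk. To keep it runnable without extra dependencies, it uses the JDK's built-in XML parser as a stand-in, which only accepts well-formed input. The class name `DomWalkSketch` and its helper methods are made up for illustration. With NekoHTML on the classpath you would instead obtain the document from `org.cyberneko.html.parsers.DOMParser`, as noted in the comment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomWalkSketch {

    // Stand-in parser: the JDK XML parser, which requires well-formed markup.
    // Assumption: with NekoHTML you would replace the body with something like
    //   org.cyberneko.html.parsers.DOMParser parser = new org.cyberneko.html.parsers.DOMParser();
    //   parser.parse(new InputSource(new StringReader(markup)));
    //   return parser.getDocument();
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Collects element names in document order (depth-first, pre-order).
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        // Visit children via the standard first-child / next-sibling links.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hi</p><p>There</p></body></html>");
        System.out.println(elementNames(doc.getDocumentElement()));
    }
}
```

The traversal code does not change when you swap parsers. Only `parse` needs to differ, which is where NekoHTML's tag balancing would come into play for real-world, malformed pages.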