Is it possible and what tools could be used to parse an html document as a string or from a file and then to construct a DOM tree so that a developer can walk the tree throu
JTidy should let you do what you want.
Usage is fairly straight forward, but parsing is configurable. e.g.:
InputStream in = ...;
Tidy tidy = new Tidy();
// configure Tidy instance as required
...
...
Document doc = tidy.parseDOM(in, null);
Element root = doc.getDocumentElement();
The JavaDoc is hosted here.