I'll just add to @MJB's answer: after working with most of the HTML parsing libraries in Java, I think one huge pro/con is omitted, namely whether a parser preserves the formatting and incorrectness of the HTML from input to output.
That is, when you change the document, most parsers will blow away the whitespace, comments, and incorrectness of the original markup, particularly if they are XML-like libraries.
Jericho is the only parser I know of that allows you to manipulate nasty HTML while preserving the whitespace formatting and the incorrectness of the HTML (if there is any).
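
To illustrate the point about XML-like libraries, here is a minimal sketch that uses only the JDK's built-in XML DOM parser as a stand-in (it is not Jericho, and not any particular HTML library). The class name `XmlRoundTripDemo`, the helper names, and the sample strings are made up for this example. It round-trips a snippet without modifying it and checks whether the text survives, then feeds the parser some unclosed HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlRoundTripDemo {

    /** Parses markup with the JDK's XML DOM parser and serializes it back without touching the tree. */
    static String roundTrip(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Returns true if the XML parser refuses the markup outright. */
    static boolean isRejected(String markup) throws Exception {
        try {
            roundTrip(markup);
            return false;
        } catch (SAXException e) {
            // Malformed input never reaches the DOM stage at all.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: well-formed, but with single quotes and odd spacing.
        String wellFormed = "<div  class='note'><br></br></div>";
        String out = roundTrip(wellFormed);
        System.out.println(out);
        System.out.println("identical to input? " + out.equals(wellFormed));

        // Hypothetical sample: typical "incorrect" HTML with unclosed tags.
        System.out.println("rejected? " + isRejected("<p>unclosed <br> paragraph"));
    }
}
```

Even with no edits to the tree, the serialized text comes back different from the input (the serializer writes the markup in its own normalized form), and the unclosed snippet is rejected before you can manipulate anything. That is the behaviour to watch for when the original formatting of the HTML matters.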