How much work is it reasonable for an object constructor to do? Should it simply initialize fields and not actually perform any operations on data, or is it okay to have it
One option is to move the parsing code to a separate function, make the constructor private, and provide a static function parse(html) that constructs the object and immediately calls the parse function.
This way you avoid the problems of parsing in the constructor (inconsistent state, problems when calling overridden functions, ...), while the client code still gets all the advantages (a single call that returns the parsed HTML or fails with an 'early' error).
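
A minimal Java sketch of that idea, in case it helps. The class name `HtmlDocument`, the helper `doParse`, and the regex-based "parsing" (it only collects opening tag names) are placeholders made up for illustration, not a real HTML parser:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; "parsing" here just collects opening tag names.
final class HtmlDocument {
    // Matches the start of an opening tag such as <p or <body (closing tags are skipped).
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    private final List<String> tagNames = new ArrayList<>();

    // Private constructor: no real work, and clients cannot call it directly.
    private HtmlDocument() {
    }

    // Static factory: constructs the object and immediately parses.
    // The caller either gets a fully parsed document or an exception.
    static HtmlDocument parse(String html) {
        HtmlDocument doc = new HtmlDocument();
        doc.doParse(html);
        return doc;
    }

    // The actual work lives in an ordinary private method, not in the constructor.
    private void doParse(String html) {
        if (html == null) {
            throw new IllegalArgumentException("html must not be null");
        }
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            tagNames.add(m.group(1));
        }
    }

    // Read-only view of the tag names found during parsing.
    List<String> tagNames() {
        return Collections.unmodifiableList(tagNames);
    }
}
```

Client code then makes a single call, e.g. `HtmlDocument doc = HtmlDocument.parse(html);`, and never sees a half-initialized object: if `doParse` throws, no reference to the instance escapes the factory. Because the class is final and the helper is private, there is also no overridable method being called during construction.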