I'm working on an app that aggregates some feeds from the internet and reformats the content. So I'm looking for a way to parse some HTML. Given XML and HTML are very sim
HTML is not necessarily well-formed XML, and that's the trouble when you parse it as XML.
Take the following example:
<p>123
<p>abc
<p>789
If you view this chunk of HTML in a browser, it displays just as you would expect. But if you parse it as XML, you run into trouble, because those p tags are not closed.
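To make the difference concrete, here is a minimal sketch in Python using only the standard library. It assumes the example above is three unclosed p tags wrapping 123, abc and 789. The names `SNIPPET`, `parses_as_xml`, `ParagraphCollector` and `extract_paragraphs` are made up for illustration. A strict XML parser rejects the snippet, while a lenient HTML parser still recovers the three paragraphs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed form of the example: three <p> tags, none of them closed.
SNIPPET = "<p>123\n<p>abc\n<p>789"


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Unclosed tags make the document ill-formed XML.
        return False


class ParagraphCollector(HTMLParser):
    """Collect the text after each <p> start tag, tolerating a missing </p>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends the previous paragraph.
            self.paragraphs.append("")
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(markup: str) -> list:
    """Feed the markup to the lenient parser and return the stripped paragraph texts."""
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered trailing text
    return [text.strip() for text in collector.paragraphs]


if __name__ == "__main__":
    print(parses_as_xml(SNIPPET))
    print(extract_paragraphs(SNIPPET))
```

For real-world feeds, a dedicated HTML parsing library is probably a better fit than a hand-written `HTMLParser` subclass. The point here is only that the parser has to tolerate tags that are never closed.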