Parse malformed XML

拟墨画扇 submitted on 2019-11-30 20:23:02

The HTML Agility Pack will parse HTML, rather than XHTML, and is quite forgiving. The object model will be familiar if you've used XmlDocument.

annakata

You might want to check out the answer to this question.

Basically, somewhere between a .NET port of Beautiful Soup and the HTML Agility Pack there is a way.

LBushkin

It's unlikely that you will be able to build an XmlDocument from content with this level of malformed structure. XmlDocument (to my knowledge) requires that XML content adhere to proper nesting and closure syntax.

However, I suspect that you could parse this with an XmlReader instead. It may still throw exceptions if certain egregious errors are encountered, but according to the MSDN docs, it can at least disclose the location of the errors.
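To illustrate the "disclose the location of the errors" idea, here is a minimal sketch. Note that it uses Python's standard-library `xml.etree.ElementTree` as a stand-in, not the .NET `XmlReader` API discussed above; the helper name `find_xml_error` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message).

    Hypothetical helper for illustration; a strict parser stops at the
    first error but tells you where it happened.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first error found
        return line, column, str(err)
    return None


# The second <item> is never closed, so the parser fails at </root>.
malformed = "<root>\n  <item>one</item>\n  <item>two\n</root>"

if __name__ == "__main__":
    print(find_xml_error(malformed))
```

The parser still gives up at the first problem, so this only helps you locate and report the damage; it does not recover the rest of the document.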

If you're just dealing with HTML, there is the HTML Agility Pack, which may serve your purposes.

Depending on your specific needs, you might be able to use HTML Tidy to clean up the document, then import it using the XmlDocument object.

What you are trying to do is very difficult. HTML cannot be parsed using an XML parser since XML is strict and HTML is not. If that HTML were compliant XHTML (HTML as XML), then an XML parser would parse the HTML without issue.
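A rough sketch of the strict-versus-lenient difference, again using Python's standard library as an analogue rather than any .NET API: `html.parser.HTMLParser` walks through loose markup without complaint, while `xml.etree.ElementTree` rejects the same input. The class `TagCollector`, the function `is_well_formed_xml`, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Hypothetical collector: records start tags and non-blank text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def is_well_formed_xml(text):
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Unclosed <br>, <p> and <b>: ordinary loose HTML, but not valid XML.
loose_html = "<p>first<br><p>second <b>bold</p>"

collector = TagCollector()
collector.feed(loose_html)
collector.close()

if __name__ == "__main__":
    print(collector.start_tags)
    print(collector.text)
    print(is_well_formed_xml(loose_html))
```

The lenient parser only reports the tags and text it sees; it does not repair the nesting, which is the job a tool like HTML Tidy or the HTML Agility Pack takes on.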

If you really want to use an XML parser for HTML, you might want to see if there are any HTML-to-XHTML converters out there.

In other words, I have yet to meet an XML parser that handles malformed XML... they are not designed to accept loose markup like HTML (for good reason, too :) )

You can't load malformed XML into an XmlDocument.

Check out the HTML Agility Pack on CodePlex.
