Open an HTML Document with xml.Load

倾然丶 夕夏残阳落幕 submitted on 2021-02-18 12:25:46

Question


I'd like to open an HTML document (retrieved from the web as a string via a StreamReader) by creating an XmlDocument this way:

XmlDocument doc = new XmlDocument();

doc.Load(html); // html is the string containing the retrieved document

But the HTML document starts with this DOCTYPE declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

so the loader tells me that the document is invalid. Is there any way to work around this?


Answer 1:


Ordinary HTML, even when it is valid HTML, is generally not well-formed XML, so an XML parser will reject it.

HtmlAgilityPack is a popular third-party open source library that you can use to solve this problem:

  • http://www.google.co.uk/search?q=htmlagilitypack
  • How to use HTML Agility pack
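
As a rough illustration of the HtmlAgilityPack route, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkLister, the helper ExtractLinks, and the sample markup are made up for this example and are not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class LinkLister
{
    // Hypothetical helper: returns the href of every <a> element in the markup.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: a DOCTYPE or unclosed tags do not throw

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
                result.Add(link.GetAttributeValue("href", ""));
        }
        return result;
    }

    static void Main()
    {
        // Sample input with an unclosed <p> and <br>, which an XML parser would reject.
        string html = "<!DOCTYPE html><html><body><p>Unclosed paragraph<br>" +
                      "<a href=\"https://example.com\">link</a></body></html>";

        foreach (string href in ExtractLinks(html))
            Console.WriteLine(href);
    }
}
```

The point of the sketch is that the string is never handed to XmlDocument at all; the library builds its own tree and exposes XPath-style queries over it.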



Answer 2:


If you're positive that the HTML is valid XML, I imagine you could simply replace the HTML head with an XML one.
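
One way to try this idea is to drop the DOCTYPE declaration before parsing. The sketch below assumes the markup really is well-formed XHTML; the names XhtmlLoader and LoadWithoutDoctype are hypothetical, and the regular expression assumes the DOCTYPE has no internal subset in square brackets. Note that XmlDocument.LoadXml parses a string of markup, whereas XmlDocument.Load(string) treats its argument as a file path or URL.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class XhtmlLoader
{
    // Hypothetical helper: strips the DOCTYPE so no external DTD has to be resolved.
    public static XmlDocument LoadWithoutDoctype(string xhtml)
    {
        // Remove <!DOCTYPE ...> (assumes no internal subset, i.e. no '>' inside it).
        string body = Regex.Replace(xhtml, @"<!DOCTYPE[^>]*>", "", RegexOptions.IgnoreCase);

        var doc = new XmlDocument();
        doc.XmlResolver = null; // do not fetch external resources while parsing
        doc.LoadXml(body);      // throws XmlException if the markup is not well-formed
        return doc;
    }

    static void Main()
    {
        string xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Demo</title></head>" +
            "<body><p>Hello</p></body></html>";

        XmlDocument doc = LoadWithoutDoctype(xhtml);
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

A caveat with this approach: once the DTD is gone, named entities such as &nbsp; are no longer declared, so a document that uses them will still fail to parse unless they are replaced with numeric character references first.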




Answer 3:


First, you have to check that the document is valid XHTML (which means it is a well-formed XML document too).

Paste your XHTML code here and review the output: http://validator.w3.org/#validate_by_input

Good luck!




Answer 4:


One can use HTML Tidy or Tidy.NET for this.



Source: https://stackoverflow.com/questions/6540699/open-an-html-document-with-xml-load
