How can I manipulate the DOM from a string of HTML in C#? [closed]

Submitted by 本秂侑毒 on 2019-12-20 10:25:58

Question


So far, the best way I have found to manipulate the DOM from a string that contains HTML is:

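// System.Windows.Forms.WebBrowser; must be created on an STA (UI) thread
WebBrowser webControl = new WebBrowser();
webControl.DocumentText = html;
// Note: DocumentText loads asynchronously, so the document may not be ready
// until the DocumentCompleted event has fired
HtmlDocument doc = webControl.Document;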

There are two problems:

  1. It requires a WebBrowser object.
  2. It can't be used from multiple threads; I need something that works on threads other than the main thread.

Any ideas?


Answer 1:


I searched Google for HTML parsers and found the Html Agility Pack. I don't know yet whether it fits this use case; I am downloading it right now to give it a try.




Answer 2:


Depending on what you are trying to do (maybe you can give us more details?) and depending on whether or not the HTML is well-formed, you could convert this to an XmlDocument:

System.Xml.XmlDocument x = new System.Xml.XmlDocument();
x.LoadXml(html); // as long as html is well-formed, i.e. XHTML

Then you could manipulate it easily, without the WebBrowser instance. As for threads, I don't know enough about the implementation of XmlDocument to know the answer to that part.
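
To make that concrete, here is a minimal sketch of the XmlDocument approach, assuming the input is already well-formed XHTML. The class name XhtmlDomSketch and the helper AddClassToParagraphs are made up for illustration. On the threading question: instance members of XmlDocument are not guaranteed to be thread-safe, but nothing ties the class to the UI thread, so giving each worker thread its own instance (as Task.Run does below) should be fine.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Xml;

public static class XhtmlDomSketch
{
    // Hypothetical helper: sets a CSS class on every <p> element
    // and returns the serialized markup.
    public static string AddClassToParagraphs(string xhtml, string cssClass)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the input is not well-formed

        // Copy the matches into a list first so the document is not
        // modified while the node list is still being enumerated.
        var paragraphs = doc.GetElementsByTagName("p").Cast<XmlElement>().ToList();
        foreach (XmlElement p in paragraphs)
        {
            p.SetAttribute("class", cssClass);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xhtml = "<html><body><p>one</p><p>two</p></body></html>";

        // Runs on a thread-pool thread; no WebBrowser or UI thread involved.
        string result = Task.Run(() => AddClassToParagraphs(xhtml, "note")).Result;
        Console.WriteLine(result);
    }
}
```

Each call builds its own XmlDocument, so several of these can run in parallel without sharing state.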


If the document isn't well-formed, you could use NTidy (a .NET wrapper for HTML Tidy) to get it into shape first; I had to do this very thing for a project once and it really wasn't too bad.




Answer 3:


JasonBunting already posted this, but using a .NET wrapper around HTML Tidy and loading the result into an XmlDocument really does work.

I have used this .NET wrapper before:

http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx

And implemented it somewhat like this:

string input = "<p>crappy html<br <img src=foo></div>";
HtmlTidy tidy = new HtmlTidy();
string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(output);

Sorry if considered a repost :)




Answer 4:


This is an old question. Now there are:

  • The HTML Agility Pack (which you have already found)
  • CsQuery, a .NET port of jQuery, which should suit developers who already know jQuery
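
For the first option, a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the class name AgilityPackSketch and the helper MarkLinksNoFollow are made up for illustration. Unlike XmlDocument, the parser tolerates markup that is not well-formed, and it needs neither a WebBrowser control nor the UI thread.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AgilityPackSketch
{
    // Hypothetical helper: adds rel="nofollow" to every link that has an href.
    public static string MarkLinksNoFollow(string html)
    {
        // This is HtmlAgilityPack.HtmlDocument, not System.Windows.Forms.HtmlDocument.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.SetAttributeValue("rel", "nofollow");
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(MarkLinksNoFollow("<p>See <a href=\"/docs\">the docs</a>"));
    }
}
```

As with the XmlDocument approach, keeping one document instance per thread avoids sharing mutable state between threads.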


Source: https://stackoverflow.com/questions/232004/how-can-i-manipulate-the-dom-from-a-string-of-html-in-c
