Question
So far, the best way I have found to manipulate the DOM from a string that contains HTML is:
WebBrowser webControl = new WebBrowser();
webControl.DocumentText = html;
HtmlDocument doc = webControl.Document;
There are two problems:
- It requires the WebBrowser object!
- It can't be used with multiple threads; I need something that works on a thread other than the main thread.
Any ideas?
Answer 1:
I searched Google for HTML and found the Html Agility Pack. I don't know yet whether it does what I need; I'm downloading it now to give it a try.
Answer 2:
Depending on what you are trying to do (maybe you can give us more details?) and on whether the HTML is well-formed, you could load it into an XmlDocument:
System.Xml.XmlDocument x = new System.Xml.XmlDocument();
x.LoadXml(html); // as long as html is well-formed, i.e. XHTML
Then you could manipulate it easily, without the WebBrowser instance. As for threads, I don't know enough about the implementation of XmlDocument to know the answer to that part.
If the document isn't in proper form, you could use NTidy (.NET wrapper for HTML Tidy) to get it in shape first; I had to do this very thing for a project once and it really wasn't too bad.
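A minimal sketch of the XmlDocument route described above, assuming the input is already well-formed XHTML. The class and helper names (XmlDomSketch, AddClassToParagraphs) are made up for illustration and come from no library. On the threading question, the sketch relies only on each call creating its own XmlDocument. Instance members of XmlDocument are not documented as thread-safe, so sharing one instance across threads would need your own locking.

```csharp
using System;
using System.Threading.Tasks;
using System.Xml;

public static class XmlDomSketch
{
    // Hypothetical helper: parses well-formed XHTML and sets a class
    // attribute on every <p> element, then returns the serialized markup.
    public static string AddClassToParagraphs(string xhtml, string className)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        foreach (XmlElement p in doc.GetElementsByTagName("p"))
        {
            p.SetAttribute("class", className);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string html = "<html><body><p>one</p><p>two</p></body></html>";

        // The document is created and used entirely inside the task,
        // so no WebBrowser control or UI thread is involved.
        string result = Task.Run(() => AddClassToParagraphs(html, "note")).Result;
        Console.WriteLine(result);
    }
}
```

Input that is not well-formed makes LoadXml throw, which is where a cleanup pass such as HTML Tidy would have to run first.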
Answer 3:
JasonBunting already posted this, but it really does work to use a .NET wrapper around HTML Tidy and load the result into an XmlDocument.
I have used this .NET wrapper before:
http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx
And implemented it somewhat like this:
string input = "<p>crappy html<br <img src=foo></div>";
HtmlTidy tidy = new HtmlTidy();
string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(output);
Sorry if considered a repost :)
Answer 4:
This is an old question. Now there are:
- The HTML Agility Pack (you have already found this)
- CsQuery, a .NET port of jQuery, which should suit developers who already know jQuery
Source: https://stackoverflow.com/questions/232004/how-can-i-manipulate-the-dom-from-a-string-of-html-in-c