Question
I'm working on a personal project in C# using WPF and the WPF WebBrowser control. I need to explore HTML DOM elements the way we do in JavaScript, PHP, etc.
In my MainWindow I have this variable:
private mshtml.HTMLDocument mainDocument = new mshtml.HTMLDocument();
In my WebBrowser LoadCompleted callback I have this:
mainDocument = (mshtml.HTMLDocument) mainBrowser.Document;
OK, this works.
Now if I do this:
mshtml.IHTMLElement elem = mainDocument.getElementById("MY_ID");
that also works well; I can read elem.innerHTML and similar properties.
BUT my problem is that only HTMLDocument has methods to find elements by ID, by tag name, etc.
I don't know how to find elements inside an IHTMLElement. I tried things like casting IHTMLElement to IHTMLElement2, but nothing worked.
Any ideas are welcome. A lot of people talk about hosting the WinForms WebBrowser instead, but I think there must be a way to do this with mshtml alone.
Thanks a lot. If you need more information, please feel free to ask.
PS: I'm French, so I apologize for my English.
Answer 1:
If you want to parse an HTML document in WinForms or WPF, you can use an excellent parser, Html Agility Pack. See http://html-agility-pack.net
var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);
After loading the page into doc, you can get any attribute, tag, etc.
var value = doc.DocumentNode
.SelectNodes("//td/input")
.First()
.Attributes["value"].Value;
It's super easy, just explore the doc a bit and you can make full use of it.
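One detail worth guarding against in a query like the one above: SelectNodes returns null rather than an empty collection when nothing matches, so chaining .First() directly can throw. Below is a minimal sketch of a null-safe variant, assuming the HtmlAgilityPack NuGet package is referenced. The class name InputValues and the method FromHtml are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class InputValues
{
    // Returns the "value" attribute of every <input> that is a direct child of a <td>.
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//td/input");
        if (nodes == null)
            return new List<string>();

        // GetAttributeValue falls back to the supplied default when the attribute is missing.
        return nodes.Select(n => n.GetAttributeValue("value", string.Empty)).ToList();
    }
}
```

Calling InputValues.FromHtml on a page with no matching inputs gives an empty list instead of an exception, which is usually easier to handle in UI code.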
You can also load the page into Html Agility Pack from a WinForms WebBrowser control, like this:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(webBrowser1.DocumentStream);
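Or you can load it from the document text:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// LoadHtml takes the markup as a string; Load does not accept the WebBrowser's Document object.
doc.LoadHtml(webBrowser1.DocumentText);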
Thanks
Answer 2:
Thanks a lot @Sujit for your help. I don't have enough reputation to mark your answer as helpful, but I hope others will.
To get it working with the WPF WebBrowser (mainHTMLDoc is an HtmlAgilityPack.HtmlDocument), I did this:
mainHTMLDoc.LoadHtml((mainBrowser.Document as mshtml.HTMLDocument).documentElement.innerHTML);
To query the result, you should add this using directive:
using System.Linq;
After that you can do things like this:
var table = mainHTMLDoc.GetElementbyId("MyID");
var rows = table.Element("tbody").Elements("tr");
foreach (var row in rows) {
    // Materialize the cells once instead of re-enumerating them for each column.
    var cells = row.Elements("td").ToList();
    var datacol1 = cells[0].Descendants("a").First().InnerHtml; // markup inside the first link in column 1
    var datacol2 = cells[1].InnerText; // plain text of column 2
}
Without using System.Linq you cannot call LINQ extension methods such as Count(), ElementAt() or First() on the sequences that Elements() returns, and those are very useful! Thanks again Sujit :)
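The table walk above assumes that the table, its tbody, and both columns are always present. Here is a slightly more defensive sketch of the same idea, again assuming the HtmlAgilityPack NuGet package. TableReader and ReadRows are hypothetical names chosen for this example, and the HTML is passed in as a string so the method can be tried without a browser control.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class TableReader
{
    // For each row of the table with the given id, returns the inner HTML of the
    // first link in column 1 and the trimmed text of column 2.
    public static List<(string LinkHtml, string Text)> ReadRows(string html, string tableId)
    {
        var result = new List<(string LinkHtml, string Text)>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.GetElementbyId(tableId);
        if (table == null)
            return result; // no element with that id

        // The parser does not add a <tbody> that is absent from the markup,
        // so fall back to the table node itself.
        var body = table.Element("tbody") ?? table;

        foreach (var row in body.Elements("tr"))
        {
            var cells = row.Elements("td").ToList();
            if (cells.Count < 2)
                continue; // skip header rows or rows with too few cells

            var link = cells[0].Descendants("a").FirstOrDefault();
            result.Add((link?.InnerHtml ?? string.Empty, cells[1].InnerText.Trim()));
        }
        return result;
    }
}
```

In the WPF case, the html argument could be the string already passed to LoadHtml above, taken from the mshtml document.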
Source: https://stackoverflow.com/questions/44918745/c-sharp-wpf-webbrowser-mshtml-explore-dom-find-elements