How to read xpath values from many HTML files in .Net?

血红的双手。 Submitted on 2020-01-04 13:25:28

Question


I have about 5000 HTML files in a folder. I need to loop through them and, for each one, open it, grab about 10 values using XPath, close it, and store the values in a (SQL Server) DB.

What is the easiest way to read the XPath values using .NET?

The xpaths should be pretty stable.

Please provide example code to read one value, say /html/head/title/text()

Thanks


Answer 1:


I think you should look into the HTML Agility Pack. It is an HTML parser rather than an XML parser, which makes it better suited to this task. An XML parser will throw an exception as soon as the input is not well-formed XML, and real-world HTML often is not. Using an HTML parser gives you a bit more leeway with the input files.
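
To make the difference concrete, here is a minimal sketch that feeds the same markup to both kinds of parser. It assumes the HtmlAgilityPack NuGet package is referenced; the class name, the helper names, and the sample markup are made up for illustration.

```csharp
using System;
using System.Xml;
using HtmlAgilityPack;

static class ParserComparison
{
    // Returns true only if the input is well-formed XML.
    public static bool ParsesAsXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup); // strict: throws on malformed input
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Reads the <title> text with the tolerant HTML parser; null if there is no title.
    public static string TitleViaHtmlParser(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? null : title.InnerText;
    }

    static void Main()
    {
        // The unclosed <br> and <p> are common in real pages but are not valid XML.
        string html = "<html><head><title>Demo</title></head>"
                    + "<body><p>Hello<br>world</body></html>";

        Console.WriteLine("Well-formed XML: " + ParsesAsXml(html));
        Console.WriteLine("Title via HTML parser: " + TitleViaHtmlParser(html));
    }
}
```

With markup like this, the XML parser rejects the input while the HTML parser still lets you query it.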

Example showing how to do something with all HREF (link) attributes:

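 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches, so guard before iterating.
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink stands in for your own link-rewriting logic
     }
 }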

I'm not near a compiler but the example you want is something like:

string title = doc.DocumentNode.SelectSingleNode("//title").InnerText;
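
Putting that together for the whole folder, a rough sketch could look like the following. It assumes the HtmlAgilityPack NuGet package; the class name, the field names, and the second XPath are placeholders rather than anything from the question. It selects the element and reads `InnerText` instead of using a trailing `text()` step, and it leaves the SQL Server insert out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

static class XPathExtractor
{
    // Hypothetical field name -> XPath map; replace with your own ~10 expressions.
    static readonly Dictionary<string, string> XPaths = new Dictionary<string, string>
    {
        { "Title", "/html/head/title" },
        { "FirstHeading", "//h1" },
    };

    // Returns one value per configured XPath; the value is null when no node matches.
    public static Dictionary<string, string> Extract(HtmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        foreach (KeyValuePair<string, string> pair in XPaths)
        {
            HtmlNode node = doc.DocumentNode.SelectSingleNode(pair.Value);
            // Decode entities such as &amp; and trim surrounding whitespace.
            values[pair.Key] = node == null
                ? null
                : HtmlEntity.DeEntitize(node.InnerText).Trim();
        }
        return values;
    }

    static void Main(string[] args)
    {
        // Folder to scan; defaults to the current directory.
        string folder = args.Length > 0 ? args[0] : ".";

        // "*.htm*" matches both .htm and .html files.
        foreach (string path in Directory.EnumerateFiles(folder, "*.htm*"))
        {
            var doc = new HtmlDocument();
            doc.Load(path); // reads the file and releases it once parsing is done
            Dictionary<string, string> row = Extract(doc);

            // A parameterized INSERT into SQL Server would go here (not shown).
            Console.WriteLine(Path.GetFileName(path) + ": " + row["Title"]);
        }
    }
}
```

Keeping the XPath expressions in one map means that adding or changing a field only touches the map, and a missing node yields null instead of a NullReferenceException.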


Source: https://stackoverflow.com/questions/3340047/how-to-read-xpath-values-from-many-html-files-in-net
