Html Agility Pack: load and scrape a webpage

旧城冷巷雨未停 submitted on 2019-11-26 19:50:14

Question


Is this the best way to get a webpage when scraping?

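using System.Net;
using HtmlAgilityPack;

string url = "http://something";
HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
var doc = new HtmlAgilityPack.HtmlDocument();

// Dispose the response (and its stream) once the document has been parsed.
using (HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse())
{
    doc.Load(resp.GetResponseStream());
}

var element = doc.GetElementbyId("start-left");   // takes an id value, not an XPath expression
var element2 = doc.DocumentNode.SelectSingleNode("//body");
string html = doc.DocumentNode.OuterHtml;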

I've seen HtmlWeb().Load used to get a webpage. Is that a better alternative for loading and then scraping the webpage?


OK, I'll try that instead.

var web = new HtmlWeb();
HtmlDocument doc = web.Load(url);

Now that I have my doc, I don't get as many properties; there is nothing like SelectSingleNode. The only one I can use is GetElementById, and that works, but I want to get an element by its class.

Do I need to do it like this?

var htmlBody = doc.DocumentNode.SelectSingleNode("//body");
htmlBody.SelectSingleNode("//paging");
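Two HtmlAgilityPack details explain what the question runs into. `SelectSingleNode` and `SelectNodes` are members of `HtmlNode`, not of `HtmlDocument`, so they are reached through `doc.DocumentNode`; `GetElementbyId` (lowercase "b" in this library) is one of the few lookups on `HtmlDocument` itself. Also, the XPath `//paging` matches elements *named* `paging`, not elements with that class, and a leading `//` searches the whole document even when called on a child node such as `htmlBody`. The sketch below uses a relative `.//` path with a predicate on `@class` instead. `ClassSelector.TextsByClass` is a hypothetical helper, not part of the library, and it assumes the class name contains no single quote.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassSelector
{
    // Returns the trimmed inner text of every element under `root` whose
    // class attribute contains `className` as a whole, space-separated token.
    // Assumption: className contains no single quote (it is spliced into XPath).
    public static List<string> TextsByClass(HtmlNode root, string className)
    {
        // ".//" keeps the search relative to `root`; a leading "//" would
        // search the entire document even when called on a child node.
        string xpath = ".//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";
        HtmlNodeCollection nodes = root.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null) return new List<string>();
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<div class='paging top'>1 2 3</div>" +
            "<div class='content'>text</div>" +
            "<div class='paging'>4 5 6</div>" +
            "</body></html>");

        // SelectSingleNode lives on HtmlNode, so go through DocumentNode.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        foreach (string text in TextsByClass(body, "paging"))
            Console.WriteLine(text);
    }
}
```

Matching on the padded token (`' paging '`) rather than a plain `contains(@class, 'paging')` avoids false hits on classes such as `pagingx`; if only an exact single-class match is wanted, `.//div[@class='paging']` is the simpler form.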

Answer 1:


It's much easier to use HtmlWeb:

using HtmlAgilityPack;

string url = "http://something";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
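`HtmlWeb.Load` performs the HTTP GET and the parsing in one call, replacing the `HttpWebRequest`/`GetResponseStream` steps from the question. The slightly fuller sketch below assumes a console app with the HtmlAgilityPack package referenced; it checks `HtmlWeb.StatusCode` after loading and reads the page title through `DocumentNode`. `PageLoader`, `Load`, and `Title` are hypothetical names, and `http://example.com` is only a placeholder default.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class PageLoader
{
    // Downloads and parses `url` with HtmlWeb; returns null unless the final
    // HTTP status is 200 OK. Network failures (DNS, timeouts) may still throw.
    public static HtmlDocument Load(string url)
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);   // HTTP GET + parse in one call
        return web.StatusCode == HttpStatusCode.OK ? doc : null;
    }

    // Reads the <title> text from an already-parsed document, or "" if absent.
    public static string Title(HtmlDocument doc)
    {
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? "" : HtmlEntity.DeEntitize(title.InnerText).Trim();
    }

    public static void Main(string[] args)
    {
        // Placeholder default URL; pass a real one as the first argument.
        string url = args.Length > 0 ? args[0] : "http://example.com";
        HtmlDocument doc = Load(url);
        Console.WriteLine(doc == null ? "request failed" : Title(doc));
    }
}
```

Keeping the extraction in a separate method that takes an `HtmlDocument` means it can be exercised with `LoadHtml` on a string, without any network access. With the .NET CLI, a URL can be passed as `dotnet run -- http://something`.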


Source: https://stackoverflow.com/questions/10558149/html-agility-pack-load-and-scrape-webpage
