Question
I'm trying to use HtmlAgilityPack to parse information from a web page. This is my code:
using System;
using HtmlAgilityPack;

namespace htmparsing
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            string url = "https://bugs.eclipse.org";
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load(url);
            foreach (HtmlNode node in doc)
            {
                // do something here with "node"
            }
        }
    }
}
But when I try to access doc.DocumentElement.SelectNodes, DocumentElement does not appear in the member list. I added HtmlAgilityPack.dll to the project references, but I don't know what the problem is.
Answer 1:
I have an article that demonstrates scraping DOM elements with HAP (HTML Agility Pack) in ASP.NET. It walks through the whole process step by step, so have a look and try it.
Scraping HTML DOM elements using HtmlAgilityPack (HAP) in ASP.NET
As for your code, it works fine for me. I tried it the same way you did, with a single change:
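string url = "https://www.google.com";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
// DocumentNode (not DocumentElement) is the root of an HtmlAgilityPack document.
// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
if (links != null)
{
    foreach (HtmlNode node in links)
    {
        // outputLabel is a control on my ASP.NET page
        outputLabel.Text += node.InnerHtml;
    }
}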
I got the expected output. The problem is that you are asking the HtmlDocument object for DocumentElement, which in HtmlAgilityPack is actually DocumentNode. Here's a response from an HtmlAgilityPack developer about the problem you are facing:
HTMLDocument.DocumentElement not in object browser
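As a self-contained illustration of the DocumentNode fix, here is a small console sketch. To keep it runnable offline it parses an inline HTML string with HtmlDocument.LoadHtml instead of calling HtmlWeb.Load; the class name LinkScraper, the helper GetLinkTargets, and the sample markup are hypothetical. It assumes the HtmlAgilityPack package is referenced, as in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkScraper
{
    // Hypothetical helper: returns the href of every <a> element in the markup.
    public static List<string> GetLinkTargets(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var targets = new List<string>();

        // DocumentNode is the document's root node; HtmlDocument has no DocumentElement.
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return targets;
        }

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue falls back to "" when the attribute is missing.
            targets.Add(link.GetAttributeValue("href", ""));
        }
        return targets;
    }

    static void Main()
    {
        // Inline markup keeps the example offline; for a live page you would
        // load it with new HtmlWeb().Load(url) as in the question instead.
        string html = "<html><body><a href='/a'>First</a><p><a href='/b'>Second</a></p></body></html>";
        foreach (string href in GetLinkTargets(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Running it should print /a and then /b. The null check is there because, with default options, HtmlAgilityPack's SelectNodes returns null rather than an empty collection when the expression matches nothing, so iterating over its result directly can throw a NullReferenceException.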
Answer 2:
The behavior you are seeing is correct.
Look at what you're actually doing in HtmlNode.cs: http://htmlagilitypack.codeplex.com/SourceControl/latest#Release/1_4_0/HtmlAgilityPack/HtmlNode.cs
You're asking the top element to select nodes matching some XPath expression. Unless the expression starts with / or //, it is relative: it is evaluated from the element you call SelectNodes on and only matches nodes below it (its children or other descendants). The document element is not a descendant of itself, because no element is a descendant of itself.
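The relative-versus-absolute point can be checked without HtmlAgilityPack. The sketch below deliberately swaps in the BCL's System.Xml.XmlDocument (which, unlike HtmlAgilityPack's HtmlDocument, does have a DocumentElement property) and evaluates several XPath expressions from the root element. The class XPathAxesDemo, the helper CountFromDocumentElement, and the sample markup are hypothetical. As far as I know, HtmlAgilityPack's SelectNodes runs on the same .NET XPath engine, so the axes should behave the same way there, but treat that as an assumption.

```csharp
using System;
using System.Xml;

static class XPathAxesDemo
{
    // Hypothetical helper: counts the nodes an XPath expression selects
    // when it is evaluated with the document element as the context node.
    public static int CountFromDocumentElement(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList matches = doc.DocumentElement.SelectNodes(xpath);
        return matches.Count;
    }

    static void Main()
    {
        // Well-formed XHTML-like markup, so System.Xml can parse it.
        string xml = "<html><body><a href='/a'>First</a><p><a href='/b'>Second</a></p></body></html>";

        // Relative path: looks only at the children of <html>.
        Console.WriteLine(CountFromDocumentElement(xml, "html"));                      // 0
        // The descendant axis excludes the context node itself.
        Console.WriteLine(CountFromDocumentElement(xml, "descendant::html"));          // 0
        // descendant-or-self includes the context node.
        Console.WriteLine(CountFromDocumentElement(xml, "descendant-or-self::html"));  // 1
        // Absolute path: starts from the document root regardless of the context node.
        Console.WriteLine(CountFromDocumentElement(xml, "//html"));                    // 1
        // Relative child step: <a> is not a direct child of <html>.
        Console.WriteLine(CountFromDocumentElement(xml, "a"));                         // 0
        // Relative descendant search from the context node.
        Console.WriteLine(CountFromDocumentElement(xml, ".//a"));                      // 2
    }
}
```

The "html" and "descendant::html" cases return nothing because neither the child nor the descendant axis includes the element you start from; only an absolute path or an axis that includes self finds the root element.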
Source: https://stackoverflow.com/questions/19870116/using-htmlagilitypack-for-parsing-a-web-page-information-in-c-sharp