Question
How do I get all the div ids on an HTML page using Html Agility Pack? I am trying to get all the ids and put them into a collection.
<p>
<div class='myclass1'>
<div id='f'>
</div>
<div id="myclass2">
<div id="my"><div id="h"></div><div id="b"></div></div>
</div>
</div>
</p>
Code:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div");
How do I get a collection of all the div ids?
Answer 1:
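If you just want the IDs, you can select on the id attribute directly instead of collecting every div element. Html Agility Pack's SelectNodes returns the element that owns each matched attribute, so read the value with GetAttributeValue. For instance:
List<string> ids = new List<string>();
// Each node is a div that carries an id attribute
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//div/@id"))
{
    ids.Add(node.GetAttributeValue("id", ""));
}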
This will skip the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.
"//div/@id" is an XPath string. XPath is a technology which is very handy to learn if you deal much with XML, or in this case, HTML via the Agility Pack library. XPath is an industry standard which allows you to select matching nodes in an XML document.
// means you want it to select the following node as a child of the current node, or in any of its descendants. Since the current node is the root node of the document, this will find matching nodes anywhere in the document.
div is an element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
/ indicates that you want a child node. In this case the id attribute is a child of the div element, so first we say we want the div element, then we need the forward slash to say we want one of the div element's child nodes.
@id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.
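To see the XPath expression on its own, independent of Html Agility Pack, here is a minimal sketch using the standard System.Xml classes. It assumes the markup is well-formed XML, which happens to be true for the sample in the question but is not true of HTML in general. The class name XPathIdDemo and the helper CollectDivIds are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathIdDemo
{
    // Hypothetical helper: returns the id of every div, in document order.
    public static List<string> CollectDivIds(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the markup is not well-formed

        var ids = new List<string>();
        // "//div/@id" selects the id attribute node of every div in the document.
        foreach (XmlNode attr in doc.SelectNodes("//div/@id"))
        {
            ids.Add(attr.Value);
        }
        return ids;
    }

    public static void Main()
    {
        // Same structure as the sample markup in the question.
        const string xml =
            "<p><div class='myclass1'><div id='f'></div>" +
            "<div id='myclass2'><div id='my'><div id='h'></div><div id='b'></div></div></div>" +
            "</div></p>";

        Console.WriteLine(string.Join(", ", CollectDivIds(xml)));
        // prints: f, myclass2, my, h, b
    }
}
```

Here each selected node really is an attribute node, so its Value is the id string. The outer div with only a class attribute contributes nothing to the result.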
Answer 2:
You can get the collection of divs by passing an XPath expression to SelectNodes, like this:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);
foreach (HtmlNode div in htmlDoc.DocumentNode.SelectNodes("//div"))
{
    // ... code here
}
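The loop body is left open above. One way it might be filled in to collect the ids is sketched below. It assumes the HtmlAgilityPack NuGet package is referenced and loads the markup from a string with LoadHtml rather than from filePath. DivIdCollector and CollectDivIds are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DivIdCollector
{
    // Hypothetical helper, not part of Html Agility Pack.
    public static List<string> CollectDivIds(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var ids = new List<string>();
        // SelectNodes can return null when nothing matches, so guard before iterating.
        HtmlNodeCollection divs = htmlDoc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return ids;
        }

        foreach (HtmlNode div in divs)
        {
            // The Attributes indexer yields null when the div has no id attribute.
            HtmlAttribute idAttr = div.Attributes["id"];
            if (idAttr != null)
            {
                ids.Add(idAttr.Value);
            }
        }
        return ids;
    }

    public static void Main()
    {
        const string html =
            "<div class='myclass1'><div id='f'></div>" +
            "<div id='myclass2'><div id='my'><div id='h'></div><div id='b'></div></div></div></div>";

        foreach (string id in CollectDivIds(html))
        {
            Console.WriteLine(id);
        }
    }
}
```

Selecting "//div" and filtering in the loop keeps the div nodes available if you need more than the id. If only the ids matter, narrowing the XPath to "//div[@id]" would make the null check on the attribute unnecessary.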
Source: https://stackoverflow.com/questions/11526554/get-all-the-divs-ids-on-a-html-page-using-html-agility-pack