how to access child node from node in htmlagility pack

我们两清 提交于 2019-12-08 14:40:04

问题


<html>
    <body>
        <div class="main">
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
        </div>
    </body>
</html>

I loaded the html into an HtmlDocument. Then I selected the XPath as submain. Then I dont know how to access to each tags i.e h2, p separately.

HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlAgilityPack.HtmlNode node in nodes) {}

If I Use node.InnerText I get all the texts and InnerHtml is also not useful. How to select separate tags?


回答1:


The following will help:

HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlAgilityPack.HtmlNode node in nodes) {
    //Do you say you want to access to <h2>, <p> here?
    //You can do:
    HtmlNode h2Node = node.SelectSingleNode("./h2"); //That will get the first <h2> node
    HtmlNode allH2Nodes= node.SelectNodes(".//h2"); //That will search in depth too

    //And you can also take a look at the children, without using XPath (like in a tree):        
    HtmlNode h2Node = node.ChildNodes["h2"];
}



回答2:


You are looking for Descendants

var firstSubmainNodeName = doc
   .DocumentNode
   .Descendants()
   .Where(n => n.Attributes["class"].Value == "submain")
   .First()
   .InnerText;



回答3:


From memory, I believe that each Node has its own ChildNodes collection, so within your for…each block you should be able to inspect node.ChildNodes.



来源:https://stackoverflow.com/questions/6282536/how-to-access-child-node-from-node-in-htmlagility-pack

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!