HTML Agility Pack - Select node after particular paragraph

Posted by 别说谁变了你拦得住时间么 on 2019-12-11 05:29:33

Question


I have various files containing the following HTML. I need to retrieve only the list that follows the "targetWord" paragraph (its position changes across the pages I need to parse). How can I do this with HTML Agility Pack?

<p>Word1</p>
<ul>
<li>listobject1</li>
<li>listobject2</li>
<li>listobject3</li>
</ul>

<p>targetWord</p>
<ul>
<li>listobject4</li>
<li>listobject5</li>
<li>listobject6</li>
</ul>

<p>Word2</p>
<ul>
<li>listobject7</li>
<li>listobject8</li>
<li>listobject9</li>
</ul>

My code should obtain only the list items after targetWord:

foreach (var node in retrievedNodes)
{
    s[i] = node.InnerText;
    // Print before incrementing, otherwise the next (still empty) slot is printed.
    Console.WriteLine(s[i]);
    i++;
}

OUTPUT:

   listobject4
   listobject5
   listobject6

Answer 1:


You need to craft an XPath expression that matches your requirement.

Assuming your snippet is loaded into an HtmlAgilityPack HtmlDocument held in a variable named htmlSnippet, then

htmlSnippet.DocumentNode.SelectNodes("//p[text()='targetWord']/following-sibling::ul[1]//li")

will return the li elements inside the first ul that follows the p element containing your target word.
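
For completeness, here is a minimal sketch of how that expression might be used end to end. It assumes the HtmlAgilityPack NuGet package is referenced and that the markup is available as a string; the class name and the inline sample are only for illustration. Note that SelectNodes returns null rather than an empty collection when nothing matches, so a null check is worth keeping.

```csharp
using System;
using HtmlAgilityPack;

class TargetListExample
{
    static void Main()
    {
        // Sample markup taken from the question (hypothetical inline string;
        // in practice this would come from one of the files being parsed).
        const string html = @"
<p>Word1</p>
<ul><li>listobject1</li><li>listobject2</li><li>listobject3</li></ul>
<p>targetWord</p>
<ul><li>listobject4</li><li>listobject5</li><li>listobject6</li></ul>
<p>Word2</p>
<ul><li>listobject7</li><li>listobject8</li><li>listobject9</li></ul>";

        var htmlSnippet = new HtmlDocument();
        htmlSnippet.LoadHtml(html);

        // Find the p whose text is exactly "targetWord", step to the first ul
        // among its following siblings, then take the li elements inside it.
        var items = htmlSnippet.DocumentNode.SelectNodes(
            "//p[text()='targetWord']/following-sibling::ul[1]//li");

        // SelectNodes yields null when there is no match.
        if (items == null)
        {
            Console.WriteLine("No list found after targetWord.");
            return;
        }

        foreach (var li in items)
        {
            Console.WriteLine(li.InnerText.Trim());
        }
    }
}
```

For the sample markup this should select listobject4 through listobject6. If the paragraph text may carry surrounding whitespace, a predicate such as normalize-space(.)='targetWord' could be substituted for text()='targetWord'; that variation is a suggestion and not part of the original answer.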



Source: https://stackoverflow.com/questions/56182368/html-agility-pack-select-node-after-particular-paragraph
