Question
I have several files containing HTML like the snippet below. I need to retrieve only the list that follows the "targetWord" paragraph (its position varies across the pages I need to parse). How can I do this with HTML Agility Pack?
<p>Word1</p>
<ul>
<li>listobject1</li>
<li>listobject2</li>
<li>listobject3</li>
</ul>
<p>targetWord</p>
<ul>
<li>listobject4</li>
<li>listobject5</li>
<li>listobject6</li>
</ul>
<p>Word2</p>
<ul>
<li>listobject7</li>
<li>listobject8</li>
<li>listobject9</li>
</ul>
My code should retrieve only the list items that follow targetWord:
foreach (var node in retreivedNodes)
{
    s[i] = node.InnerText;
    Console.WriteLine(s[i]); // print before incrementing, otherwise the next (still empty) slot is printed
    i++;
}
Expected output:
listobject4
listobject5
listobject6
Answer 1:
You need to craft an XPath expression that matches your requirement.
Assuming you have loaded your snippet into an HtmlAgilityPack HtmlDocument named htmlSnippet, then
htmlSnippet.DocumentNode.SelectNodes("//p[text()='targetWord']/following-sibling::ul[1]//li")
will return the li nodes inside the first ul that follows your target word's p tag. Note that in C# the expression must be a double-quoted string, with single quotes around the XPath string literal.
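For completeness, here is a minimal, self-contained sketch of how the expression above might be wired into a program. The class name ListAfterParagraph and the helper GetItemsAfter are hypothetical names chosen for illustration, not part of HTML Agility Pack. The sketch assumes the HtmlAgilityPack NuGet package is referenced and that the target word contains no single quote, since it is placed directly into the XPath string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ListAfterParagraph
{
    // Hypothetical helper: returns the text of every <li> in the first <ul>
    // that follows a <p> whose text is exactly targetWord.
    public static List<string> GetItemsAfter(string html, string targetWord)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: targetWord contains no single quote, so it can be
        // embedded directly in the XPath string literal.
        string xpath = $"//p[text()='{targetWord}']/following-sibling::ul[1]//li";

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        string html =
            "<p>Word1</p><ul><li>listobject1</li><li>listobject2</li></ul>" +
            "<p>targetWord</p><ul><li>listobject4</li><li>listobject5</li><li>listobject6</li></ul>" +
            "<p>Word2</p><ul><li>listobject7</li></ul>";

        foreach (string item in GetItemsAfter(html, "targetWord"))
        {
            Console.WriteLine(item);
        }
    }
}
```

The null check matters because SelectNodes does not return an empty collection when no node matches. If the paragraphs in your real pages contain surrounding whitespace, a predicate such as normalize-space()='targetWord' may be more forgiving than text()='targetWord'.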
Source: https://stackoverflow.com/questions/56182368/html-agility-pack-select-node-after-particular-paragraph