Question
This query works perfectly for some countries, such as Germany:
"//h2[span/@id='Cities' or span/@id='Other_destinations']" + "/following-sibling::ul[1]" + "/li";
Where the HTML is formatted as:
<h2>
<span id='Other_destinations'></span>
</h2>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
<li>...</li>
</ul>
However, for a country like Afghanistan the HTML is formatted like this:
<h2>
<span id='Other_destinations'></span>
</h2>
<ul>
<li>...</li>
</ul>
<ul>
<li>...</li>
</ul>
So the question becomes: how do I handle a country like Afghanistan, where "/following-sibling::ul[1]" + "/li" only gets the first ul under the h2 whose span has id='Other_destinations'? I hope that getting a handle on this will help with the other exceptions and formatting issues that I will come across for other countries. Thank you.
Answer 1:
I hope this code solves your problem:
var xpath = "//ul[preceding-sibling::h2[span/@id='Cities' or span/@id='Other_destinations'] and following-sibling::h2[span/@id='Get_in']]" + "/li";
var doc = new HtmlDocument
{
    OptionDefaultStreamEncoding = Encoding.UTF8
};
// Download the page (for example with a WebClient) and assign its markup to the html variable.
var html = String.Empty;
doc.LoadHtml(html);
using (var write = new StreamWriter("testText.txt"))
{
    // SelectNodes returns null when nothing matches, so guard against that.
    var nodes = doc.DocumentNode.SelectNodes(xpath);
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            // Write the text of each li to the text file.
            write.WriteLine(node.InnerText);
        }
    }
}
The XPath above can be read as:
- Select all the ul tags that lie between an h2[span/@id='Cities' or span/@id='Other_destinations'] and an h2[span/@id='Get_in'].
I noticed that all the pages have a span tag with id='Get_in' at the end, so it can be used as the closing boundary.
I hope it solves your problem.
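To check the "between two headings" reading of the XPath without downloading a page, here is a minimal sketch that runs it against an inline sample. The sample markup is hypothetical, modelled on the Afghanistan layout from the question with an extra Get_in heading added, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample: two ul lists under Other_destinations,
        // followed by a Get_in heading with its own list.
        var html = @"
<div>
  <h2><span id='Other_destinations'></span></h2>
  <ul><li>First</li></ul>
  <ul><li>Second</li></ul>
  <h2><span id='Get_in'></span></h2>
  <ul><li>Not wanted</li></ul>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same expression as in the answer: ul elements that have the
        // Cities/Other_destinations h2 before them and the Get_in h2 after them.
        var xpath = "//ul[preceding-sibling::h2[span/@id='Cities' or span/@id='Other_destinations']"
                  + " and following-sibling::h2[span/@id='Get_in']]/li";

        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // SelectNodes returns null rather than an empty collection.
            Console.WriteLine("No matches");
            return;
        }

        foreach (var node in nodes)
        {
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
```

With this sample, the li items from both lists before the Get_in heading should be selected, while the list after it is skipped because it has no following Get_in sibling. Note that any other ul sitting between the two headings on a real page would be picked up as well.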
Source: https://stackoverflow.com/questions/21190747/getting-li-values-from-multiple-uls-using-htmlagilitypack-c-sharp