Getting li values from multiple ul's using HtmlAgilityPack C#

Submitted by 别来无恙 on 2019-12-11 23:23:13

Question


This query works perfectly for some countries, such as Germany:

"//h2[span/@id='Cities' or span/@id='Other_destinations']" + "/following-sibling::ul[1]" + "/li";

Where the HTML is formatted as:

<h2>
<span id='Other_destinations'></span>
</h2>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
<li>...</li>
</ul>

However, for a country like Afghanistan the div is formatted like this:

<h2>
<span id='Other_destinations'></span>
</h2>
<ul>
<li>...</li>
</ul>
<ul>
<li>...</li>
</ul>

So the question becomes: how do I handle a country like Afghanistan, where "/following-sibling::ul[1]" + "/li" only gets the first ul in the div under 'Other_destinations'? I hope that getting a handle on this will help with the other exceptions and formatting issues that I will come across for other countries. Thank you.


Answer 1:


I hope this code solves your problem:

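using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Select every li inside any ul that has a 'Cities' or 'Other_destinations' heading
// as an earlier sibling and the 'Get_in' heading as a later sibling.
var xpath = "//ul[preceding-sibling::h2[span/@id='Cities' or span/@id='Other_destinations'] and following-sibling::h2[span/@id='Get_in']]" + "/li";

var doc = new HtmlDocument
{
    OptionDefaultStreamEncoding = Encoding.UTF8
};

// Download the page (for example with a WebClient) and assign the markup to html.
var html = String.Empty;

doc.LoadHtml(html);

// SelectNodes returns null when nothing matches, so guard before iterating.
var nodes = doc.DocumentNode.SelectNodes(xpath);

using (var write = new StreamWriter("testText.txt"))
{
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            // Write the text of each li to the file on its own line.
            write.WriteLine(node.InnerText);
        }
    }
}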

The XPath above can be read as:

  • Select every ul that has an h2[span/@id='Cities' or span/@id='Other_destinations'] among its preceding siblings and an h2[span/@id='Get_in'] among its following siblings, that is, every ul that lies between those headings.

All of the pages appear to have a span tag with id='Get_in' after these sections, which is why it serves as the end marker.

I hope this solves your problem.
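
To make the difference between the two queries concrete, here is a minimal sketch that runs both against an inline sample. The sample markup is hypothetical: it is shaped like the Afghanistan snippet from the question, with a 'Get_in' heading assumed to follow. The class name UlXPathDemo and the Select helper are illustrative names, not part of HtmlAgilityPack; the only library calls used are HtmlDocument.LoadHtml, DocumentNode.SelectNodes and InnerText.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class UlXPathDemo
{
    // Hypothetical sample: two <ul> blocks under 'Other_destinations',
    // followed by an assumed 'Get_in' heading with its own list.
    public const string SampleHtml = @"
<div>
  <h2><span id='Other_destinations'></span></h2>
  <ul><li>A</li></ul>
  <ul><li>B</li></ul>
  <h2><span id='Get_in'></span></h2>
  <ul><li>C</li></ul>
</div>";

    // Query from the question: only the first <ul> after the heading.
    public const string FirstUlOnly =
        "//h2[span/@id='Cities' or span/@id='Other_destinations']" +
        "/following-sibling::ul[1]" + "/li";

    // Query from the answer: every <ul> between the headings.
    public const string AllUlsBeforeGetIn =
        "//ul[preceding-sibling::h2[span/@id='Cities' or span/@id='Other_destinations']" +
        " and following-sibling::h2[span/@id='Get_in']]" + "/li";

    // Illustrative helper: returns the trimmed text of every matching node.
    public static List<string> Select(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Select(SampleHtml, FirstUlOnly)));
        Console.WriteLine(string.Join(", ", Select(SampleHtml, AllUlsBeforeGetIn)));
    }
}
```

On this sample the first query should pick up only the li from the first list (A), while the second should pick up the items from both lists before 'Get_in' (A and B) and skip C. Note that the second query matches any ul with a qualifying heading somewhere before it and 'Get_in' somewhere after it, so lists belonging to other sections in between would be included as well.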



Source: https://stackoverflow.com/questions/21190747/getting-li-values-from-multiple-uls-using-htmlagilitypack-c-sharp
