Html Agility Pack Empty Values out of Tables

Posted by 末鹿安然 on 2019-12-12 00:23:14

Question


I am trying to learn some basic scraping, and thanks to this site I have learned a lot of new things, but now I am stuck on this problem. This is the code I am using:

var web = new HtmlWeb();
var doc = web.Load("url");
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
StreamWriter output = new StreamWriter("out.txt");

if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null && item.Attributes["data-recommended"] != null)
        {
            string line = "";
            var nome = item.SelectSingleNode(".//h3/a").InnerText;
            var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
            var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
            var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
            line = line + nome + "," + rating + "," + price + "," + discount;
            Console.WriteLine(line);
            output.WriteLine(line);
        }
    }
}

It all works fine for the first two items (name and rating), but for price and discount I get empty results. I have analyzed the page (here is the link) with Chrome Scraper, and it gets the results easily with the XPath I have used. I don't understand what I am doing wrong. Any help would be appreciated! :D


Answer 1:


After a quick look at the web page you're trying to scrape, not every item has price and discount information. You need to handle that case to avoid an exception, for example by checking for null before reading InnerText. Note also that price and discount are HtmlNode objects, so the text has to be read from their InnerText rather than by concatenating the nodes themselves. With this small change, your code was able to get the price and discount information where available:

if (item != null && item.Attributes["data-recommended"] != null)
{
    string line = "";
    var nome = item.SelectSingleNode(".//h3/a").InnerText;
    var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
    var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
    var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
    //set priceString to empty string if price is null, else set it to price.InnerText
    var priceString = price == null ? "" : price.InnerText;
    //do similar step for discountString
    var discountString = discount == null ? "" : discount.InnerText;
    line = line + nome + "," + rating + "," + priceString + "," + discountString;
    Console.WriteLine(line);
    output.WriteLine(line);
}
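
The null check above repeats once per optional field, so it can be pulled into a small helper. The following is a minimal sketch, not part of Html Agility Pack: the class NodeText, the method TextOrEmpty, and the sample markup (including the class name 'price') are all hypothetical and only stand in for the real page. It relies on SelectSingleNode returning null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

static class NodeText
{
    // Hypothetical helper (not an Html Agility Pack API): returns the trimmed
    // InnerText of the first node matching xpath, or "" when nothing matches.
    public static string TextOrEmpty(HtmlNode parent, string xpath)
    {
        HtmlNode node = parent.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerText.Trim();
    }
}

static class Demo
{
    static void Main()
    {
        // Made-up markup standing in for two hotel entries;
        // only the first entry has a price element.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div id='list'>" +
            "<div><h3><a>Hotel A</a></h3><strong class='price'> 120 </strong></div>" +
            "<div><h3><a>Hotel B</a></h3></div>" +
            "</div>");

        foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//div[@id='list']/div"))
        {
            string name = NodeText.TextOrEmpty(item, ".//h3/a");
            string price = NodeText.TextOrEmpty(item, ".//strong[@class='price']");
            Console.WriteLine(name + "," + price);
        }
    }
}
```

For the entry without a price element, the price field comes out empty instead of throwing a NullReferenceException. In the original loop, the same helper could be called with the longer table XPath expressions for price and discount.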


Source: https://stackoverflow.com/questions/25633380/html-agility-pack-empty-values-out-of-tables
