Question
I am trying to learn some basic scraping, and thanks to this site I have learned a lot of new things, but now I am stuck on this problem. This is the code I am using:
    var web = new HtmlWeb();
    var doc = web.Load("url");
    var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
    StreamWriter output = new StreamWriter("out.txt");
    if (nodes != null)
    {
        foreach (HtmlNode item in nodes)
        {
            if (item != null && item.Attributes["data-recommended"] != null)
            {
                string line = "";
                var nome = item.SelectSingleNode(".//h3/a").InnerText;
                var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
                var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
                var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
                line = line + nome + "," + rating + "," + price + "," + discount;
                Console.WriteLine(line);
                output.WriteLine(line);
            }
        }
    }
It all works fine for the first two items (name and rating), but for price and discount I get empty results. I have analyzed the page (here is the link) with the Chrome Scraper extension, and it gets the results easily with the XPath I have used. I don't understand what I am doing wrong. Any help would be appreciated! :D
Answer 1:
After a quick look at the web page you're trying to scrape, not every item has price and discount information. You need to handle this case to avoid an exception, for example by checking for null before getting the InnerText. Your code with this slight change was able to get the price and discount information where available:
    if (item != null && item.Attributes["data-recommended"] != null)
    {
        string line = "";
        var nome = item.SelectSingleNode(".//h3/a").InnerText;
        var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
        var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
        var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
        // set priceString to an empty string if price is null, else set it to price.InnerText
        var priceString = price == null ? "" : price.InnerText;
        // do the same for discountString
        var discountString = discount == null ? "" : discount.InnerText;
        line = line + nome + "," + rating + "," + priceString + "," + discountString;
        Console.WriteLine(line);
        output.WriteLine(line);
    }
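If the same null check is needed for several fields, it could be pulled into a small helper. The following is only a sketch: `TextOrEmpty`, the `ScrapeSketch` class, and the inline HTML (including the `price` class) are made up for illustration and are not taken from the real page. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

static class ScrapeSketch
{
    // Hypothetical helper: returns the trimmed InnerText of the first node
    // matching xpath, or "" when SelectSingleNode finds nothing (returns null).
    public static string TextOrEmpty(HtmlNode parent, string xpath)
    {
        var node = parent.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerText.Trim();
    }

    static void Main()
    {
        // Made-up markup: the second hotel has no price element.
        var html = @"<div id='hotellist_inner'>
  <div data-recommended='1'><h3><a>Hotel A</a></h3><strong class='price'>100</strong></div>
  <div data-recommended='1'><h3><a>Hotel B</a></h3></div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var item in doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div"))
        {
            var name = TextOrEmpty(item, ".//h3/a");
            var price = TextOrEmpty(item, ".//strong[@class='price']");
            Console.WriteLine(name + "," + price);
        }
    }
}
```

With this sample markup, the second line is printed with an empty price column instead of throwing a NullReferenceException.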
Source: https://stackoverflow.com/questions/25633380/html-agility-pack-empty-values-out-of-tables