HtmlAgilityPack and selecting Nodes and Subnodes

生来就可爱ヽ(ⅴ<●) 提交于 2019-12-02 17:01:38

The following works for me. The important bit is just as BeniBela noted to add a dot in second call to 'SelectNodes'.

List<Record> lstRecords=new List<Record>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
  Record record=new Record();
  foreach (HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
  {
    string attributeValue = node2.GetAttributeValue("prop", "");
    if (attributeValue == "name")
    {
      record.Name = node2.InnerText;
    }
    else if (attributeValue == "company")
    {
      record.company = node2.InnerText;
    }
    else if (attributeValue == "street")
    {
      record.street = node2.InnerText;
    }
  }
  lstRecords.Add(record);
}

If you use //, it searches from the document begin.

Use .// to search all from the current node

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))

Or drop the prefix entirely to search just for direct children:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("span[@prop]"))
Oscar Mederos

First of all, take a look at this: Html Agility Pack - Problem selecting subnode

Here is a full working solution for your question:

IList<Record> results = new List<Record>();
foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var record = new Record();
    record.Name = node.SelectSingleNode(".//span[@prop='name']").InnerText;
    record.company = node.SelectSingleNode(".//span[@prop='company']").InnerText;
    record.street = node.SelectSingleNode(".//span[@prop='street']").InnerText;
    results.Add(record);
}

If you read the question I pointed you to, you will see that doing ./span[@prop='name'] is exactly the same, since those span nodes are (direct) children of the div node.


If the span nodes do not have those prop attributes, and you want to assign them depending on the order they appear, you can do:

foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var spanNodes = node.SelectNodes("./span");
    var record = new Record();
    record.Name = spanNodes[0].InnerText;
    record.company = spanNodes[1].InnerText;
    record.street = spanNodes[2].InnerText;
    results.Add(record);
}

Shame on me :)

All of you were right.

I found the problem. This NullReferenceException kept nagging me so I spent more time to look at it in detail. In between all those divs there was one div with the same "class='search-hit'" attribute but without the spans inside. Thats why it throughs an error at the second loop.

foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//span[@prop]/ancestor::div[@class='search_hit']"))
   {
        Record rec = new Record();
        foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
           {
           }
           rList.Results.Add(rec);
   }

The code above is working.

Thank you guys for your time and help!

I used that. class convert id

  HtmlNodeCollection nodes = dokuman.DocumentNode.SelectNodes("//div[@id='search_hit']//span[@prop]");


            for (int i = 0; i < nodes .Count; i++)
        {
            var record = new Record();


                record.Name = links[i].InnerText;   results.Add(record);  }
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!