htmlagilitypack parsing links and inner text

一曲冷凌霜 提交于 2019-12-11 13:38:12

问题


I am new to the htmlagilitypack, I am try figure out a way which I will be able to get the links from a HTML set up like this

<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px  20px; font-weight: bold; font-size: 13px;">test</div>
    <div>
    <div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
    <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
     <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>

I have not wrote any code yet in c# but I was wondering whether anyone could advise what tags should point at to get the links and inner text when there are no HTML ID'. Thanks


回答1:


If you are familiar with XPATH you will be able to navigate through the elements and attributes of the html to get whatever you want. To get each href in the above you could write code as follows:

 const string xpath = "/div//span/a";

 //WebPage below is a string that contains the text of your example
 HtmlNode html = HtmlNode.CreateNode(WebPage);
 //The following gives you a node collection of your two <a> elements
 HtmlNodeCollection items = html.SelectNodes(xpath);
 foreach (HtmlNode a in items)
 {    
      if (a.Attributes.Contains("href"))
      //Get your value here
      {
           yourValue = a.Attributes["href"].Value
      }
 }

Note: I have not run or tested this code



来源:https://stackoverflow.com/questions/15479468/htmlagilitypack-parsing-links-and-inner-text

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!