Parsing Financial information from HTML

Question

This is my first attempt at working with HTML in Visual Studio and C#. I am using the Html Agility Pack library to do the parsing.

From this page I am attempting to pull out the numbers from the "Net Income" row for each quarter.

Here is my current progress (I am unsure how to proceed further):

        String url = "http://www.google.com/finance?q=NASDAQ:TXN&fstype=ii";
        var webGet = new HtmlWeb();
        var document = webGet.Load(url);
        var body = document.DocumentNode.Descendants()
                            .Where(n => n.Name == "body")
                            .FirstOrDefault();

        if (body != null)
        {

        }

Answer 1:

First of all, there is no need to get the body first; you can query the document directly for what you want. To find the value you're looking for, you could do something like this:

        // Find the first <td> whose trimmed text is exactly "Net Income".
        HtmlNode tdNode = document.DocumentNode.DescendantNodes()
            .FirstOrDefault(n => n.Name == "td"
                && n.InnerText.Trim() == "Net Income");
        if (tdNode != null)
        {
            // The parent <tr> holds the label cell and the quarterly values.
            HtmlNode trNode = tdNode.ParentNode;
            foreach (HtmlNode node in trNode.DescendantNodes().Where(n => n.NodeType == HtmlNodeType.Element))
            {
                Console.WriteLine(node.InnerText.Trim());
            }
            //Output:
            //Net Income
            //265.00
            //298.00
            //601.00
            //672.00
            //666.00
        }

Also note the Trim calls: they are needed because the InnerText of some elements contains newlines.
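If you want the quarterly figures as numbers rather than printed strings, the same lookup can be wrapped in a small helper. The following is only a sketch under stated assumptions: the class name NetIncomeSketch, the method ExtractRow, and the SampleHtml markup are all made up for illustration, and the real page's markup may well differ. It uses Elements("td") to take only the direct cells of the row, which avoids printing nested elements twice if a cell ever contains child tags, and it assumes the label is the first cell of its row.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class NetIncomeSketch
{
    // Hypothetical markup standing in for the real page; the actual
    // structure of the page is an assumption here.
    public const string SampleHtml = @"
<table>
  <tr><td>Revenue</td><td>3,121.00</td><td>3,420.00</td></tr>
  <tr>
    <td>
      Net Income
    </td>
    <td>265.00</td><td>298.00</td><td>601.00</td><td>672.00</td><td>666.00</td>
  </tr>
</table>";

    // Returns the numeric cells of the row whose first cell matches the label,
    // or an empty list when no such row exists.
    public static List<decimal> ExtractRow(string html, string label)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Trim is needed because the cell text may contain newlines.
        HtmlNode labelCell = document.DocumentNode
            .Descendants("td")
            .FirstOrDefault(td => HtmlEntity.DeEntitize(td.InnerText).Trim() == label);

        if (labelCell == null)
            return new List<decimal>();

        // Elements("td") yields only the direct <td> children of the row.
        // Skip(1) drops the label cell (assumed to come first).
        // decimal.Parse throws on empty or non-numeric cells.
        return labelCell.ParentNode
            .Elements("td")
            .Skip(1)
            .Select(td => decimal.Parse(
                HtmlEntity.DeEntitize(td.InnerText).Trim(),
                NumberStyles.Number,
                CultureInfo.InvariantCulture))
            .ToList();
    }

    public static void Main()
    {
        foreach (decimal value in ExtractRow(SampleHtml, "Net Income"))
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps "3,121.00" from being misread on machines whose regional settings use a comma as the decimal separator. To run it against the live page, you would pass the HTML of the document loaded with HtmlWeb instead of SampleHtml.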

Source: https://stackoverflow.com/questions/10897162/parsing-financial-information-from-html

Tags

visual-studio-2010

c#-4.0

html-agility-pack
