Scraping a webpage with C# and HTMLAgility

半阙折子戏 2020-12-06 07:56

I have read that HTMLAgility 1.4 is a great solution for scraping a webpage. As a new programmer, I am hoping to get some input on this project. I am doing this as a c

2 Answers
  • 2020-12-06 08:22

    The beginning part is off:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml("http://localhost");   
    

    LoadHtml(html) parses an HTML string into the document; it does not fetch a URL. I think you want something like this instead:

    HtmlWeb htmlWeb = new HtmlWeb();
    HtmlDocument doc  = htmlWeb.Load("http://stackoverflow.com");
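
    To make the difference concrete, here is a minimal sketch that puts the two calls side by side. The class and method names (LoadSketch, TitleFromMarkup, TitleFromUrl) are made up for illustration, and it assumes the HtmlAgilityPack package is referenced and a compiler recent enough for the ?. operator.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class LoadSketch
{
    // LoadHtml parses markup that is already in memory; no network access happens here.
    public static string TitleFromMarkup(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectSingleNode returns null when nothing matches.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title?.InnerText;
    }

    // HtmlWeb.Load downloads the page at the given URL and then parses it.
    public static string TitleFromUrl(string url)
    {
        HtmlDocument doc = new HtmlWeb().Load(url);
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title?.InnerText;
    }
}
```

    Passing "http://localhost" to TitleFromMarkup treats the URL as plain text to parse, so no title element is found and the method returns null. That is what the original snippet was doing.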
    
  • 2020-12-06 08:24

    Here is working code based on the HTML source you provided. It could be refactored, and I'm not checking for null values (in the rows, the cells, or the values inside each case). If the page is served at 127.0.0.1, it will work. Just paste it inside the Main method of a console application and try to understand it.

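    // Download and parse the page (needs "using HtmlAgilityPack;" at the top of the file).
    HtmlDocument doc = new HtmlWeb().Load("http://127.0.0.1");

    // Select every row of the table whose class attribute is "data".
    var rows = doc.DocumentNode.SelectNodes("//table[@class='data']/tr");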
    foreach (var row in rows)
    {
        var cells = row.SelectNodes("./td");
        string title = cells[0].InnerText;
        var valueRow = cells[2];
        switch (title)
        {
            case "Part-Num":
                string partNum = valueRow.SelectSingleNode("./img[@alt]").Attributes["alt"].Value;
                Console.WriteLine("Part-Num:\t" + partNum);
                break;
            case "Manu-Number":
                string manuNumber = valueRow.SelectSingleNode("./img[@alt]").Attributes["alt"].Value;
                Console.WriteLine("Manu-Num:\t" + manuNumber);
                break;
            case "Description":
                string description = valueRow.InnerText;
                Console.WriteLine("Description:\t" + description);
                break;
            case "Manu-Country":
                string manuCountry = valueRow.InnerText;
                Console.WriteLine("Manu-Country:\t" + manuCountry);
                break;
            case "Last Modified":
                string lastModified = valueRow.InnerText;
                Console.WriteLine("Last Modified:\t" + lastModified);
                break;
            case "Last Modified By":
                string lastModifiedBy = valueRow.InnerText;
                Console.WriteLine("Last Modified By:\t" + lastModifiedBy);
                break;
        }
    }
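
    The refactoring and null checks mentioned above could look roughly like the sketch below. It is one possible shape, not a drop-in replacement: TableSketch, ReadRows and AltLabels are hypothetical names, labels and values are trimmed (the original compares the raw InnerText), and every labelled row is collected into a dictionary instead of printing only the six known labels.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class TableSketch
{
    // Labels whose value lives in the alt attribute of an <img>
    // rather than in the cell text (taken from the switch above).
    private static readonly HashSet<string> AltLabels =
        new HashSet<string> { "Part-Num", "Manu-Number" };

    public static Dictionary<string, string> ReadRows(HtmlDocument doc)
    {
        var result = new Dictionary<string, string>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='data']/tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("./td");
            // Skip rows that lack the third cell the original code indexes.
            if (cells == null || cells.Count < 3) continue;

            string label = cells[0].InnerText.Trim();
            HtmlNode valueCell = cells[2];

            string value;
            if (AltLabels.Contains(label))
            {
                HtmlNode img = valueCell.SelectSingleNode("./img[@alt]");
                if (img == null) continue;   // no image with an alt attribute: skip the row
                value = img.GetAttributeValue("alt", "");
            }
            else
            {
                value = valueCell.InnerText.Trim();
            }

            result[label] = value;
        }
        return result;
    }
}
```

    Calling it as TableSketch.ReadRows(new HtmlWeb().Load("http://127.0.0.1")) and looping over the returned dictionary gives output similar to the Console.WriteLine calls above, while a missing table, a short row, or a missing image no longer throws a NullReferenceException.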
    