Scraping a webpage with C# and HTML Agility Pack

Submitted by 蓝咒 on 2019-11-27 23:18:00

Check out this article on 4GuysFromRolla:

http://www.4guysfromrolla.com/articles/011211-1.aspx

This is the article I used as my starting point with HTML Agility Pack and it's worked great. I'm confident that you'll get all the information you need from this article to perform the tasks you're trying to complete.

The beginning of your code is off:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("http://localhost");   

LoadHtml(html) parses an HTML string into the document; it does not download anything, so passing it a URL just parses the URL text itself. I think you want something like this instead:

HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load("http://stackoverflow.com");
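
To make the difference concrete, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The inline HTML string and the class name LoadDemo are made up for illustration; LoadHtml, HtmlWeb.Load and SelectSingleNode are the library calls already discussed here.

```csharp
using System;
using HtmlAgilityPack;

class LoadDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // LoadHtml parses markup you already hold in memory; no network request is made.
        var fromString = new HtmlDocument();
        fromString.LoadHtml("<html><body><p id='msg'>hello</p></body></html>");
        Console.WriteLine(fromString.DocumentNode.SelectSingleNode("//p[@id='msg']").InnerText);

        // Passing a URL to LoadHtml treats the URL text as the markup to parse,
        // so the resulting document contains no page content.
        var wrong = new HtmlDocument();
        wrong.LoadHtml("http://localhost");
        Console.WriteLine(wrong.DocumentNode.SelectSingleNode("//body") == null);

        // HtmlWeb.Load downloads the page at the URL and then parses it.
        HtmlDocument fromWeb = new HtmlWeb().Load("http://stackoverflow.com");
        // SelectSingleNode returns null when nothing matches, hence the ?. operator.
        Console.WriteLine(fromWeb.DocumentNode.SelectSingleNode("//title")?.InnerText);
    }
}
```

If the HTML is already available as a string (for example, read from a file or returned by your own HTTP client), LoadHtml is the right call; HtmlWeb.Load is only needed when the library should fetch the page for you.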

Here is working code, based on the HTML source you provided. It could be refactored, and I'm not checking for null values (in rows, in cells, or in each value inside the cases). If the page is served at 127.0.0.1, it will work. Just paste it inside the Main method of a console application and try to understand it.

// Requires "using System;" and "using HtmlAgilityPack;" at the top of the file.
// Download and parse the page.
HtmlDocument doc = new HtmlWeb().Load("http://127.0.0.1");

// Select every row of the table with class "data".
// Note: SelectNodes returns null when nothing matches; that case is not handled here.
var rows = doc.DocumentNode.SelectNodes("//table[@class='data']/tr");
foreach (var row in rows)
{
    // The first cell holds the field name, the third cell holds its value.
    var cells = row.SelectNodes("./td");
    string title = cells[0].InnerText;
    var valueCell = cells[2];
    switch (title)
    {
        // These two fields keep their value in the alt attribute of an <img>.
        case "Part-Num":
            string partNum = valueCell.SelectSingleNode("./img[@alt]").Attributes["alt"].Value;
            Console.WriteLine("Part-Num:\t" + partNum);
            break;
        case "Manu-Number":
            string manuNumber = valueCell.SelectSingleNode("./img[@alt]").Attributes["alt"].Value;
            Console.WriteLine("Manu-Num:\t" + manuNumber);
            break;
        // The remaining fields are plain text inside the cell.
        case "Description":
            string description = valueCell.InnerText;
            Console.WriteLine("Description:\t" + description);
            break;
        case "Manu-Country":
            string manuCountry = valueCell.InnerText;
            Console.WriteLine("Manu-Country:\t" + manuCountry);
            break;
        case "Last Modified":
            string lastModified = valueCell.InnerText;
            Console.WriteLine("Last Modified:\t" + lastModified);
            break;
        case "Last Modified By":
            string lastModifiedBy = valueCell.InnerText;
            Console.WriteLine("Last Modified By:\t" + lastModifiedBy);
            break;
    }
}
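
Since the code above skips null checks, here is one possible sketch of a guarded version. It is not the original answer's code: it assumes the same table layout (field name in the first cell, value in the third), and it simplifies the switch by printing every row under its own title instead of only the six listed fields. The class name NullSafeScrape and the "(missing)" placeholder are made up for illustration. Network errors thrown by HtmlWeb.Load are still not handled.

```csharp
using System;
using HtmlAgilityPack;

class NullSafeScrape // hypothetical name, for illustration only
{
    static void Main()
    {
        HtmlDocument doc = new HtmlWeb().Load("http://127.0.0.1");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='data']/tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found in the table with class 'data'.");
            return;
        }

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("./td");
            // Skip rows that do not have at least three <td> cells (e.g. header rows).
            if (cells == null || cells.Count < 3) continue;

            string title = cells[0].InnerText.Trim();
            HtmlNode valueCell = cells[2];

            // Assumption carried over from the switch above:
            // these two fields keep their value in an <img alt="...">.
            bool valueInAlt = title == "Part-Num" || title == "Manu-Number";
            string value = valueInAlt
                ? valueCell.SelectSingleNode("./img[@alt]")?.GetAttributeValue("alt", "")
                : valueCell.InnerText.Trim();

            // value is null only when the expected <img> element was absent.
            Console.WriteLine(title + ":\t" + (value ?? "(missing)"));
        }
    }
}
```

The guards mirror the three places the original note mentions: the row collection, the cell collection, and the individual value lookup.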