Parsing HTML Table in C#

深忆病人 2020-12-02 14:17

I have an HTML page that contains a table, and I want to parse that table in a C# Windows Forms application.

http://www.mufap.com.pk/payout-report.php?tab=01

3 Answers
  • 2020-12-02 14:19

    A late answer, but here is one way to do this in plain C#, using the Windows Forms HtmlDocument and HtmlElement classes and no third-party library:

    /// <summary>
    /// parses a table and returns a list containing all the data with columns separated by tabs
    /// e.g.: records = getTableData(doc, 0);
    /// </summary>
    /// <param name="doc">HtmlDocument to work with</param>
    /// <param name="number">table index (base 0)</param>
    /// <returns>list containing the table data</returns>
    public List<string> getTableData(HtmlDocument doc, int number)
    {
      HtmlElementCollection tables = doc.GetElementsByTagName("table");
      int idx=0;
      List<string> data = new List<string>();
    
      foreach (HtmlElement tbl in tables)
      {
        if (idx++ == number)
        {
          data = getTableData(tbl);
          break;
        }
      }
      return data;
    }
    
    /// <summary>
    /// parses a table and returns a list containing all the data with columns separated by tabs
    /// e.g.: records = getTableData(doc.GetElementById("table1"));
    /// </summary>
    /// <param name="tbl">HtmlElement table to work with</param>
    /// <returns>list containing the table data</returns>
    public List<string> getTableData(HtmlElement tbl)
    {
      int nrec = 0;
      List<string> data = new List<string>();
      string rowBuff;
    
      HtmlElementCollection rows = tbl.GetElementsByTagName("tr");
      HtmlElementCollection cols;
      foreach (HtmlElement tr in rows)
      {
        cols = tr.GetElementsByTagName("td");
        nrec++;
        rowBuff = nrec.ToString();
        foreach (HtmlElement td in cols)
        {
          rowBuff += "\t" + WebUtility.HtmlDecode(td.InnerText);
        }
        data.Add(rowBuff);
      }
    
      return data;
    }
    

    The code above lets you extract the data from a table in two ways: by the table's index within the page (useful for tables that have no name or id), or by passing the table's HtmlElement directly (faster, but only practical for tables you can locate by name or id). Each row is returned as one string in a List<string>, starting with the row number and followed by the cell values, all separated by tab characters. You can easily change the code to return the data in whatever other format you prefer.
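
    As an illustration of changing the output format, here is a minimal sketch that splits the tab-separated records back into fields. The class name TableRows, the helper SplitRecords, and the sample records are hypothetical and not part of the answer above. The sketch assumes that no cell text itself contains a tab character.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class TableRows
    {
        // Hypothetical helper: splits each tab-separated record into its fields.
        // Field 0 is the row number that getTableData prepends to every record.
        public static List<string[]> SplitRecords(IEnumerable<string> records)
        {
            return records.Select(r => r.Split('\t')).ToList();
        }

        public static void Main()
        {
            // Made-up sample records in the format getTableData produces.
            var records = new List<string> { "1", "2\tFund A\t1.25", "3\tFund B\t0.80" };

            foreach (string[] fields in SplitRecords(records))
            {
                // A row without <td> cells (e.g. a header row built from <th>)
                // contains only the row number, so skip it.
                if (fields.Length < 2) continue;
                Console.WriteLine(string.Join(" | ", fields.Skip(1)));
            }
        }
    }
    ```

    Keeping the row number in field 0 makes it possible to map a record back to its position in the original table, even after header or empty rows have been filtered out.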

  • 2020-12-02 14:26

    Using Html Agility Pack:

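    // Requires the HtmlAgilityPack NuGet package and these namespaces:
    // using System.Collections.Generic; using System.Linq; using System.Net;

    // Download the page as a string.
    WebClient webClient = new WebClient();
    string page = webClient.DownloadString("http://www.mufap.com.pk/payout-report.php?tab=01");
    
    // Parse the markup.
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(page);
    
    // Take the table with class "mydata", skip its first (header) row, ignore rows
    // with fewer than two cells, and collect the trimmed text of every cell.
    List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='mydata']")
                .Descendants("tr")
                .Skip(1)
                .Where(tr => tr.Elements("td").Count() > 1)
                .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
                .ToList();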
    
  • 2020-12-02 14:35

    Do you mean something like this?

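    // Note: in Html Agility Pack, SelectNodes returns null when nothing matches,
    // so check the result for null before iterating in real code.
    foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table")) {
        // This is the table.
        foreach (HtmlNode row in table.SelectNodes("tr")) {
            // This is the row (the relative path "tr" matches direct children only).
            foreach (HtmlNode cell in row.SelectNodes("th|td")) {
                // This is the cell.
            }
        }
    }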
    