BeautifulSoup, a dictionary from an HTML table

既然无缘 2020-12-05 07:38

I am trying to scrape table data from a website.

Here is a simple example table:

t = \'\' +\\
    \'
4 answers
  •  挽巷 (original poster)
    2020-12-05 08:30

    If you're scraping a table that has an explicit "thead" and "tbody", such as:

    Total Finished Unfinished
    63 33 2
    69 29 3
    57 28 1

    You can use the following:

    # `table` is the lxml.html element for the <table> being scraped
    headers = [header.text_content() for header in table.cssselect("thead tr th")]
    results = [{headers[i]: cell.text_content() for i, cell in enumerate(row.cssselect("td"))} for row in table.cssselect("tbody tr")]
    

    This will produce:

    [
      {"Total": "63", "Finished": "33", "Unfinished": "2"},
      {"Total": "69", "Finished": "29", "Unfinished": "3"},
      {"Total": "57", "Finished": "28", "Unfinished": "1"}
    ]
    

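    P.S. This uses lxml.html (its ".cssselect" method also needs the cssselect package installed). If you are using BeautifulSoup, replace ".text_content()" with ".get_text()" and ".cssselect" with ".select". Note that ".findAll" matches tag names, not CSS selectors such as "thead tr th", and ".string" returns None for a cell that contains nested tags.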
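For BeautifulSoup users, the same approach might look like the sketch below. The sample markup is an assumption reconstructed from the table shown above (the original HTML is not preserved), and `table_to_dicts` is a hypothetical helper name; it relies on the table having explicit "thead" and "tbody" elements.

```python
from bs4 import BeautifulSoup

# Sample markup reconstructed from the example table (an assumption).
HTML = """
<table>
  <thead>
    <tr><th>Total</th><th>Finished</th><th>Unfinished</th></tr>
  </thead>
  <tbody>
    <tr><td>63</td><td>33</td><td>2</td></tr>
    <tr><td>69</td><td>29</td><td>3</td></tr>
    <tr><td>57</td><td>28</td><td>1</td></tr>
  </tbody>
</table>
"""


def table_to_dicts(html):
    """Return one dict per body row, keyed by the header cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Header names come from the <th> cells inside <thead>.
    headers = [th.get_text(strip=True) for th in table.select("thead tr th")]
    # Pair each <td> in a body row with the header at the same position.
    return [
        dict(zip(headers, (td.get_text(strip=True) for td in row.select("td"))))
        for row in table.select("tbody tr")
    ]


results = table_to_dicts(HTML)
print(results)
```

Using `zip` instead of indexing into `headers` means a row with more cells than headers is truncated rather than raising an IndexError; whether that is the right behavior depends on the table being scraped.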
