If you're scraping a table that has an explicit "thead" and "tbody", such as:
Total | Finished | Unfinished |
63 | 33 | 2 |
69 | 29 | 3 |
57 | 28 | 1 |
You can use the following:
headers = [header.text_content() for header in table.cssselect("thead tr th")]
results = [{headers[i]: cell.text_content() for i, cell in enumerate(row.cssselect("td"))} for row in table.cssselect("tbody tr")]
This will produce:
[
{"Total": "63", "Finished": "33", "Unfinished": "2"},
{"Total": "69", "Finished": "29", "Unfinished": "3"},
{"Total": "57", "Finished": "28", "Unfinished": "1"}
]
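P.S. This uses lxml.html (".cssselect" also needs the separate cssselect package installed). If you are using BeautifulSoup, replace ".text_content()" with ".get_text()" and ".cssselect" with ".select". Note that ".findAll" matches tag names rather than CSS selectors, so it will not accept a selector like "thead tr th", and ".string" returns None when a cell contains nested tags.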
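If some rows might have more cells than there are headers, the headers[i] lookup raises IndexError. A minimal sketch of the pairing step using zip instead, assuming you have already pulled the header and cell text out into plain lists (rows_to_dicts is a hypothetical helper name, not part of lxml or BeautifulSoup):

```python
def rows_to_dicts(headers, rows):
    """Pair each row's cell texts with the header texts by position."""
    clean_headers = [h.strip() for h in headers]
    # zip stops at the shorter sequence, so extra cells are dropped
    # instead of raising IndexError the way headers[i] would.
    return [
        dict(zip(clean_headers, (cell.strip() for cell in cells)))
        for cells in rows
    ]


# Sample data standing in for the scraped text of the table above.
headers = ["Total", "Finished", "Unfinished"]
rows = [["63", "33", "2"], ["69", "29", "3"], ["57", "28", "1"]]

results = rows_to_dicts(headers, rows)
print(results)
```

The strip() calls are there because scraped cell text often carries surrounding whitespace or newlines. Rows with fewer cells than headers simply produce dicts with fewer keys.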