How to convert an HTML table to an array in python

故里飘歌 2020-12-24 03:23

I have an HTML document, and I want to pull the tables out of it and return them as arrays. I'm picturing 2 functions, one that finds all the HTML tables in a d…

3 Answers

粉色の甜心 (original poster)

2020-12-24 04:11
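Pandas can extract all of the tables in your HTML into a list of DataFrames out of the box with `pd.read_html`, which saves you from parsing the page yourself (reinventing the wheel). It needs an HTML parser backend installed: lxml, or BeautifulSoup4 together with html5lib. A DataFrame is a labeled two-dimensional data structure, essentially a table with named columns.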

I recommend continuing to work with the data in pandas, since it's a great tool, but you can also convert it to other formats if you prefer: a list of lists with `df.values.tolist()`, a dictionary with `df.to_dict()`, a CSV file with `df.to_csv()`, and so on.
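Since the question asks for arrays rather than DataFrames, here is a minimal sketch of that list conversion. It assumes pandas plus lxml (or BeautifulSoup4 with html5lib) is installed; `html_tables_to_arrays` and `SAMPLE_HTML` are hypothetical names chosen for illustration, and putting the header row first in each array is just one possible design choice.

```python
from io import StringIO

import pandas as pd


def html_tables_to_arrays(html):
    """Return each <table> in an HTML string as a list of rows.

    Hypothetical helper: the first row of each array holds the column
    headers and the remaining rows hold the cell values. Requires pandas
    plus an HTML parser backend (lxml, or BeautifulSoup4 with html5lib).
    """
    # Wrap the literal HTML in StringIO; recent pandas deprecates raw HTML strings
    tables = pd.read_html(StringIO(html))
    # One list of lists per table: header row followed by the data rows
    return [[list(df.columns)] + df.values.tolist() for df in tables]


# Hypothetical sample input containing a single two-column table
SAMPLE_HTML = """
<table>
  <thead><tr><th>name</th><th>qty</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>3</td></tr>
    <tr><td>pear</td><td>5</td></tr>
  </tbody>
</table>
"""

for table in html_tables_to_arrays(SAMPLE_HTML):
    print(table)
```

If a table is purely numeric, `df.to_numpy()` gives a NumPy array with a numeric dtype instead of nested Python lists.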

Example
```
"""Extract all tables from an HTML file, printing each one and saving it to a CSV file."""

import pandas as pd

# read_html returns one DataFrame per <table> found in the file
df_list = pd.read_html('my_file.html')

for i, df in enumerate(df_list):
    print(df)
    df.to_csv('table {}.csv'.format(i))
```
Getting the HTML content directly from the web instead of from a file only requires a slight modification (`pd.read_html` also accepts a URL directly):
```
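from io import StringIO

import pandas as pd
import requests

response = requests.get('my_url')  # replace 'my_url' with the page's address
response.raise_for_status()  # raise an error on HTTP failures instead of parsing an error page
# Wrap the HTML text in StringIO: recent pandas versions deprecate passing literal HTML strings
df_list = pd.read_html(StringIO(response.text))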
```