发表新帖

发表新帖

How to extract tables from websites in Python

后端未结

关注

 6  861

无人及你 2020-12-04 18:18

Here,

http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500

There is a table. My goal is to

6条回答

甜味超标 (楼主)

2020-12-04 19:05
Pandas can do this right out of the box, saving you from having to parse the html yourself. to_html() extracts all tables from your html and puts them in a list of dataframes. to_csv() can be used to convert each dataframe to a csv file. For the web page in your example, the relevant table is the last one, which is why I used df_list[-1] in the code below.
```
import requests
import pandas as pd

url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')
```
It's simple enough to do in one line, if you prefer:
```
pd.read_html(requests.get().content)[-1].to_csv()
```
P.S. Just make sure you have lxml, html5lib, and BeautifulSoup4 packages installed in advance.
0 讨论(0)

查看其它6个回答
发布评论:

提交评论
- 加载中...

热议问题