How to extract tables from websites in Python

无人及你 2020-12-04 18:18

On this page:

http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500

there is a table. My goal is to

6 Answers
  •  甜味超标
    2020-12-04 19:05

    Pandas can do this right out of the box, saving you from having to parse the HTML yourself. read_html() extracts all tables from your HTML and returns them as a list of DataFrames, and to_csv() can then write each DataFrame to a CSV file. On the page in your example, the relevant table is the last one, which is why the code below uses df_list[-1].

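    import io
    
    import requests
    import pandas as pd
    
    url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 404/500 instead of parsing an error page
    
    # read_html() returns a list of DataFrames, one per <table> in the page.
    # Wrapping the raw bytes in BytesIO avoids the deprecation warning that
    # pandas 2.1+ emits when it is handed literal HTML.
    df_list = pd.read_html(io.BytesIO(response.content))
    df = df_list[-1]  # the relevant table is the last one on this page
    print(df)
    df.to_csv('my data.csv')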
    

    It's simple enough to do in one line, if you prefer:

    pd.read_html(requests.get(url).content)[-1].to_csv('my data.csv')
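    The answer notes that to_csv() can write each DataFrame to its own CSV file, while the code above only saves the last table. A minimal sketch for saving all of them, assuming df_list is the list returned by pd.read_html(); the helper name save_all_tables, the default "tables" directory, and the table_<i>.csv file naming are hypothetical choices, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def save_all_tables(df_list: list[pd.DataFrame], out_dir: str = "tables") -> list[Path]:
    """Write each DataFrame to out_dir/table_<i>.csv (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    paths = []
    for i, df in enumerate(df_list):
        path = out / f"table_{i}.csv"
        # index=False keeps pandas' 0..n-1 row index out of the CSV
        df.to_csv(path, index=False)
        paths.append(path)
    return paths
```

    After running the code above, save_all_tables(df_list) would write one file per table found on the page and return their paths.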
    

    P.S. Make sure the lxml, html5lib, and BeautifulSoup4 packages are installed in advance: read_html() uses lxml by default and falls back to BeautifulSoup4 with html5lib.
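    If those parser packages can't be installed, a rough fallback using only the standard library is possible. This swaps in a different technique (html.parser plus the csv module, not pandas), and the TableExtractor class and table_to_csv helper are hypothetical names. It ignores colspan/rowspan and does not handle nested tables, so treat it as a sketch rather than a replacement for read_html().

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Sketch only: colspan/rowspan are ignored and nested tables are not supported.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace, roughly as a browser renders them
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._end_row()  # HTML allows an omitted </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # HTML allows an omitted </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(rows):
    """Render one extracted table (a list of rows) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

    To use it on a live page, feed the parser the decoded HTML, for example urllib.request.urlopen(url).read().decode('utf-8', 'replace') (assuming the page is UTF-8), then call table_to_csv(parser.tables[-1]) to get the last table as CSV text.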
