How to convert an HTML table to an array in Python

故里飘歌 2020-12-24 03:23

I have an HTML document, and I want to pull the tables out of this document and return them as arrays. I'm picturing 2 functions, one that finds all the HTML tables in a d

3 Answers
  •  粉色の甜心
    2020-12-24 04:11

    Pandas can extract all of the tables in your HTML into a list of DataFrames right out of the box, saving you from having to parse the page yourself (reinventing the wheel). A DataFrame is a labeled, 2-dimensional, array-like data structure.

    I recommend continuing to work with the data in Pandas, since it's a great tool, but you can also convert to other formats if you prefer (list, dictionary, CSV file, etc.).
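
    As a minimal sketch of the conversions mentioned above, assuming the HTML is already available as a string: the `html` snippet and the helper names `find_tables` and `table_to_rows` are hypothetical, made up for illustration. Note that `pd.read_html` needs an HTML parser installed (lxml, or BeautifulSoup4 with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet, used only for illustration.
html = """
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
"""


def find_tables(html_text):
    """Return every <table> found in the HTML text as a list of DataFrames."""
    # Wrapping the string in StringIO gives read_html a file-like object.
    return pd.read_html(StringIO(html_text))


def table_to_rows(df):
    """Convert one DataFrame to a list of rows, with the header row first."""
    return [df.columns.tolist()] + df.values.tolist()


tables = find_tables(html)
rows = table_to_rows(tables[0])                 # nested Python lists
array = tables[0].to_numpy()                    # 2-D NumPy array of cell values
records = tables[0].to_dict(orient="records")   # one dict per table row
```

    Which form is most convenient depends on what happens next: nested lists are easy to serialize, while the NumPy array keeps only the cell values and drops the column names.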

    Example

    """Extract all tables from an html file, printing and saving each to csv file."""
    
    import pandas as pd
    
    # read_html returns one DataFrame per <table> element found in the file
    df_list = pd.read_html('my_file.html')
    
    for i, df in enumerate(df_list):
        print(df)  # print is a function in Python 3
        df.to_csv('table {}.csv'.format(i))
    

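    Getting the HTML content directly from the web instead of from a file requires only a slight modification:

    from io import StringIO

    import pandas as pd
    import requests
    
    # Recent pandas versions expect a file-like object rather than a raw HTML string
    html = requests.get('my_url').text
    df_list = pd.read_html(StringIO(html))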
    
