Scrape tables into a DataFrame with BeautifulSoup

2020-12-13 21:03

I'm trying to scrape the data from a coins catalog.

Here is one of the pages. I need to scrape this data into a DataFrame.

So far I have this code:

    (code snippet not preserved in the post)

4 Answers
  • 2020-12-13 21:20

    Try:

    import pandas as pd
    from bs4 import BeautifulSoup

    # "html" holds the page source, e.g. html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find('table', attrs={'class': 'subs noBorders evenRows'})
    table_rows = table.find_all('tr')

    res = []
    for tr in table_rows:
        td = tr.find_all('td')
        # keep only the non-empty cell texts of this row
        row = [d.text.strip() for d in td if d.text.strip()]
        if row:  # rows with no text (e.g. header rows) are skipped
            res.append(row)

    df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])
    print(df)
    

    Output:

       Year  Mintage Quality    Price
    0  1882  108,000     UNC        —
    1  1883  786,000     UNC  ~ $4.03
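
    For completeness, here is one way the html variable above might be obtained. This is a minimal sketch with a placeholder URL, assuming the requests library, since the question's fetching code was not preserved:

    import requests

    url = "https://example.com/coins-catalog"  # hypothetical URL; substitute the real page
    response = requests.get(url)
    response.raise_for_status()  # stop early on HTTP errors
    html = response.text  # this is the string handed to BeautifulSoup above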
    
  • 2020-12-13 21:22

    Try this (it assumes the same soup / table_rows setup as above):

    l = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [d.text for d in td]  # use a new name rather than shadowing "tr"
        l.append(row)
    pd.DataFrame(l, columns=["A", "B", ...])  # replace "..." with your real column names
    
  • 2020-12-13 21:30

    Pandas already has a built-in function, read_html, that converts HTML tables straight into DataFrames:

    table = soup.find_all('table')    # every <table> on the page
    df = pd.read_html(str(table))[0]  # parse them and keep the first as a DataFrame
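
    read_html can also be pointed at a URL directly, skipping BeautifulSoup altogether. A minimal sketch, assuming a placeholder URL and that a parser backend such as lxml is installed:

    import pandas as pd

    # Hypothetical URL; read_html fetches the page and returns one DataFrame per <table>.
    dfs = pd.read_html("https://example.com/coins-catalog")
    df = dfs[0]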
    
  • 2020-12-13 21:30

    Just a heads-up... this part of Rakesh's code means that only HTML rows containing text end up in the DataFrame, because rows that come back as empty lists are never appended:

    if row:
        res.append(row)
    

    That was problematic in my use case, where I wanted to compare row indexing between the HTML table and the DataFrame later on. I just needed to change it to:

    res.append(row)
    

    Also, if a cell in a row is empty, it doesn't get included, which shifts values into the wrong columns. So I changed

    row = [d.text.strip() for d in td if d.text.strip()]
    

    to

    row = [d.text.strip() for d in td]
    

    But, otherwise, it's working for me. Thanks :)
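
    Putting both tweaks together, a minimal variant of the loop above that keeps every row and every (possibly empty) cell, so HTML and DataFrame row indices line up:

    res = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [d.text.strip() for d in td]  # keep empty cells so columns stay aligned
        res.append(row)                     # keep empty rows so indices stay aligned
    df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])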
