Scrape tables into a DataFrame with BeautifulSoup

2020-12-13 21:03

I'm trying to scrape the data from a coins catalog.

Here is one of the pages. I need to scrape this data into a DataFrame.

So far I have this code:

    (code snippet not preserved in the post)

4 Answers
  • 2020-12-13 21:20

    Try:

    import pandas as pd
    from bs4 import BeautifulSoup

    # "html" holds the page source, e.g. html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find('table', attrs={'class': 'subs noBorders evenRows'})
    table_rows = table.find_all('tr')

    res = []
    for tr in table_rows:
        td = tr.find_all('td')
        # keep only the non-empty cell texts of this row
        row = [d.text.strip() for d in td if d.text.strip()]
        if row:  # rows with no text (e.g. header rows) are skipped
            res.append(row)

    df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])
    print(df)
    

    Output:

       Year  Mintage Quality    Price
    0  1882  108,000     UNC        —
    1  1883  786,000     UNC  ~ $4.03
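
    For completeness, here is one way the html variable above might be obtained. This is a minimal sketch with a placeholder URL, assuming the requests library, since the question's fetching code was not preserved:

    import requests

    url = "https://example.com/coins-catalog"  # hypothetical URL; substitute the real page
    response = requests.get(url)
    response.raise_for_status()  # stop early on HTTP errors
    html = response.text  # this is the string handed to BeautifulSoup above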
    
  • 2020-12-13 21:22

    Try this (it assumes the same soup / table_rows setup as above):

    l = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [d.text for d in td]  # use a new name rather than shadowing "tr"
        l.append(row)
    pd.DataFrame(l, columns=["A", "B", ...])  # replace "..." with your real column names
    
  • 2020-12-13 21:30

    Pandas already has a built-in function, read_html, that converts HTML tables straight into DataFrames:

    table = soup.find_all('table')    # every <table> on the page
    df = pd.read_html(str(table))[0]  # parse them and keep the first as a DataFrame
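
    read_html can also be pointed at a URL directly, skipping BeautifulSoup altogether. A minimal sketch, assuming a placeholder URL and that a parser backend such as lxml is installed:

    import pandas as pd

    # Hypothetical URL; read_html fetches the page and returns one DataFrame per <table>.
    dfs = pd.read_html("https://example.com/coins-catalog")
    df = dfs[0]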
    
  • 2020-12-13 21:30

    Just a heads-up... this part of Rakesh's code means that only HTML rows containing text end up in the DataFrame, because rows that come back as empty lists are never appended:

    if row:
        res.append(row)
    

    That was problematic in my use case, where I wanted to compare row indexing between the HTML table and the DataFrame later on. I just needed to change it to:

    res.append(row)
    

    Also, if a cell in a row is empty, it doesn't get included, which shifts values into the wrong columns. So I changed

    row = [d.text.strip() for d in td if d.text.strip()]
    

    to

    row = [d.text.strip() for d in td]
    

    But, otherwise, it's working for me. Thanks :)
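
    Putting both tweaks together, a minimal variant of the loop above that keeps every row and every (possibly empty) cell, so HTML and DataFrame row indices line up:

    res = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [d.text.strip() for d in td]  # keep empty cells so columns stay aligned
        res.append(row)                     # keep empty rows so indices stay aligned
    df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])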
