I'm trying to scrape the data from the coin catalog.
Here is one of the pages. I need to scrape this data into a DataFrame.
So far I have this code:
Try:
import pandas as pd
from bs4 import BeautifulSoup
# html holds the page source (downloading the page is not shown here)
soup = BeautifulSoup(html, "html.parser")
table = soup.find('table', attrs={'class':'subs noBorders evenRows'})
table_rows = table.find_all('tr')
res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text.strip() for tr in td if tr.text.strip()]
    if row:
        res.append(row)
df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])
print(df)
Output:
   Year  Mintage Quality    Price
0  1882  108,000     UNC        —
1  1883  786,000     UNC  ~ $4.03
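To try this answer without the live page, here is a minimal sketch that runs the same loop on a small hypothetical HTML snippet modeled on the output above (the real page markup will differ; only the class name comes from the code above). It also shows one reason for the `if row:` check: a header row made only of <th> cells yields an empty `td` list, so it is skipped.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output above; the real page will differ.
html = """
<table class="subs noBorders evenRows">
  <tr><th>Year</th><th>Mintage</th><th>Quality</th><th>Price</th></tr>
  <tr><td>1882</td><td>108,000</td><td>UNC</td><td>—</td></tr>
  <tr><td>1883</td><td>786,000</td><td>UNC</td><td>~ $4.03</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "subs noBorders evenRows"})

res = []
for tr in table.find_all("tr"):
    # Same filter as the answer: cells with empty text are dropped.
    row = [cell.text.strip() for cell in tr.find_all("td") if cell.text.strip()]
    # The header row has only <th> cells, so its list is empty and skipped here.
    if row:
        res.append(row)

df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])
print(df)
```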
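Try this:
l = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text for cell in td]  # no filtering, so empty cells keep their position
    l.append(row)
# Replace ... with the rest of your column names; you need one name per cell
pd.DataFrame(l, columns=["A", "B", ...])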
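If the table has a header row, the column names don't have to be typed by hand. The sketch below assumes (this is an assumption, not something the page is known to do) that the first row uses <th> cells; the HTML is hypothetical sample markup.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; assumes the first row holds <th> header cells.
html = """
<table class="subs noBorders evenRows">
  <tr><th>Year</th><th>Mintage</th><th>Quality</th><th>Price</th></tr>
  <tr><td>1882</td><td>108,000</td><td>UNC</td><td>—</td></tr>
  <tr><td>1883</td><td>786,000</td><td>UNC</td><td>~ $4.03</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# Column names come from the <th> cells of the first row.
columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]

# Data rows: one list of cell texts per remaining <tr>.
data = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

df = pd.DataFrame(data, columns=columns)
print(df)
```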
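pandas already has a built-in function, pd.read_html, that converts HTML tables into DataFrames:
from io import StringIO
table = soup.find_all('table')
# read_html returns a list of DataFrames, one per table; [0] takes the first.
# Newer pandas versions warn when given a literal HTML string, so wrap it in StringIO.
df = pd.read_html(StringIO(str(table)))[0]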
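read_html can also pick the table by its attributes, so the BeautifulSoup step is optional. A minimal sketch, run on hypothetical sample markup that reuses the class name from the first answer; read_html needs a parser backend installed (lxml, or BeautifulSoup plus html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical markup; only the class name comes from the code above.
html = """
<table class="subs noBorders evenRows">
  <thead>
    <tr><th>Year</th><th>Mintage</th><th>Quality</th><th>Price</th></tr>
  </thead>
  <tbody>
    <tr><td>1882</td><td>108,000</td><td>UNC</td><td>—</td></tr>
    <tr><td>1883</td><td>786,000</td><td>UNC</td><td>~ $4.03</td></tr>
  </tbody>
</table>
"""

# attrs keeps only tables whose attributes match; the result is a list of DataFrames.
tables = pd.read_html(StringIO(html), attrs={"class": "subs noBorders evenRows"})
df = tables[0]
print(df)
```

Unlike the manual loops, read_html takes column names from the header row and tries to convert numeric-looking columns (its `thousands` parameter defaults to ','), so check the dtypes before relying on them.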
Just a heads-up: this part of Rakesh's code means that only HTML rows with at least one non-empty cell are included in the DataFrame, because a row isn't appended when row is an empty list:
if row:
    res.append(row)
That was a problem in my use case, because I wanted to compare row indices between the HTML table and the DataFrame later on. I just needed to change it to:
res.append(row)
Also, if a cell in the row is empty, it isn't included, so the remaining values shift into the wrong columns. So I changed
row = [tr.text.strip() for tr in td if tr.text.strip()]
to
row = [d.text.strip() for d in td]
But, otherwise, it's working for me. Thanks :)
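Putting both changes together, here is a minimal sketch of the adjusted loop on hypothetical sample markup (the second row has an empty Price cell, the third row is entirely empty). Every row and every cell is kept, so DataFrame row i lines up with HTML row i and values stay in their columns.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: row 2 has an empty Price cell, row 3 has only empty cells.
html = """
<table class="subs noBorders evenRows">
  <tr><td>1882</td><td>108,000</td><td>UNC</td><td>—</td></tr>
  <tr><td>1883</td><td>786,000</td><td>UNC</td><td></td></tr>
  <tr><td></td><td></td><td></td><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "subs noBorders evenRows"})

res = []
for tr in table.find_all("tr"):
    # Keep empty cells too, so each value stays under its own column.
    row = [d.text.strip() for d in tr.find_all("td")]
    # Append unconditionally, so row indices match the HTML table.
    res.append(row)

df = pd.DataFrame(res, columns=["Year", "Mintage", "Quality", "Price"])
print(df)
```

This sample has no <th> row; if your table has one, that row produces an empty list here, so you may want to handle it separately.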