Question
The following is my code:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
stats_page = requests.get('https://www.sports-reference.com/cbb/schools/loyola-il/2020.html')
content = stats_page.content
soup = BeautifulSoup(content, 'html.parser')
table = soup.find(name='table', attrs={'id':'per_poss'})
html_str = str(table)
df = pd.read_html(html_str)[0]
df.head()
And I get the error: ValueError: No tables found.
However, when I swap attrs={'id':'per_poss'} for a different table id, such as attrs={'id':'per_game'}, I get output.
I am not familiar with HTML and scraping, but I noticed that in the tables that work, the HTML is: <table class="sortable stats_table now_sortable is_sorted" id="per_game" data-cols-to-freeze="2">
And in the tables that don't work, the HTML is: <table class="sortable stats_table now_sortable sticky_table re2 le1" id="totals" data-cols-to-freeze="2">
It seems the table classes are different, and I am not sure whether that is causing this problem or, if so, how to fix it.
Thank you!
Answer 1:
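This is happening because the per_poss table (like the totals table) is inside an HTML comment <!-- ... -->. html.parser stores commented-out markup as a single Comment text node, not as tags, so soup.find(name='table', attrs={'id':'per_poss'}) returns None. str(None) is the string "None", which contains no table, hence ValueError: No tables found. The per_game table is not commented out, which is why it works; the different class attributes are not the cause.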
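The cause can be reproduced offline. The sketch below uses a tiny hypothetical HTML snippet (not the real sports-reference page) with one normal table and one table wrapped in a comment, and shows what soup.find sees in each case. It only needs bs4 and the built-in html.parser.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, heavily simplified page: one visible table and one table
# wrapped in an HTML comment, mimicking how the per_poss table is served.
HTML = """
<div id="all_per_game">
  <table id="per_game"><tr><th>Player</th></tr><tr><td>A</td></tr></table>
</div>
<div id="all_per_poss">
<!--
  <table id="per_poss"><tr><th>Player</th></tr><tr><td>B</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

visible = soup.find("table", id="per_game")
hidden = soup.find("table", id="per_poss")

print(visible is not None)  # True: a real <table> tag in the parse tree
print(hidden)               # None: the markup is only text inside a Comment node
print(str(hidden))          # None: this string is what read_html ends up parsing

# The markup is still there, as the text of a Comment node.
comments = soup.find_all(string=lambda t: isinstance(t, Comment))
print(any('id="per_poss"' in c for c in comments))  # True
```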
You can extract the table by collecting the text nodes of type Comment and parsing their contents again:
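import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
URL = "https://www.sports-reference.com/cbb/schools/loyola-il/2020.html"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# Collect the text of every HTML comment (string= is the current name for the older text= argument)
comments = soup.find_all(string=lambda t: isinstance(t, Comment))
# Re-parse the commented-out markup so the hidden tables become real tags
comment_soup = BeautifulSoup("".join(comments), "html.parser")
# Every table that was hidden in comments, as a list of DataFrames
df = pd.read_html(StringIO(str(comment_soup)))
print(df)
# Only the per-possession table from the question
table = comment_soup.find("table", id="per_poss")
per_poss = pd.read_html(StringIO(str(table)))[0]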
Output:
[ Rk Player G GS MP FG ... AST STL BLK TOV PF PTS
0 1.0 Cameron Krutwig 32 32.0 1001 201 ... 133 39 20 81 45 482
1 2.0 Tate Hall 32 32.0 1052 141 ... 70 47 3 57 56 406
2 3.0 Marquise Kennedy 32 6.0 671 110 ... 43 38 9 37 72 294
3 4.0 Lucas Williamson 32 32.0 967 99 ... 53 49 9 57 64 287
4 5.0 Keith Clemons 24 24.0 758 78 ... 47 29 1 32 50 249
5 6.0 Aher Uguak 32 31.0 768 62 ... 61 15 3 59 56 181
6 7.0 Jalon Pipkins 30 1.0 392 34 ... 12 10 1 17 15 101
7 8.0 Paxson Wojcik 30 1.0 327 25 ... 18 14 0 14 23 61
...
...
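The same idea can be packaged as a reusable helper. This is a sketch, not part of the original answer: the function names find_commented_table and read_commented_table are hypothetical, and the demo page is made-up HTML. Instead of joining all comments into one document, it re-parses comments one at a time and returns as soon as one contains the requested table id, so unrelated comments are never parsed.

```python
from io import StringIO
from typing import Optional

import pandas as pd
from bs4 import BeautifulSoup, Comment, Tag


def find_commented_table(html: str, table_id: str) -> Optional[Tag]:
    """Return the <table id=table_id> hidden inside an HTML comment, or None.

    Hypothetical helper: re-parses each comment on its own and stops at
    the first one that contains the requested table.
    """
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda t: isinstance(t, Comment)):
        if table_id not in comment:
            continue  # cheap substring check before paying for a parse
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


def read_commented_table(html: str, table_id: str) -> pd.DataFrame:
    """Load a comment-hidden table into a DataFrame, failing loudly if absent."""
    table = find_commented_table(html, table_id)
    if table is None:
        raise LookupError(f"no commented-out table with id={table_id!r}")
    # Wrap the markup in StringIO: newer pandas deprecates literal HTML strings.
    return pd.read_html(StringIO(str(table)))[0]


# Offline demo on a hypothetical fragment, not the real sports-reference HTML.
PAGE = """
<div id="all_per_poss">
<!--
<table id="per_poss">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody><tr><td>Player A</td><td>20.5</td></tr></tbody>
</table>
-->
</div>
"""

tag = find_commented_table(PAGE, "per_poss")
print(tag["id"])                               # per_poss
print(find_commented_table(PAGE, "advanced"))  # None
```

Against the live page this would be called as read_commented_table(requests.get(URL).text, "per_poss"); whether the site still serves that table inside a comment is an assumption worth re-checking before relying on it. Raising LookupError instead of passing None on to read_html turns the confusing "No tables found" into an error that names the missing id.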
Source: https://stackoverflow.com/questions/64827590/how-do-you-scrape-a-table-when-the-table-is-unable-to-return-values-beautifuls