Question
I am trying to learn how to do some scraping with the rvest package. I'm loading the page at the URL shown in the code below, and I want the data from the table labelled "Advanced" on that page.
When I load the page, all I'm able to get is the first table. Inspecting the page in Google Chrome, I see that the numbers in the tables are marked with class="right". So this is what I tried:
library(rvest)
library(stringr)
url = url("https://www.basketball-reference.com/players/l/leonaka01.html")
read = html_nodes(read_html(url), '.right')
read2 = str_replace_all(html_text(read), "[\r\n\t]", "")
What I see is that read contains 351 nodes, meaning rvest found 351 elements with class "right". The last one, read2[351], is "29.3", which is the last value of the first table.
So how can I get the data from the other tables? I never told R to read only the first table. I assumed I would get the values from all the tables, and that my next step would be to filter out the "Advanced" table values somehow.
Regards
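Answer 1:
The "Advanced" table is wrapped in an HTML comment in the page source, so it is not part of the parsed document and a CSS selector cannot reach it directly. We can collect all the comments with XPath, parse their text as HTML, and then read the table from the result.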
library(rvest)
url = "https://www.basketball-reference.com/players/l/leonaka01.html"
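url %>%
  read_html() %>%
  # select every comment node in the page
  html_nodes(xpath = '//comment()') %>%
  # extract the text inside each comment
  html_text() %>%
  # join the comment texts into a single string
  toString() %>%
  # parse that string as HTML
  read_html() %>%
  # pick the table with id "advanced" and convert it to a data frame
  html_node('table#advanced') %>%
  html_table()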
# Season Age Tm Lg Pos G MP PER TS% 3PAr FTr ORB% ...
#1 2011-12 20 SAS NBA SF 64 1534 16.6 0.573 0.270 0.218 7.9 ...
#2 2012-13 21 SAS NBA SF 58 1810 16.4 0.592 0.331 0.240 4.3 ...
#3 2013-14 22 SAS NBA SF 66 1923 19.4 0.602 0.282 0.195 4.6 ...
#4 2014-15 23 SAS NBA SF 64 2033 22.0 0.567 0.234 0.307 4.8 ...
#5 2015-16 24 SAS NBA SF 72 2380 26.0 0.616 0.267 0.306 4.7 ...
#6 2016-17 25 SAS NBA SF 74 2474 27.6 0.610 0.295 0.406 3.7 ...
#7 2017-18 26 SAS NBA SF 9 210 26.0 0.572 0.315 0.342 3.1 ...
#8 2018-19 27 TOR NBA SF 60 2040 25.8 0.606 0.267 0.377 4.2 ...
#9 2019-20 28 LAC NBA SF 6 183 35.1 0.572 0.230 0.319 5.5 ...
#10 Career NA NBA 473 14587 22.8 0.599 0.276 0.318 4.8 ...
#11 NA NA NA NA NA NA NA NA ...
#12 7 seasons NA SAS NBA 407 12364 22.1 0.597 0.279 0.305 4.8 ...
#13 1 season NA TOR NBA 60 2040 25.8 0.606 0.267 0.377 4.2 ...
#14 1 season NA LAC NBA 6 183 35.1 0.572 0.230 0.319 5.5 ...
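The same idea can be tried without a network connection. The sketch below is only an illustration: `page_src` is a hypothetical, heavily reduced stand-in for the real page, with one visible table and one table wrapped in a comment (the table id `per_game` is an assumption, and the two values are taken from the 2011-12 row above). It shows that the commented table is invisible to a normal selector and becomes reachable once the comment text is parsed again.

```r
library(rvest)

# Hypothetical miniature page: one visible table, one table inside an HTML comment
page_src <- '<html><body>
<table id="per_game"><tr><th>Season</th><th>G</th></tr>
<tr><td>2011-12</td><td class="right">64</td></tr></table>
<!--
<table id="advanced"><tr><th>Season</th><th>PER</th></tr>
<tr><td>2011-12</td><td class="right">16.6</td></tr></table>
-->
</body></html>'

page <- read_html(page_src)

# Only the visible table is part of the parsed document
visible_ids <- html_attr(html_nodes(page, "table"), "id")

# Pull out the comment text and parse it as a separate HTML document
comments <- html_text(html_nodes(page, xpath = "//comment()"))
hidden <- read_html(paste(comments, collapse = ""))

# The commented-out table can now be selected by id
advanced <- html_table(html_node(hidden, "table#advanced"))
print(visible_ids)
print(advanced)
```

Here `paste(..., collapse = "")` is used instead of `toString()` only to avoid inserting ", " between comments; for this purpose either should work.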
Source: https://stackoverflow.com/questions/58725125/scrape-a-url-with-several-tables-with-rvest