Webscraping soccer data returns nothing

*爱你&永不变心* posted on 2019-12-11 17:54:25

Question


I would like to scrape the match result table from the website https://www.whoscored.com/Regions/247/Tournaments/36/Seasons/5967/Stages/15737/Fixtures/International-FIFA-World-Cup-2018

I'm using the rvest package with the following code:

library(rvest)

url.tournament <- "https://www.whoscored.com/Regions/247/Tournaments/36/Seasons/5967/Stages/15737/Fixtures/International-FIFA-World-Cup-2018"
df.tournament <- read_html(url.tournament) %>%
                  html_nodes(xpath='//*[@id="tournament-fixture-wrapper"]') %>%
                  html_nodes("table") %>%
                  html_table()

However, no element is extracted.


Answer 1:


Looking at the website’s source code you can see that the table doesn’t actually exist in the HTML source — it’s dynamically generated using JavaScript. That’s why your XPath query returns an empty <div>.

Consequently, you can’t rely on {rvest} alone in this case; you need a dynamic scraper such as {RSelenium}, which drives a real browser that executes the JavaScript.
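
As a rough illustration of that approach, here is a minimal sketch that lets a browser render the page and then hands the rendered HTML back to {rvest}. It assumes Firefox and a working Selenium setup are available locally. The helper names `fetch_rendered_source` and `extract_fixture_tables` are made up for this example, the fixed five-second wait is a crude guess, and the selectors are simply the ones from the question, so they may need adjusting to the page's actual rendered markup.

```r
library(rvest)  # also provides read_html() and the %>% pipe

# Hypothetical helper: open the page in a real browser and return the
# HTML as it looks after the JavaScript has run.
# Assumption: Firefox is installed and RSelenium can start a driver for it.
fetch_rendered_source <- function(url, wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  client <- driver$client
  # Shut down the browser and the Selenium server even if an error occurs.
  on.exit({
    client$close()
    driver$server$stop()
  })
  client$navigate(url)
  Sys.sleep(wait_seconds)       # crude wait for the scripts to finish
  client$getPageSource()[[1]]   # rendered HTML as a single string
}

# Hypothetical helper: apply the selectors from the question to an HTML
# string and return a list with one data frame per table found.
extract_fixture_tables <- function(page_source) {
  read_html(page_source) %>%
    html_nodes(xpath = '//*[@id="tournament-fixture-wrapper"]') %>%
    html_nodes("table") %>%
    html_table()
}

# Usage (not run here, because it launches a browser):
# page_source   <- fetch_rendered_source(url.tournament)
# df.tournament <- extract_fixture_tables(page_source)
```

Keeping the browser step separate from the parsing step means the parsing can be checked against a saved copy of the page source without starting Selenium each time.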



Source: https://stackoverflow.com/questions/51025719/webscraping-soccer-data-returns-nothing
