Missing elements when using `read_html` from `rvest` in R

Posted by 二次信任 on 2019-12-04 20:23:27

The table is built dynamically by JavaScript from data stored in variables on the page itself, so it is missing from the static HTML that `read_html` downloads. Either use RSelenium to grab the page source after it has rendered and pass that to rvest, or grab a treasure trove of all the underlying data by using V8:

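library(rvest)
library(V8)

URL <- "http://projects.fivethirtyeight.com/2016-election-forecast/washington/#now"

# Download the static HTML (the table itself is not in here yet)
pg <- read_html(URL)

# Pull out the text of the <script> block that defines the race data
js <- html_nodes(pg, xpath=".//script[contains(., 'race.model')]") %>% html_text()

# Evaluate that JavaScript in an embedded V8 engine
ctx <- v8()
ctx$eval(JS(js))

# Copy the JavaScript `race` object back into R as a nested list
race <- ctx$get("race", simplifyVector=FALSE)

str(race) ## output too large to paste here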
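To see the mechanics without depending on the live site, here is a minimal sketch that runs the same steps against a tiny inline page. The HTML, the shape of the `race` object, and its field names (`state`, `forecasts`, `party`, `prob`) are all made up for illustration; the real page's data is much larger and structured differently.

```r
library(rvest)
library(V8)

# Hypothetical stand-in for the real page: an empty table container plus a
# <script> block that holds the data in a JavaScript variable.
html <- '<html><body>
<div id="table"></div>
<script>var race = {}; race.model = {state: "WA", forecasts: [{party: "D", prob: 0.9}, {party: "R", prob: 0.1}]};</script>
</body></html>'

pg <- read_html(html)

# Same XPath as above: find the script whose text mentions race.model
js <- html_nodes(pg, xpath = ".//script[contains(., 'race.model')]") %>% html_text()

# Run the script in V8, then copy the resulting object into R
ctx <- v8()
ctx$eval(js)
race <- ctx$get("race", simplifyVector = FALSE)

# The data is now an ordinary nested R list
race$model$state
length(race$model$forecasts)
```

With `simplifyVector = FALSE` every JavaScript object and array comes back as an R list, which is predictable to index into even when the data is ragged.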

If they ever change the format of the JavaScript (it's generated by an automated process, so that's unlikely, but you never know), the RSelenium approach will be the better choice, provided they don't also change the structure of the table (again, unlikely, but you never know).
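For reference, the RSelenium route could look roughly like the sketch below. It is untested against this particular page and rests on several assumptions: a local Firefox that `rsDriver()` can launch, a fixed pause being enough for the JavaScript to finish, and the rendered data living in ordinary `<table>` elements. The helper name `fetch_rendered_tables` and its `wait` argument are hypothetical.

```r
library(RSelenium)
library(rvest)

# Hypothetical helper: render a page in a real browser, then parse its tables.
fetch_rendered_tables <- function(url, wait = 5) {
  # Start a Selenium server and a browser session (assumes Firefox is installed)
  rD <- rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- rD$client

  # Always shut down the browser and server, even if something below fails
  on.exit({
    remDr$close()
    rD$server$stop()
  }, add = TRUE)

  remDr$navigate(url)
  Sys.sleep(wait)  # crude wait for the page's JavaScript to build the table

  # getPageSource() returns a list; the first element is the rendered HTML
  src <- remDr$getPageSource()[[1]]

  # Hand the rendered HTML to rvest and extract every table on the page
  html_table(read_html(src))
}

# Usage (not run here because it launches a browser):
# tables <- fetch_rendered_tables(URL)
```

This depends on the rendered table structure rather than on the JavaScript variable names, which is exactly the trade-off described above.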
