Question
I'm trying to download the full, dynamically expanded holdings table using rvest, but I'm getting an "Unknown field names" error.
library(rvest)

s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
f <- html_form(s)[[1]]
# the following line fails with "Unknown field names":
f.new <- set_values(f, `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton")
## subsequent lines are not tested ##
doc <- submit_form(s, f.new)
tabs <- xml_find_all(doc, "//table")
holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]
I'm not great with HTML/HTTP, but from what I can chase through, it seems that expanding the table requires a postback of the form with this new field value set.
After inspecting the set_values function, it seems that it only allows existing fields to be assigned values.
Is there any way to add a new field to a form under rvest? If not, is anyone aware of another package I could use to get this functionality?
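To illustrate the restriction described above, here is a minimal sketch of a name check that rejects fields the form does not already contain. `check_fields` and `mock_form` are hypothetical stand-ins written for this illustration, not rvest's actual source or a real parsed form.

```r
# Simplified stand-in for the kind of name check that raises the error;
# this is an illustration, not rvest's actual implementation.
check_fields <- function(form, ...) {
  new_values <- list(...)
  unknown <- setdiff(names(new_values), names(form$fields))
  if (length(unknown) > 0) {
    stop("Unknown field names: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  invisible(form)
}

# Hypothetical form that only knows about one hidden field.
mock_form <- list(
  fields = list(`__VIEWSTATE` = list(name = "__VIEWSTATE", value = "..."))
)

# Asking for a field the form does not contain triggers the error.
res <- tryCatch(
  check_fields(mock_form, `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Because the check runs before any value is assigned, a field that the page adds only at click time can never be set this way.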
[Edited] to be explicit that I need the full version of the dynamically expanded table, and to add the expected subsequent table-extraction code.
Answer 1:
Disgusting, but it works. It could probably be cleaned up, and I will submit an issue to the project for a proper fix that provides add_values-type functionality.
library(rvest)

# Fetch the page, add the postback fields to the form, submit it,
# and return the fifth table on the resulting page.
getInnovatorHoldings <- function() {
  s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
  f <- html_form(s)[[1]]
  f.new <- add_values(f,
                      `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton",
                      `__EVENTARGUMENT` = "",
                      `submit` = NULL)
  s <- submit_form(s, f.new, "submit")
  doc <- read_html(s)
  tabs <- xml_find_all(doc, "//table")
  holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]
  return(holdings)
}

# Add fields that do not yet exist in the form. Names that already
# exist are left untouched. A field named "submit" becomes a submit
# input; everything else becomes a hidden input.
add_values <- function(form, ...) {
  new_values <- list(...)
  no_match <- which(!names(new_values) %in% names(form$fields))
  for (n in no_match) {
    field_name <- names(new_values[n])
    if (field_name == "submit") {
      form$fields[[field_name]] <- new_input(name = field_name, type = "submit", value = NULL)
    } else {
      form$fields[[field_name]] <- new_input(name = field_name, type = "hidden", value = new_values[n][[1]])
    }
  }
  return(form)
}

# Build an object shaped like the "input" fields that html_form() returns.
new_input <- function(name, type, value, checked = NULL, disabled = NULL,
                      readonly = NULL, required = FALSE) {
  return(
    structure(
      list(name = name,
           type = type,
           value = value,
           checked = checked,
           disabled = disabled,
           readonly = readonly,
           required = required
      ),
      class = "input"
    )
  )
}
Answer 2:
Answer: rvest
This solution works, but only returns the first 10 rows of the table:
library(tidyverse)
library(rvest)

ffty_url <- "http://innovatoretfs.com/etf/?ticker=ffty"
ffty_table <- ffty_url %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[5]]
Working on getting the full table, but that may not be possible using rvest because it is expandable. Honestly not sure.
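One possible reason for the ten-row limit is that `read_html` only performs a plain GET, while the expansion appears to need a form postback, as the question suggests. Below is a rough sketch of sending that postback directly with httr. It assumes the page is an ASP.NET form whose hidden inputs (for example `__VIEWSTATE`) must be echoed back, and that the full table is still the fifth one on the response. `postback_body` and `get_full_holdings` are hypothetical helper names, and none of this has been verified against the live site.

```r
library(httr)
library(rvest)

# Merge the page's hidden inputs with the two postback fields.
# `hidden` is a named character vector; missing values become "".
postback_body <- function(hidden, target, argument = "") {
  hidden[is.na(hidden)] <- ""
  body <- as.list(hidden)
  body[["__EVENTTARGET"]] <- target
  body[["__EVENTARGUMENT"]] <- argument
  body
}

# Assumption: the event target is the one quoted in the question.
get_full_holdings <- function(url = "http://innovatoretfs.com/etf/?ticker=ffty",
                              target = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton") {
  # First request: collect the hidden inputs the server expects back.
  page <- read_html(content(GET(url), as = "text", encoding = "UTF-8"))
  inputs <- html_nodes(page, "input[type='hidden']")
  nms <- html_attr(inputs, "name")
  vals <- html_attr(inputs, "value")
  hidden <- setNames(vals[!is.na(nms)], nms[!is.na(nms)])

  # Second request: post the form back with the event target set.
  resp <- POST(url, body = postback_body(hidden, target), encode = "form")
  stop_for_status(resp)
  doc <- read_html(content(resp, as = "text", encoding = "UTF-8"))

  # Assumption: the holdings table is still the fifth table on the page.
  html_table(html_nodes(doc, "table"), fill = TRUE)[[5]]
}
```

Calling `holdings <- get_full_holdings()` would then run both requests. Building the body in a separate function keeps the part that can be checked offline apart from the part that needs the network.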
Answer: RSelenium
You're going to have to install RSelenium and Docker; there are multiple tutorials on that. But the following code also returns only the first ten rows, which has me livid.
library(RSelenium)
library(rvest)
remDr <- remoteDriver(port = 4445L, remoteServerAddr = "localhost",
browserName = "chrome")
remDr$open()
remDr$navigate("http://innovatoretfs.com/etf/?ticker=ffty")
page <- read_html(remDr$getPageSource()[[1]])
table <- html_table(page, fill = TRUE, header = TRUE)
table[[5]]
If anyone wants to expand on either set of code, please...
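One untested idea for the RSelenium route: the code above reads the page source right after navigating, before anything expands the table. The sketch below fires the postback in the browser first. It assumes the page defines the standard ASP.NET `__doPostBack` JavaScript function and that the event target quoted in the question is correct. `get_holdings_selenium` is a hypothetical helper name, and the fixed wait is a crude placeholder.

```r
library(RSelenium)
library(rvest)

# `remDr` is an already opened remoteDriver, as in the code above.
get_holdings_selenium <- function(remDr,
                                  target = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton",
                                  wait = 3) {
  remDr$navigate("http://innovatoretfs.com/etf/?ticker=ffty")

  # Assumption: the page exposes __doPostBack; this triggers the same
  # postback that clicking the "view holdings" link would.
  remDr$executeScript(sprintf("__doPostBack('%s', '');", target), args = list())

  # Crude fixed wait for the reloaded page.
  Sys.sleep(wait)

  page <- read_html(remDr$getPageSource()[[1]])
  html_table(page, fill = TRUE, header = TRUE)[[5]]
}
```

With the driver from the code above, this would be called as `holdings <- get_holdings_selenium(remDr)`.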
Source: https://stackoverflow.com/questions/51352697/add-new-field-to-form-with-rvest