add new field to form with rvest

Submitted by 南笙酒味 on 2019-12-10 17:49:34

Question


I'm trying to download the full, dynamically expanded holdings table using rvest, but I'm getting an "Unknown field names" error.

library(rvest)

s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
f <- html_form(s)[[1]]
# The following line fails with "Unknown field names":
f.new <- set_values(f, `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton")

## The lines below are untested ##
doc <- submit_form(s, f.new)
tabs <- xml_find_all(doc, "//table")
holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]

I'm not great with HTML/HTTP, but from what I can trace, expanding the table seems to require a postback of the form with this new field value set.

After inspecting the set_values function, it seems that it only allows values to be assigned to fields that already exist in the form.
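One way to confirm this is to list the field names that the parsed form actually contains. The following is a minimal offline sketch, not the real page: the HTML snippet is a made-up stand-in for an ASP.NET form whose markup includes `__VIEWSTATE` but not `__EVENTTARGET`.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for the real page: the form has __VIEWSTATE,
# but no __EVENTTARGET input in its markup.
page <- read_html('
  <form method="post" action="/etf/">
    <input type="hidden" name="__VIEWSTATE" value="abc123" />
    <input type="submit" name="go" value="Go" />
  </form>')

f <- html_form(page)[[1]]

# Only names listed here can be changed on the parsed form.
field_names <- names(f$fields)
print(field_names)

# FALSE here is what leads to the "Unknown field names" error.
target_present <- "__EVENTTARGET" %in% field_names
print(target_present)
```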

Is there any way to add a new field to a form in rvest? If not, is anyone aware of another package I could use to get this functionality?
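As one possible alternative to going through the rvest form helpers, the postback could be sent as a plain form POST with httr. The sketch below is untested against the real site and rests on assumptions: that the server only needs the page's hidden inputs plus `__EVENTTARGET` and `__EVENTARGUMENT`, and that the hidden inputs use a lowercase `type="hidden"` attribute. The helper names `build_postback_body` and `fetch_postback` are made up for illustration.

```r
library(xml2)

# Collect the hidden inputs of the page's form(s) and add the two fields
# that ASP.NET's __doPostBack() would normally fill in via JavaScript.
build_postback_body <- function(page, event_target, event_argument = "") {
  inputs <- xml_find_all(page, "//form//input[@type='hidden'][@name]")
  body <- as.list(xml_attr(inputs, "value", default = ""))
  names(body) <- xml_attr(inputs, "name")
  body[["__EVENTTARGET"]] <- event_target
  body[["__EVENTARGUMENT"]] <- event_argument
  body
}

# Fetch the page, then post the hidden fields back to the same URL.
# Needs the httr package and network access; defined here but not called.
fetch_postback <- function(url, event_target) {
  first <- httr::GET(url)
  page <- read_html(httr::content(first, as = "text", encoding = "UTF-8"))
  body <- build_postback_body(page, event_target)
  resp <- httr::POST(url, body = body, encode = "form")
  httr::stop_for_status(resp)
  read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (not run):
# doc <- fetch_postback("http://innovatoretfs.com/etf/?ticker=ffty",
#                       "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton")
```

Building the request body in a separate function keeps the part that can be checked offline apart from the part that needs the network.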

[Edited] to make explicit that I need the full version of the dynamically expanded table, and to add the table extraction code I expect to run afterwards.


Answer 1:


Disgusting, but it works. This could probably be cleaned up; I will submit an issue to the project asking for a proper fix that provides add_values-style functionality.

library(rvest)

# Relies on add_values() and new_input(), which are defined below.
getInnovatorHoldings <- function() {
    s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
    f <- html_form(s)[[1]]
    # Add the postback fields plus a dummy submit button to the parsed form.
    f.new <- add_values(f,
                            `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton",
                            `__EVENTARGUMENT` = "",
                            `submit` = NULL)

    s <- submit_form(s, f.new, "submit")
    doc <- read_html(s)
    tabs <- xml_find_all(doc, "//table")
    holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]
    return(holdings)
}

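# Adds fields that the parsed form does not contain.
# Values for fields that already exist are left untouched; use set_values() for those.
add_values <- function(form, ...) {
    new_values <- list(...)
    no_match <- which(!names(new_values) %in% names(form$fields))
    for (n in no_match) {
        field_name <- names(new_values)[n]
        if (field_name == "submit") {
            # Dummy submit button so that submit_form() has a button to use.
            form$fields[[field_name]] <- new_input(name = field_name, type = "submit", value = NULL)
        } else {
            # Every other new field is sent as a hidden input.
            form$fields[[field_name]] <- new_input(name = field_name, type = "hidden", value = new_values[[n]])
        }
    }
    return(form)
}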

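# Builds a field object with the same list structure and "input" class used for the form's parsed fields.
new_input <- function(name, type, value, checked = NULL, disabled = NULL, readonly = NULL, required = FALSE) {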
    return(
        structure(
            list(name = name,
                type = type,
                value = value,
                checked = checked,
                disabled = disabled,
                readonly = readonly,
                required = required
                ),
            class = "input"
        )
    )
}



Answer 2:


Answer: rvest

This solution works, but only returns the first 10 rows of the table:

library(tidyverse)
library(rvest)

ffty_url <- "http://innovatoretfs.com/etf/?ticker=ffty"

ffty_table <- ffty_url %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[5]]

I'm working on getting the full table, but that may not be possible with rvest because the table is expandable. Honestly, I'm not sure.


Answer: RSelenium

You will have to install RSelenium and Docker; there are multiple tutorials on that. However, the following code also returns only the first ten rows, which has me livid.

library(RSelenium)
library(rvest)

remDr <- remoteDriver(port = 4445L, remoteServerAddr = "localhost",
                  browserName = "chrome")
remDr$open()
remDr$navigate("http://innovatoretfs.com/etf/?ticker=ffty")
page <- read_html(remDr$getPageSource()[[1]])
table <- html_table(page, fill = TRUE, header = TRUE)
table[[5]]
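One likely reason only ten rows come back is that the page source is read before the link that expands the table has been clicked. The sketch below is untested against the real site. It assumes the link is rendered in the usual ASP.NET LinkButton style, as an `<a>` element whose `href` calls `__doPostBack` with the event target, and the helper names `postback_link_xpath` and `read_expanded_tables` are made up for illustration.

```r
library(xml2)
library(rvest)

# XPath for a link whose href mentions the given event target, e.g.
# <a href="javascript:__doPostBack('<event target>','')">.
postback_link_xpath <- function(event_target) {
  sprintf("//a[contains(@href, \"%s\")]", event_target)
}

# Click the link in an already opened remoteDriver session, pause, and
# return every table found on the refreshed page.
# `wait_seconds` is a crude fixed pause, not a real readiness check.
read_expanded_tables <- function(remDr, event_target, wait_seconds = 3) {
  link <- remDr$findElement(using = "xpath",
                            value = postback_link_xpath(event_target))
  link$clickElement()
  Sys.sleep(wait_seconds)
  page <- read_html(remDr$getPageSource()[[1]])
  html_table(page, fill = TRUE)
}

# Usage (not run), with remDr opened and navigated as in the code above:
# tables <- read_expanded_tables(remDr, "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton")
# tables[[5]]
```

If the link cannot be found this way, inspecting the page in the browser's developer tools should show which attribute identifies it.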

If anyone wants to expand on either set of code, please do.



Source: https://stackoverflow.com/questions/51352697/add-new-field-to-form-with-rvest
