RCurl

R Disparity between browser and GET / getURL

Submitted by 心已入冬 on 2019-11-30 22:35:44
I'm trying to download the content from a page, and I'm finding that the response data are either malformed or incomplete, as if GET or getURL were pulling them before they had finished loading.

    library(httr)
    library(RCurl)
    url <- "https://www.vanguardcanada.ca/individual/etfs/etfs.htm"
    d1 <- GET(url)     # shows a lot of {{ moustache style }} template code that isn't filled in
    d2 <- getURL(url)  # returns "" as if it didn't get anything

I'm not sure how to proceed. My goal is to get the numbers associated with the links that show in the browser: https://www.vanguardcanada.ca/individual/etfs/etfs-detail-overview
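
A likely explanation: the {{ moustache }} placeholders mean the page is rendered client-side by JavaScript, so GET and getURL only ever see the empty template; the numbers arrive afterwards through a separate JSON/XHR request. Below is a minimal sketch of the usual workaround, assuming you can locate that data request in the browser's developer tools; the endpoint URL here is a placeholder, not the real one.

    library(httr)
    library(jsonlite)

    # Placeholder URL -- find the real data request in the browser's Network tab
    json_url <- "https://www.vanguardcanada.ca/individual/mvc/getEtfData"
    resp <- GET(json_url)
    etf  <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

Fetching the JSON endpoint directly is usually both faster and more robust than trying to execute the page's JavaScript from R.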

RCurl default proxy settings

Submitted by 二次信任 on 2019-11-30 21:09:19
Question: I'm working behind a proxy here, so I need to configure my connection. It works well when I define an options list and call getURL:

    opts <- list(
      proxy         = "http://****",
      proxyusername = "****",
      proxypassword = "*****",
      proxyport     = ****
    )
    getURL("http://stackoverflow.com", .opts = opts)

I'd like to set these options as defaults, but I still can't find a working solution. Could you advise anything? Thank you.

Answer 1: I regret hurrying to post the question. The solution was really at hand, at RCurl
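
For reference, RCurl builds its handles with .defaults = getOption("RCurlOptions"), so session-wide defaults can typically be set once via options(). A minimal sketch with placeholder proxy values:

    library(RCurl)

    # Run once per session (or put it in .Rprofile); getURL() then picks
    # these settings up as default handle options
    options(RCurlOptions = list(
      proxy         = "http://proxy.example.com",  # placeholder
      proxyusername = "user",                      # placeholder
      proxypassword = "pass",                      # placeholder
      proxyport     = 8080                         # placeholder
    ))

    getURL("http://stackoverflow.com")  # no .opts needed any more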

How to capture RCurl verbose output

Submitted by 别来无恙 on 2019-11-30 20:11:09
I have the following request:

    library(RCurl)
    res <- getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search",
                  .opts = list(verbose = TRUE))

and I would like to capture the verbose output of the call (i.e., what is printed in red in the R console). I thought the output lines were messages and were therefore printed to stderr(). The following works for messages:

    sink(textConnection("test", "w"), type = "message")
    message("test message")
    sink(stderr(), type = "message")
    test
    #[1] "test message"

but not if I replace message("test message") with the RCurl request res = getURL(.....) as
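
The verbose lines come from libcurl's debug callback rather than from R's message stream, which is why sink(type = "message") never sees them. RCurl exposes that callback directly; a minimal sketch using debugGatherer():

    library(RCurl)

    d <- debugGatherer()  # accumulates the debug stream instead of printing it
    res <- getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search",
                  .opts = list(verbose = TRUE, debugfunction = d$update))

    # The captured output, split by stream (text, headerIn, headerOut, dataIn, ...)
    d$value()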

SSL verification causes RCurl and httr to break - on a website that should be legit

Submitted by ╄→尐↘猪︶ㄣ on 2019-11-30 13:08:52
I'm trying to automate the login to the UK's data archive service. That website is obviously trustworthy, yet both RCurl and httr break at SSL verification, while my web browser doesn't give any sort of warning. I can work around the issue by using ssl.verifypeer = FALSE in RCurl, but I'd like to understand what's going on.

    # breaks
    library(httr)
    GET("https://www.esds.ac.uk/secure/UKDSRegister_start.asp")

    # breaks
    library(RCurl)
    cert <- system.file("CurlSSL/cacert.pem", package = "RCurl")
    getURL("https://www.esds.ac.uk/secure/UKDSRegister_start.asp", cainfo = cert)

    # works
    library(RCurl)
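
A common cause of this symptom is the client-side CA bundle rather than the site itself: browsers ship and cache a rich set of root and intermediate certificates, while the cacert.pem shipped with RCurl can be out of date. One thing worth trying before disabling verification, sketched on the assumption that a stale bundle is the culprit:

    library(RCurl)

    # Fetch a current CA bundle from the curl project and use it in place
    # of the (possibly stale) copy bundled with RCurl
    download.file("https://curl.se/ca/cacert.pem", destfile = "cacert.pem")
    getURL("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
           cainfo = "cacert.pem")

If verification still fails, the server may be omitting an intermediate certificate that browsers happen to have cached; in that case only adding the missing intermediate to the bundle (or falling back to ssl.verifypeer = FALSE) will get past the check.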

R - posting a login form using RCurl

Submitted by 你离开我真会死。 on 2019-11-30 07:44:25
I am new to using R to post forms and then download data off the web, and someone out there can probably easily spot what I am doing wrong, so I appreciate your patience. I have a Win7 PC, and Firefox 23.x is my typical browser. I am trying to post the main form that shows up on http://www.aplia.com/ and I have the following R script:

    your.username <- 'username'
    your.password <- 'password'
    setwd("C:/Users/Desktop/Aplia/data")
    require(SAScii)
    require(RCurl)
    require(XML)
    agent <- "Firefox/23.0"
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem",
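
The excerpt above sets up defaults but stops before the form is actually submitted. A hedged sketch of the missing step with RCurl's postForm() -- the action URL and field names are guesses and must be read from the form's HTML (the <form action="..."> and <input name="..."> attributes):

    library(RCurl)

    curl <- getCurlHandle(
      cookiefile     = "",  # empty string enables in-memory cookie handling
      followlocation = TRUE,
      useragent      = agent,
      cainfo         = system.file("CurlSSL", "cacert.pem", package = "RCurl")
    )

    # "email" and "password" are placeholders for the real <input> names
    login <- postForm("http://www.aplia.com/",
                      .params = list(email    = your.username,
                                     password = your.password),
                      curl = curl, style = "POST")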

Scraping data from tables on multiple web pages in R (football players)

Submitted by 半世苍凉 on 2019-11-30 07:34:17
I'm working on a project for school where I need to collect the career statistics of individual NCAA football players. The data for each player are in this format: http://www.sports-reference.com/cfb/players/ryan-aplin-1.html I cannot find an aggregate of all players, so I need to go page by page and pull out the bottom row of each HTML table (Passing, Scoring, Rushing & Receiving, etc.). Players are categorized by last name, with links for each letter of the alphabet here: http://www.sports-reference.com/cfb/players/ For instance, each player whose last name starts with A is found here: http://www.sports
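
Since each per-player page is plain HTML, XML::readHTMLTable() can pull every statistics table at once, and the career totals are simply the last row of each. A sketch for one player, to be wrapped in a loop over the player URLs harvested from the alphabetical index pages:

    library(XML)

    url  <- "http://www.sports-reference.com/cfb/players/ryan-aplin-1.html"
    tabs <- readHTMLTable(url, stringsAsFactors = FALSE)

    # Keep only the bottom (career) row of every table on the page
    career <- lapply(tabs, function(tab) tab[nrow(tab), ])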

R: Download image using rvest

Submitted by 瘦欲@ on 2019-11-30 07:31:47
I'm attempting to download a PNG image from a secure site through R. To access the secure site I used rvest, which worked well. So far I've extracted the URL of the PNG image. How can I download the image at this link using rvest? Functions outside of rvest return errors due to not having permission. Current attempt:

    library(rvest)
    uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    session <- html_session("https://url.png", user_agent(uastring))
    form <- html_form(session)[[1]]
    form <- set_values(form, username = "***"
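
One way to keep the authenticated cookies is to stay inside the rvest session and let httr write the raw bytes. A sketch assuming the pre-1.0 rvest API used above and that the login form has been filled in:

    library(rvest)
    library(httr)

    logged_in <- submit_form(session, form)             # send the login form
    img_page  <- jump_to(logged_in, "https://url.png")  # follow the extracted png URL

    # The session wraps an httr response; dump its raw body to disk
    writeBin(content(img_page$response, as = "raw"), "image.png")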

How can I POST a simple HTML form in R?

Submitted by 自古美人都是妖i on 2019-11-30 04:20:57
I'm relatively new to R programming, and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website. Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary-market information for all outstanding bonds. I have unsuccessfully tried to use two different R packages to submit a request to the
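
With httr, a simple POST form amounts to POST() with a named body and encode = "form". A sketch in which the action URL and field name are placeholders; read the real ones from the <form> element in the page source (or Firefox's inspector):

    library(httr)

    resp <- POST(
      "https://www.treasurydirect.gov/placeholder-form-action",  # placeholder
      body = list(priceDate = "05/01/2014"),                     # placeholder field
      encode = "form"
    )
    page <- content(resp, as = "text")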

Scraping password protected forum in r

Submitted by 这一生的挚爱 on 2019-11-30 03:59:21
I have a problem with logging in in my script. Despite all the good answers I found on Stack Overflow, none of the solutions worked for me. I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com. The page I want to scrape is the member list, a page that lists the links to all member profiles. One can only access the member list when logged in; if you try to access it without logging in, it shows you the login form instead. The URL of the member list is http://forum.axishistory.com/memberlist.php. I tried the httr package:

    library(httr)
    members
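
For phpBB boards, the login form conventionally posts to ucp.php?mode=login with fields named username, password, and login; httr keeps cookies per host, so subsequent requests ride on the same session. A sketch under those assumptions (verify the field names against the actual form, and note that some boards also require a hidden sid field):

    library(httr)

    login <- POST("http://forum.axishistory.com/ucp.php?mode=login",
                  body = list(username = "your_user",   # placeholder credentials
                              password = "your_pass",
                              login    = "Login"),
                  encode = "form")

    # Cookies from the login are reused for the same host
    members <- GET("http://forum.axishistory.com/memberlist.php")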