RCurl

Automating the login to the UK Data Service website in R with RCurl or httr

Submitted by 戏子无情 on 2019-11-30 02:23:19
I am in the process of writing a collection of freely-downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK Data Service. In addition to providing lots of statistics tutorials for these data sets, I also want to automate the download and importation of this survey data. In order to do that, I need to figure out how to programmatically log into this UK Data Service website. I have tried lots of different configurations of RCurl and httr to log in, but I'm making a mistake somewhere and I'm stuck. I have tried inspecting the…
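
A minimal sketch of the general shape of a form-based login with httr, for orientation only: the login URL is taken from the question below, but the form field names `username` and `password` are hypothetical placeholders that would have to be read off the site's actual login form.

    library(httr)

    # hypothetical: inspect the real login form for the true field names
    login_url <- "https://www.esds.ac.uk/secure/UKDSRegister_start.asp"

    # POST the credentials the way the form would; httr keeps cookies on
    # its per-domain handle, so later requests stay logged in
    resp <- POST(login_url,
                 body = list(username = "your_user",   # placeholder field
                             password = "your_pass"),  # placeholder field
                 encode = "form")
    stop_for_status(resp)

    # subsequent requests in the same R session reuse the session cookies
    page <- GET("https://www.esds.ac.uk/secure/some_protected_page.asp")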

Downloading large files with R/RCurl efficiently

Submitted by 早过忘川 on 2019-11-30 02:04:47
I see that many examples for downloading binary files with RCurl look like this:

    library("RCurl")
    curl <- getCurlHandle()
    bfile <- getBinaryURL(
      "http://www.example.com/bfile.zip",
      curl = curl,
      progressfunction = function(down, up) print(down),
      noprogress = FALSE
    )
    writeBin(bfile, "bfile.zip")
    rm(curl, bfile)

If the download is very large, I suppose it would be better to write it concurrently to the storage medium instead of fetching it all into memory. The RCurl documentation has some examples that get files by chunks and manipulate them as they are downloaded, but they all seem to refer to text…
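
One way to stream straight to disk rather than buffering in RAM is RCurl's `CFILE()` together with `curlPerform()`, which hands libcurl a C-level file handle to write each chunk into as it arrives. A sketch, using the same example URL:

    library(RCurl)

    url <- "http://www.example.com/bfile.zip"

    # open a C-level file handle that libcurl can write to directly
    f <- CFILE("bfile.zip", mode = "wb")

    # libcurl writes each arriving chunk into the file; nothing is
    # accumulated in R's memory
    curlPerform(url = url, writedata = f@ref)

    close(f)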

How to isolate a single element from a scraped web page in R

Submitted by 折月煮酒 on 2019-11-30 00:44:38
I want to use R to scrape this page ( http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html ) and others like it, to get the goal scorers and times. So far, this is what I've got:

    require(RCurl)
    require(XML)
    theURL <- "http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html"
    webpage <- getURL(theURL, header = FALSE, verbose = TRUE)
    webpagecont <- readLines(tc <- textConnection(webpage)); close(tc)
    pagetree <- htmlTreeParse(webpagecont, error = function(...){},
                              useInternalNodes = TRUE)

and the pagetree object now contains a pointer to…
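
From here the usual route is an XPath query with `xpathSApply()` against the parsed tree. The XPath below is a guess for illustration; the real class and element names holding the scorers would have to come from inspecting the page source.

    library(XML)

    # hypothetical XPath -- replace with the node that actually holds
    # the scorer list on the report page
    scorers <- xpathSApply(pagetree,
                           "//div[@class='cont']//li",
                           xmlValue)

    # strip whitespace left over from the HTML layout
    scorers <- trimws(scorers)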

SSL verification causes RCurl and httr to break - on a website that should be legit

Submitted by 让人想犯罪 __ on 2019-11-29 18:11:28
I'm trying to automate the login of the UK's data archive service. That website is obviously trustworthy. Unfortunately, both RCurl and httr break at SSL verification. My web browser doesn't give any sort of warning. I can work around the issue by using ssl.verifypeer = FALSE in RCurl, but I'd like to understand what's going on.

    # breaks
    library(httr)
    GET("https://www.esds.ac.uk/secure/UKDSRegister_start.asp")

    # breaks
    library(RCurl)
    cert <- system.file("CurlSSL/cacert.pem", package = "RCurl…
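
Beyond disabling verification, a common cause of this symptom is an incomplete certificate chain on the server, which browsers paper over with cached intermediates but libcurl does not. Pointing the request at a fuller CA bundle sometimes resolves it; a sketch, assuming the bundle shipped with RCurl is current enough:

    library(RCurl)

    # use the CA bundle that ships with RCurl instead of the system default
    cert <- system.file("CurlSSL/cacert.pem", package = "RCurl")
    page <- getURL("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
                   cainfo = cert)

    # httr equivalent of the insecure workaround, if the bundle still fails
    library(httr)
    resp <- GET("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
                config(ssl_verifypeer = FALSE))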

Using R to download SAS file from ftp-server

Submitted by 时光总嘲笑我的痴心妄想 on 2019-11-29 12:07:23
I am attempting to download some files onto my local machine from an ftp-server. I have had success using the following method to move .txt and .csv files from the server, but not the .sas7bdat files that I need.

    protocol <- "sftp"
    server <- "ServerName"
    userpwd <- "User:Pass"
    tsfrFilename <- "/filepath/file.sas7bdat"
    ouptFilename <- "out.sas7bdat"

    # Run #
    ## Download Data
    url <- paste0(protocol, "://", server, tsfrFilename)
    data <- getURL(url = url, userpwd = userpwd)

    ## Create File
    fconn <- file(ouptFilename)
    writeLines(data, fconn)
    close(fconn)

When I run the getURL command, however, I am met with the…
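
The likely culprit is that `getURL()` and `writeLines()` treat the content as text, which mangles a binary `.sas7bdat` file. A binary-safe variant of the same flow, reusing the variables above, would fetch a raw vector and write the bytes untouched:

    library(RCurl)

    url <- paste0(protocol, "://", server, tsfrFilename)

    # fetch as a raw vector instead of a character string
    bin <- getBinaryURL(url = url, userpwd = userpwd)

    # write the raw bytes to disk unchanged
    writeBin(bin, ouptFilename)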

How do I POST a JSON-formatted request to GET JSON data from a URL in R into a data.frame in a less verbose manner?

Submitted by 有些话、适合烂在心里 on 2019-11-29 12:01:14
I have written the following code in R to start using a data request API. It's a normal web service JSON API.

    library(RJSONIO)
    library(RCurl)
    library(httr)

    r <- POST("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/PR/PR0101/PR0101A/KPIFastM2",
              body = '{ "query": [], "response": { "format": "json" } }')
    stop_for_status(r)
    a <- content(r, "text", "application/json", encoding = "UTF-8")
    cat(a, file = "test.json")
    x <- fromJSON(file("test.json", "r"))
    mydf <- do.call(rbind, lapply(x$data, data.frame))
    colnames(mydf) <- c("YearMonth", "CPI")

Basically it initializes a request for the URL using httr and…
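
A less verbose sketch that skips the temporary file by parsing the response text directly, here with jsonlite's `fromJSON()` as one option (RJSONIO's `fromJSON()` on the same text would work similarly). It assumes each element of `x$data` carries a one-element key and a one-element value, as the original reshaping implies:

    library(httr)
    library(jsonlite)

    r <- POST("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/PR/PR0101/PR0101A/KPIFastM2",
              body = '{ "query": [], "response": { "format": "json" } }')
    stop_for_status(r)

    # parse straight from the response text -- no intermediate test.json
    x <- fromJSON(content(r, "text", encoding = "UTF-8"),
                  simplifyDataFrame = FALSE)

    # same reshaping as before, with named columns
    mydf <- do.call(rbind, lapply(x$data, function(d)
      data.frame(YearMonth = d$key[1], CPI = d$values[1],
                 stringsAsFactors = FALSE)))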

Scraping data from tables on multiple web pages in R (football players)

Submitted by 两盒软妹~` on 2019-11-29 09:39:17
I'm working on a project for school where I need to collect the career statistics for individual NCAA football players. The data for each player is in this format: http://www.sports-reference.com/cfb/players/ryan-aplin-1.html I cannot find an aggregate of all players, so I need to go page by page and pull out the bottom row of each HTML table (Passing, Scoring, Rushing & Receiving, etc.). Each player is categorized by their last name, with links for each letter of the alphabet going here: http://www.sports…
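
A sketch of the per-page step with `XML::readHTMLTable()`, which pulls every HTML table on a page into a list of data frames; keeping each table's last row then gives the career totals. It assumes the tables parse directly from the player URL quoted above:

    library(XML)

    # single-player example page from the question
    url <- "http://www.sports-reference.com/cfb/players/ryan-aplin-1.html"

    # every HTML table on the page, as a named list of data frames
    tables <- readHTMLTable(url, stringsAsFactors = FALSE)

    # keep only the bottom (career) row of each statistics table
    career_rows <- lapply(tables, function(tbl) tbl[nrow(tbl), ])

Looping this over the player links harvested from the alphabetical index pages would then build the aggregate.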

How to download a large binary file with RCurl *after* server authentication

Submitted by 北战南征 on 2019-11-29 06:44:00
I originally asked this question about performing this task with the httr package, but I don't think it's possible using httr. So I've re-written my code to use RCurl instead, but I'm still tripping up on something, probably related to the writefunction, and I really don't understand why. You should be able to reproduce my work by using the 32-bit version of R, so that you hit memory limits if you read anything into RAM. I need a solution that downloads directly to the hard disk. To start, this code works; the zipped file is appropriately saved to the disk.

    library(RCurl)
    filename <-…
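
One way to marry an authenticated handle with direct-to-disk writing is to keep the login cookies on a shared curl handle and pass `curlPerform()` a `CFILE()` as `writedata`, instead of an R-level `writefunction`. A sketch with the login step elided and a placeholder file URL:

    library(RCurl)

    # one shared handle, so login cookies carry over to the download
    curl <- getCurlHandle(cookiefile = "")

    # ... perform the site login on this handle first ...

    # stream the zip straight to disk; no R-level buffering
    f <- CFILE("large_file.zip", mode = "wb")
    curlPerform(url = "https://example.com/large_file.zip",  # placeholder
                curl = curl, writedata = f@ref)
    close(f)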

RCurl: HTTP Authentication When Site Responds With HTTP 401 Code Without WWW-Authenticate

Submitted by 早过忘川 on 2019-11-29 05:45:13
I'm implementing an R wrapper around PiCloud's REST API, using the RCurl package to make HTTP(S) requests to the API server. The API uses basic HTTP authentication to verify that users have sufficient permissions. The PiCloud documentation gives an example of using the API and authenticating with curl:

    $ curl -u 'key:secret_key' https://api.picloud.com/job/?jids=12

This works perfectly. Translating it to an equivalent RCurl command:

    getURL("https://api.picloud.com/job/?jids=12", userpwd = "key:secret")

Executing this function, I receive the following error message:

    [1] "{\"error\": {\"msg\": \…
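
When a server answers 401 without a `WWW-Authenticate` header, libcurl's default auth negotiation never gets a challenge to respond to, so the credentials are never sent; the curl CLI's `-u` flag sends basic auth preemptively, which is why it works. A sketch that forces the same behavior in RCurl, assuming its `httpauth` option maps to `CURLOPT_HTTPAUTH`:

    library(RCurl)

    # force preemptive basic auth rather than waiting for a
    # WWW-Authenticate challenge (1L corresponds to CURLAUTH_BASIC)
    resp <- getURL("https://api.picloud.com/job/?jids=12",
                   userpwd  = "key:secret",
                   httpauth = 1L)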

R: Download image using rvest

Submitted by 大兔子大兔子 on 2019-11-29 03:50:25
I'm attempting to download a png image from a secure site through R. To access the secure site I used rvest, which worked well. So far I've extracted the URL for the png image. How can I download the image at this link using rvest? Functions outside of rvest return errors due to not having permission. Current attempts:

    library(rvest)
    uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    session <- html_session("https:/…
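
A sketch that stays inside the rvest session so the login permissions carry over: `jump_to()` follows the image URL on the same session, and the raw bytes come out of the underlying httr response. The login URL and image URL below are placeholders:

    library(rvest)
    library(httr)  # for user_agent()

    session <- html_session("https://example.com/login",   # placeholder
                            user_agent(uastring))
    # ... log in via html_form() / submit_form() as needed ...

    # follow the extracted image link within the authenticated session
    img_url <- "https://example.com/secure/image.png"       # placeholder
    img     <- jump_to(session, img_url)

    # write the raw response bytes to disk
    writeBin(img$response$content, "image.png")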