httr

Scraping a password-protected forum in R

这一生的挚爱 submitted on 2019-11-30 03:59:21
I have a problem with the login step in my script. Despite all the good answers I found on Stack Overflow, none of the solutions worked for me. I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com . The page I want to scrape is the memberlist, a page that lists the links to all member profiles. One can only access the memberlist when logged in; if you try to access it without logging in, you are shown the login form. The URL of the memberlist is http://forum.axishistory.com/memberlist.php . I tried the httr package: library(httr) members
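
A minimal sketch of the usual httr approach, assuming the forum runs phpBB and its login form posts "username", "password" and "login" fields to ucp.php?mode=login (those names are typical phpBB markup, not verified against this site):

    library(httr)

    login_url  <- "http://forum.axishistory.com/ucp.php?mode=login"
    member_url <- "http://forum.axishistory.com/memberlist.php"

    # Submit the login form; httr re-uses one curl handle per host, so the
    # session cookie set here is also sent with the follow-up request.
    resp <- POST(login_url,
                 body = list(username = "me", password = "secret", login = "Login"),
                 encode = "form")
    stop_for_status(resp)

    # Now the memberlist should come back as the logged-in version.
    members <- GET(member_url)
    page <- content(members, as = "text")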

Automating the login to the UK Data Service website in R with RCurl or httr

戏子无情 submitted on 2019-11-30 02:23:19
I am in the process of writing a collection of freely downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK Data Service. In addition to providing lots of statistics tutorials for these data sets, I also want to automate the download and importation of this survey data. In order to do that, I need to figure out how to programmatically log into the UK Data Service website. I have tried lots of different configurations of RCurl and httr to log in, but I'm making a mistake somewhere and I'm stuck. I have tried inspecting the
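
The general pattern (not a verified solution for this particular site) is to replay the login form on a handle that keeps cookies, then fetch the protected pages on the same handle. The form URL and field names below are placeholders that have to be read off the real login page with the browser's developer tools:

    library(RCurl)

    curl <- getCurlHandle(cookiefile = "", followlocation = TRUE)

    # Replay the login form with the field names the real HTML form uses.
    postForm("https://www.esds.ac.uk/secure/login.asp",    # placeholder URL
             username = "me", password = "secret",         # placeholder fields
             style = "POST", curl = curl)

    # Later requests on the same handle carry the session cookies.
    page <- getURL("https://www.esds.ac.uk/secure/some-protected-page.asp",
                   curl = curl)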

SSL verification causes RCurl and httr to break - on a website that should be legit

让人想犯罪 __ submitted on 2019-11-29 18:11:28
I'm trying to automate the login to the UK's data archive service; that website is obviously trustworthy. Unfortunately, both RCurl and httr break at SSL verification, even though my web browser doesn't give any sort of warning. I can work around the issue by using ssl.verifypeer = FALSE in RCurl, but I'd like to understand what's going on. # breaks library(httr) GET( "https://www.esds.ac.uk/secure/UKDSRegister_start.asp" ) # breaks library(RCurl) cert <- system.file("CurlSSL/cacert.pem", package = "RCurl")
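
A small sketch of the alternative to switching verification off: point libcurl at a CA bundle explicitly. The bundle shipped with RCurl may itself be out of date, so there is no guarantee it validates this particular certificate chain:

    library(RCurl)
    library(httr)

    cert <- system.file("CurlSSL/cacert.pem", package = "RCurl")

    # RCurl: hand the bundle to libcurl via the cainfo option.
    page <- getURL("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
                   cainfo = cert)

    # httr: the same libcurl option goes through config().
    resp <- GET("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
                config(cainfo = cert))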

How to properly set cookies to get URL content using httr

我们两清 submitted on 2019-11-29 11:57:59
I need to download information from a web site that is protected using cookies. I get past this protection manually and then pass the cookies to httr. Here is a similar topic, but it does not solve my problem: ( Copying cookie for httr ) library(httr) url<-"http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ" cook<-"_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)" response <- GET(url, config(cookie=
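
One way to hand the cookies to httr, assuming the string copied from the browser is still valid, is to send it verbatim as the Cookie request header; set_cookies() with one named value per cookie is the tidier equivalent:

    library(httr)

    url  <- "http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ"
    cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; ..."  # shortened here

    # Send the raw cookie string as a request header.
    response <- GET(url, add_headers(Cookie = cook))
    content(response, as = "text")

    # Or pass individual cookies by name.
    response <- GET(url, set_cookies(`_SMIDA` = "9117a9eb136353bd6956651bd59acd37"))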

Getting data into R as a data.frame from a web source

筅森魡賤 submitted on 2019-11-29 11:16:49
I am trying to load some air pollution background data directly into R as a data.frame using the RCurl package. The website in question has three drop-down boxes for choosing options before downloading the .csv file, as shown in the figure below. I am trying to select three values from the drop-down boxes and download the data with the "Download CSV" button directly into R as a data.frame. I want to download different combinations of multiple years and multiple pollutants for a specific site. In other posts on Stack Overflow I have come across the getForm function from the RCurl package, but I don't understand how to
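
A hedged sketch of the getForm() idea: the download endpoint and the parameter names (site, year, pollutant) below are hypothetical and would have to be taken from the page's form and select elements:

    library(RCurl)

    # getForm() issues a GET with the form fields as query parameters and
    # returns the response body as text.
    csv_txt <- getForm("http://example-air-quality-site/download.csv",   # placeholder URL
                       site = "ABC1", year = "2012", pollutant = "NO2")  # placeholder fields

    air <- read.csv(text = csv_txt, stringsAsFactors = FALSE)

    # For several years and pollutants, loop over the combinations and
    # rbind() the resulting data frames.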

How to download a large binary file with RCurl *after* server authentication

北战南征 submitted on 2019-11-29 06:44:00
I originally asked this question about performing this task with the httr package, but I don't think it's possible using httr. So I've re-written my code to use RCurl instead, but I'm still tripping up on something, probably related to the writefunction, and I really don't understand why. You should be able to reproduce my work by using the 32-bit version of R, so that you hit memory limits if you read anything into RAM. I need a solution that downloads directly to the hard disk. To start, this code works: the zipped file is appropriately saved to disk. library(RCurl) filename <-
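
A minimal sketch of streaming a download straight to disk with RCurl, re-using a handle that already holds the authentication cookies (the download URL is a placeholder):

    library(RCurl)

    curl <- getCurlHandle(cookiefile = "")
    # ... authenticate on this handle first (postForm()/getURL()), then:

    tf <- tempfile(fileext = ".zip")
    f  <- CFILE(tf, mode = "wb")        # C-level file handle for libcurl

    # writedata makes libcurl write the body to the file as it arrives,
    # so nothing large is ever held in RAM.
    curlPerform(url = "http://example.com/big-file.zip",   # placeholder URL
                writedata = f@ref, curl = curl)
    close(f)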

R: Download image using rvest

大兔子大兔子 submitted on 2019-11-29 03:50:25
I'm attempting to download a png image from a secure site through R. To access the secure site I used rvest, which worked well. So far I've extracted the URL for the png image. How can I download the image at this link using rvest? Functions outside of rvest return errors because they do not have permission. Current attempts: library(rvest) uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" session <- html_session("https:/
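
A sketch of one workaround: ask rvest to follow the image URL inside the authenticated session and then write the raw bytes of the response to disk. jump_to() is the session-navigation helper in older rvest releases (newer ones call it session_jump_to()), and img_url stands for the link extracted earlier:

    library(rvest)
    library(httr)

    # Follow the image link with the session's cookies and user agent.
    img_page <- jump_to(session, img_url)
    writeBin(content(img_page$response, as = "raw"), "image.png")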

OAuth authentication to Fitbit using httr

北城以北 submitted on 2019-11-29 02:31:18
I'm trying to connect to the Fitbit API using the httr library. Using the examples provided, I came up with the following code: library(httr) key <- '<edited>' secret <- '<edited>' tokenURL <- 'http://api.fitbit.com/oauth/request_token' accessTokenURL <- 'http://api.fitbit.com/oauth/access_token' authorizeURL <- 'https://www.fitbit.com/oauth/authorize' fbr <- oauth_app('fitbitR',key,secret) fitbit <- oauth_endpoint(tokenURL,authorizeURL,accessTokenURL) token <- oauth1.0_token(fitbit,fbr) sig <- sign_oauth1.0(fbr, token=token$oauth_token, token_secret=token$oauth_token_secret ) I get the
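
For what it is worth, once the handshake succeeds the signed config is simply passed along with each API call; the endpoint below is an illustrative Fitbit resource, not something taken from the question:

    # Continuing from the snippet above, where sig is the config returned
    # by sign_oauth1.0().
    resp <- GET("https://api.fitbit.com/1/user/-/profile.json", sig)
    content(resp)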

Using R to “click” a download file button on a webpage

有些话、适合烂在心里 submitted on 2019-11-28 11:22:06
I am attempting to use the webpage http://volcano.si.edu/search_eruption.cfm to scrape data. There are two drop-down boxes that ask for filters on the data. I do not need filtered data, so I leave those blank and continue to the next page by clicking "Search Eruptions". What I have noticed, though, is that the resulting table only includes a small number of columns (only 5) compared to the total number of columns (24) it should have. However, all 24 columns are there if you click the "Download Results to Excel" button and open the downloaded file. This is what I need. So,
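
A hedged sketch of the usual approach: find, in the browser's network tab, the request that the "Download Results to Excel" button fires and replay it from R. The endpoint and field names below are hypothetical:

    library(httr)

    resp <- POST("http://volcano.si.edu/search_eruption_excel.cfm",          # hypothetical URL
                 body = list(volcano_name = "", eruption_category = ""),     # hypothetical fields
                 encode = "form")

    tf <- tempfile(fileext = ".xls")
    writeBin(content(resp, as = "raw"), tf)

    # The "Excel" file such buttons produce is often really an HTML or CSV
    # table; try readxl::read_excel(tf) first and fall back to read.csv(tf).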