rvest

Unable to install the rvest package

Submitted by 对着背影说爱祢 on 2019-12-04 03:33:50

I need to install the rvest package for R version 3.1.2 (2014-10-31). I get these errors:

    checking whether the C++ compiler supports the long long type... no
    *** stringi cannot be built. Upgrade your C++ compiler's settings
    ERROR: configuration failed for package ‘stringi’
    * removing ‘/usr/local/lib64/R/library/stringi’
    ERROR: dependency ‘stringi’ is not available for package ‘stringr’
    * removing ‘/usr/local/lib64/R/library/stringr’
    ERROR: dependency ‘stringr’ is not available for package ‘httr’
    * removing ‘/usr/local/lib64/R/library/httr’
    ERROR: dependency ‘stringr’ is not available for package …
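The configure check shows the system C++ compiler is too old to build stringi, which then knocks out the whole dependency chain up to rvest. The reliable fix is to upgrade the compiler (a newer gcc/g++ toolchain) and reinstall. As a stopgap, older stringi versions have accepted a configure flag that skips the C++11 requirement; a minimal sketch, assuming your stringi version still supports that flag:

    # assumes this stringi version supports the --disable-cxx11 configure flag
    install.packages("stringi", configure.args = "--disable-cxx11")
    # then retry the chain stringi -> stringr -> httr -> rvest
    install.packages("rvest")

If configure rejects the flag, upgrading g++ and rerunning a plain install.packages("rvest") is the dependable route.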

Google Translate via web scraping in R

Submitted by 我的梦境 on 2019-12-04 02:11:35

Question: I have a list of 1000 texts in Russian and want to translate them to English in R. I know there are R packages for Google Translate, but they require an API key, and the Google API is now a paid service. In Excel VBA, I have a macro that visits the Google Translate website and converts the text. See the URL and parameters below:

    getParam = "Прием (осмотр, консультация) врача-инфекциониста первичный"
    translateFrom = "ru"
    translateTo = "en"
    URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" …
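A rough R equivalent of that VBA macro, using rvest against the lightweight mobile endpoint. The ".result-container" selector is an assumption (inspect the page to confirm), and Google may throttle the requests or change this markup at any time:

    library(rvest)

    translate_chunk <- function(txt, from = "ru", to = "en") {
      url <- paste0("https://translate.google.pl/m?hl=", from,
                    "&sl=", from, "&tl=", to,
                    "&q=", URLencode(txt, reserved = TRUE))
      page <- read_html(url)
      # ".result-container" is assumed from the mobile page; verify before relying on it
      html_text(html_node(page, ".result-container"))
    }

    translate_chunk("Прием (осмотр, консультация) врача-инфекциониста первичный")

For 1000 texts, add a Sys.sleep() between calls so the requests are not rate-limited.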

R: use rvest (or httr) to log in to a site requiring cookies

Submitted by 喜你入骨 on 2019-12-04 01:49:14

Question: I'm trying to automate the Shibboleth-based login process for the UK Data Service in R. One can sign up for an account to log in here. A previous attempt to automate this process is found in the question "automating the login to the uk data service website in R with RCurl or httr". I thought the excellent answers to "how to authenticate a shibboleth multi-hostname website with httr in R" were going to get me there, but I've run into a wall. And, yes, RSelenium provides an …
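For sites where the login is a plain HTML form plus cookies, rvest's session functions keep the cookie jar for you. A minimal sketch, assuming a single login form with fields named username and password (the URL and field names are placeholders; a real Shibboleth flow may bounce through several redirect pages, which html_session follows automatically):

    library(rvest)

    # hypothetical IdP login URL and field names -- inspect the real form first
    s <- html_session("https://idp.example.org/idp/login")
    f <- html_form(s)[[1]]
    f <- set_values(f, username = "me", password = "secret")
    s <- submit_form(s, f)    # the session now carries the authentication cookies

    # subsequent navigation in the same session stays logged in
    s <- jump_to(s, "https://beta.ukdataservice.ac.uk/myaccount")

If any hop in the flow requires JavaScript, this approach fails and RSelenium is the fallback.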

Web scraping of image

Submitted by 我是研究僧i on 2019-12-03 20:42:17

I am a beginner. I wrote a small web-scraping script with rvest. I found the very convenient pattern %>% html_node() %>% html_text() %>% as.numeric(), but I was not able to adapt it to scrape the URL of an image. My code for scraping the image URL:

    UrlPage <- html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
    img <- UrlPage %>% html_node(".wp-image-5984") %>% html_attrs()

Result:

    class "aligncenter size-full wp-image-5984"
    title "Blog gdp 2012_10_1"
    alt   ""
    src   "http://eyeonhousing.files.wordpress.com/2012/11/blog…
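The fix is to ask for the single src attribute rather than all attributes, and to use read_html() (html() is deprecated). A short sketch, assuming the post still contains that image:

    library(rvest)

    page <- read_html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
    img_url <- page %>%
      html_node(".wp-image-5984") %>%
      html_attr("src")            # returns just the URL as a character string
    img_url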

Yahoo login using rvest

Submitted by 感情迁移 on 2019-12-03 20:39:02

Recently, Yahoo changed their authentication mechanism to a two-step one. Now, when I log in to a Yahoo site, I enter my username, and it then asks me to open my Yahoo mobile app to get a code. Alternatively, you can have it email or text you, or use some other way around this. As a result, code that used to programmatically log in to Yahoo sites no longer works; it just redirects to the login form. I've tried with and without a user-agent string, and with and without countrycode=1 in the form values. I'm fine with entering a code after looking at my mobile app, but …
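A plain-form sketch of a two-step login that pauses for the code from the mobile app. The form indices and field names below are assumptions, and Yahoo's login is heavily JavaScript-driven, so this may simply not work without a real browser (RSelenium):

    library(rvest)

    s <- html_session("https://login.yahoo.com")
    f1 <- set_values(html_form(s)[[1]], username = "me@yahoo.com")  # field name assumed
    s <- submit_form(s, f1)

    code <- readline("Enter the code from the Yahoo mobile app: ")
    f2 <- set_values(html_form(s)[[1]], code = code)                # field name assumed
    s <- submit_form(s, f2)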

rvest: how to select a specific CSS node by id

Submitted by 不问归期 on 2019-12-03 17:49:55

Question: I'm trying to use the rvest package to scrape data from a web page. Simplified, the HTML looks like this:

    <div class="style">
      <input id="a" value="123">
      <input id="b">
    </div>

I want to get the value 123 from the first input. I tried the following R code:

    library(rvest)
    url <- "xxx"
    output <- html_nodes(url, ".style input")

This returns a list of input tags:

    [[1]] <input id="a" value="123">
    [[2]] <input id="b">

Next I tried using html_node to reference the first input tag by id: …
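The direct route is an id selector plus html_attr(). A self-contained sketch using the HTML from the question:

    library(rvest)

    doc <- read_html('<div class="style"><input id="a" value="123"><input id="b"></div>')
    doc %>% html_node("#a") %>% html_attr("value")
    #> [1] "123"

html_node("#a") matches the element whose id is "a", and html_attr("value") extracts the attribute as a string (wrap it in as.numeric() if a number is needed).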

Iterating rvest scrape function gives "Error in open.connection(x, 'rb') : Timeout was reached"

Submitted by 空扰寡人 on 2019-12-03 14:06:38

Question: I'm scraping a website using the rvest package. When I iterate my function too many times, I get "Error in open.connection(x, 'rb') : Timeout was reached". I have searched for similar questions, but the answers seem to lead to dead ends. I suspect the cause is server-side: the website has a built-in restriction on how many times I can visit the page. How do I investigate this hypothesis? The code: I have the links to the underlying web pages and want to construct a data frame with …
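One way to probe (and survive) a rate limit is to pause between requests and retry on timeout with increasing back-off; if longer pauses make the error disappear, that supports the server-side-throttling hypothesis. A sketch with assumed timings:

    library(rvest)

    read_with_retry <- function(url, tries = 3, pause = 5) {
      for (i in seq_len(tries)) {
        page <- tryCatch(read_html(url), error = function(e) NULL)
        if (!is.null(page)) return(page)
        Sys.sleep(pause * i)    # back off longer after each failure
      }
      stop("Failed to fetch ", url, " after ", tries, " tries")
    }

Call read_with_retry() inside the loop, with an unconditional Sys.sleep() between iterations as well.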

Web scraping in R?

Submitted by 风流意气都作罢 on 2019-12-03 09:37:24

I would like to web scrape this web site, in particular the information in its table (note that I chose a specific date in the upper-right corner). Following this guide, I wrote the following code:

    library(rvest)
    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
    webpage_nba <- read_html(url_nba)
    # Using CSS selectors to scrape the rankings section
    data_nba <- html_nodes(webpage_nba, '#standings-table')
    # Converting the ranking data to text
    data_nba <- html_text(data_nba)
    write.csv(data_nba, "web scraping test.csv")

From my understanding …
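html_text() flattens the table into one unstructured string; html_table() keeps the rows and columns. A sketch, assuming #standings-table is a regular HTML table in the static page (the date picker is JavaScript-driven, so read_html() only ever sees the default date):

    library(rvest)

    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
    standings <- read_html(url_nba) %>%
      html_node('#standings-table') %>%
      html_table(fill = TRUE)      # fill = TRUE tolerates irregular header rows
    write.csv(standings, "nba_standings.csv", row.names = FALSE)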

Scraping linked HTML webpages by looping the rvest::follow_link() function

Submitted by 前提是你 on 2019-12-03 08:45:43

How can I loop the rvest::follow_link() function to scrape linked webpages?

Use case: identify all Lego Movie cast members, follow each cast member's link, and grab a table of each movie (+ year) for every cast member. The selectors I need are below:

    library(rvest)
    lego_movie <- html("http://www.imdb.com/title/tt1490017/")
    lego_movie <- lego_movie %>%
      html_nodes(".itemprop , .character a") %>%
      html_text()
    # follow cast links: ".itemprop .itemprop"
    # grab tables of all movies and dates for each cast member: ".year_column , b a"

Desired output:

    castMember    movie    year
    Will Arnett   Lego …
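A sketch of the loop, using follow_link() to match each cast link by its visible text and binding the per-actor results together. The selectors come from the question; matching by link text can be ambiguous when names repeat, so treat this as a starting point:

    library(rvest)

    s <- html_session("http://www.imdb.com/title/tt1490017/")
    cast <- s %>% html_nodes(".itemprop .itemprop") %>% html_text()

    results <- lapply(cast, function(name) {
      actor <- follow_link(s, name)    # follows the link whose text matches the name
      films <- actor %>% html_nodes(".year_column , b a") %>% html_text()
      data.frame(castMember = name, film = trimws(films),
                 stringsAsFactors = FALSE)
    })
    all_films <- do.call(rbind, results)

Splitting films into separate movie and year columns is left as a post-processing step, since the two selectors return interleaved text.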

R Change IP Address programmatically

Submitted by 纵饮孤独 on 2019-12-03 08:02:04

Question: I'm currently changing the user agent by passing different strings to the html_session() method. Is there also a way to change your IP address on a timer when scraping a website?

Answer 1: You can use a proxy (which changes the IP the server sees) via use_proxy, as follows:

    html_session("your-url", use_proxy("proxy-ip", port))

For more details see ?httr::use_proxy. To check that it is working, compare the IP reported with and without the proxy:

    require(httr)
    content(GET("https://ifconfig.co/json"), "parsed")
    content(GET("https://ifconfig.co/json", use_proxy("proxy-ip", port)), "parsed")
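Building on that answer, a sketch that rotates through a pool of proxies on a timer; the hosts, ports, and URLs below are placeholders:

    library(httr)

    proxies <- data.frame(host = c("10.0.0.1", "10.0.0.2"),   # hypothetical proxies
                          port = c(8080L, 8080L),
                          stringsAsFactors = FALSE)
    urls <- c("https://example.com/p1", "https://example.com/p2")

    for (i in seq_along(urls)) {
      p <- proxies[(i - 1) %% nrow(proxies) + 1, ]    # cycle through the pool
      resp <- GET(urls[i], use_proxy(p$host, p$port))
      Sys.sleep(10)                                   # the "timer" between requests
    }

Note that use_proxy() changes the IP the target server sees only because the request is relayed through the proxy; it does not change your machine's actual IP address.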