rvest

Unable to install the rvest package

Submitted by 对着背影说爱祢 on 2019-12-04 03:33:50

I need to install the rvest package for R version 3.1.2 (2014-10-31). I get these errors:

    checking whether the C++ compiler supports the long long type... no
    *** stringi cannot be built. Upgrade your C++ compiler's settings
    ERROR: configuration failed for package ‘stringi’
    * removing ‘/usr/local/lib64/R/library/stringi’
    ERROR: dependency ‘stringi’ is not available for package ‘stringr’
    * removing ‘/usr/local/lib64/R/library/stringr’
    ERROR: dependency ‘stringr’ is not available for package ‘httr’
    * removing ‘/usr/local/lib64/R/library/httr’
    ERROR: dependency ‘stringr’ is not available for package …
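The configure check shows the system C++ compiler is too old to build stringi, which then knocks out the whole dependency chain up to rvest. The reliable fix is to upgrade the compiler (a newer gcc/g++ toolchain) and reinstall. As a stopgap, older stringi versions have accepted a configure flag that skips the C++11 requirement; a minimal sketch, assuming your stringi version still supports that flag:

    # assumes this stringi version supports the --disable-cxx11 configure flag
    install.packages("stringi", configure.args = "--disable-cxx11")
    # then retry the chain stringi -> stringr -> httr -> rvest
    install.packages("rvest")

If configure rejects the flag, upgrading g++ and rerunning a plain install.packages("rvest") is the dependable route.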

Google Translate via web scraping in R

Submitted by 我的梦境 on 2019-12-04 02:11:35

Question: I have a list of 1000 texts in Russian and want to translate them to English in R. I know there are R packages for Google Translate, but they require an API key, and the Google API is now a paid service. In Excel VBA, I have a macro that visits the Google Translate website and converts the text. See the URL and parameters below:

    getParam = "Прием (осмотр, консультация) врача-инфекциониста первичный"
    translateFrom = "ru"
    translateTo = "en"
    URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" …
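A rough R equivalent of that VBA macro, using rvest against the lightweight mobile endpoint. The ".result-container" selector is an assumption (inspect the page to confirm), and Google may throttle the requests or change this markup at any time:

    library(rvest)

    translate_chunk <- function(txt, from = "ru", to = "en") {
      url <- paste0("https://translate.google.pl/m?hl=", from,
                    "&sl=", from, "&tl=", to,
                    "&q=", URLencode(txt, reserved = TRUE))
      page <- read_html(url)
      # ".result-container" is assumed from the mobile page; verify before relying on it
      html_text(html_node(page, ".result-container"))
    }

    translate_chunk("Прием (осмотр, консультация) врача-инфекциониста первичный")

For 1000 texts, add a Sys.sleep() between calls so the requests are not rate-limited.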

R: use rvest (or httr) to log in to a site requiring cookies

Submitted by 喜你入骨 on 2019-12-04 01:49:14

Question: I'm trying to automate the Shibboleth-based login process for the UK Data Service in R. One can sign up for an account to log in here. A previous attempt to automate this process is found in the question "automating the login to the uk data service website in R with RCurl or httr". I thought the excellent answers to "how to authenticate a shibboleth multi-hostname website with httr in R" were going to get me there, but I've run into a wall. And, yes, RSelenium provides an …
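For sites where the login is a plain HTML form plus cookies, rvest's session functions keep the cookie jar for you. A minimal sketch, assuming a single login form with fields named username and password (the URL and field names are placeholders; a real Shibboleth flow may bounce through several redirect pages, which html_session follows automatically):

    library(rvest)

    # hypothetical IdP login URL and field names -- inspect the real form first
    s <- html_session("https://idp.example.org/idp/login")
    f <- html_form(s)[[1]]
    f <- set_values(f, username = "me", password = "secret")
    s <- submit_form(s, f)    # the session now carries the authentication cookies

    # subsequent navigation in the same session stays logged in
    s <- jump_to(s, "https://beta.ukdataservice.ac.uk/myaccount")

If any hop in the flow requires JavaScript, this approach fails and RSelenium is the fallback.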

Web scraping of image

Submitted by 我是研究僧i on 2019-12-03 20:42:17

I am a beginner. I wrote a small web-scraping script with rvest. I found the very convenient pattern %>% html_node() %>% html_text() %>% as.numeric(), but I was not able to adapt it to scrape the URL of an image. My code for scraping the image URL:

    UrlPage <- html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
    img <- UrlPage %>% html_node(".wp-image-5984") %>% html_attrs()

Result:

    class "aligncenter size-full wp-image-5984"
    title "Blog gdp 2012_10_1"
    alt   ""
    src   "http://eyeonhousing.files.wordpress.com/2012/11/blog…
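The fix is to ask for the single src attribute rather than all attributes, and to use read_html() (html() is deprecated). A short sketch, assuming the post still contains that image:

    library(rvest)

    page <- read_html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
    img_url <- page %>%
      html_node(".wp-image-5984") %>%
      html_attr("src")            # returns just the URL as a character string
    img_url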

Yahoo login using rvest

Submitted by 感情迁移 on 2019-12-03 20:39:02

Recently, Yahoo changed their authentication mechanism to a two-step one. Now, when I log in to a Yahoo site, I enter my username, and it then asks me to open my Yahoo mobile app to get a code. Alternatively, you can have it email or text you, or use some other way around this. As a result, code that used to programmatically log in to Yahoo sites no longer works; it just redirects to the login form. I've tried with and without a user-agent string, and with and without countrycode=1 in the form values. I'm fine with entering a code after looking at my mobile app, but …
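A plain-form sketch of a two-step login that pauses for the code from the mobile app. The form indices and field names below are assumptions, and Yahoo's login is heavily JavaScript-driven, so this may simply not work without a real browser (RSelenium):

    library(rvest)

    s <- html_session("https://login.yahoo.com")
    f1 <- set_values(html_form(s)[[1]], username = "me@yahoo.com")  # field name assumed
    s <- submit_form(s, f1)

    code <- readline("Enter the code from the Yahoo mobile app: ")
    f2 <- set_values(html_form(s)[[1]], code = code)                # field name assumed
    s <- submit_form(s, f2)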

rvest: how to select a specific CSS node by id

Submitted by 不问归期 on 2019-12-03 17:49:55

Question: I'm trying to use the rvest package to scrape data from a web page. Simplified, the HTML looks like this:

    <div class="style">
      <input id="a" value="123">
      <input id="b">
    </div>

I want to get the value 123 from the first input. I tried the following R code:

    library(rvest)
    url <- "xxx"
    output <- html_nodes(url, ".style input")

This returns a list of input tags:

    [[1]] <input id="a" value="123">
    [[2]] <input id="b">

Next I tried using html_node to reference the first input tag by id: …
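The direct route is an id selector plus html_attr(). A self-contained sketch using the HTML from the question:

    library(rvest)

    doc <- read_html('<div class="style"><input id="a" value="123"><input id="b"></div>')
    doc %>% html_node("#a") %>% html_attr("value")
    #> [1] "123"

html_node("#a") matches the element whose id is "a", and html_attr("value") extracts the attribute as a string (wrap it in as.numeric() if a number is needed).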

Iterating rvest scrape function gives "Error in open.connection(x, 'rb') : Timeout was reached"

Submitted by 空扰寡人 on 2019-12-03 14:06:38

Question: I'm scraping a website using the rvest package. When I iterate my function too many times, I get "Error in open.connection(x, 'rb') : Timeout was reached". I have searched for similar questions, but the answers seem to lead to dead ends. I suspect the cause is server-side: the website has a built-in restriction on how many times I can visit the page. How do I investigate this hypothesis? The code: I have the links to the underlying web pages and want to construct a data frame with …
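One way to probe (and survive) a rate limit is to pause between requests and retry on timeout with increasing back-off; if longer pauses make the error disappear, that supports the server-side-throttling hypothesis. A sketch with assumed timings:

    library(rvest)

    read_with_retry <- function(url, tries = 3, pause = 5) {
      for (i in seq_len(tries)) {
        page <- tryCatch(read_html(url), error = function(e) NULL)
        if (!is.null(page)) return(page)
        Sys.sleep(pause * i)    # back off longer after each failure
      }
      stop("Failed to fetch ", url, " after ", tries, " tries")
    }

Call read_with_retry() inside the loop, with an unconditional Sys.sleep() between iterations as well.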

Web scraping in R?

Submitted by 风流意气都作罢 on 2019-12-03 09:37:24

I would like to web scrape this web site, in particular the information in its table (note that I chose a specific date in the upper-right corner). Following this guide, I wrote the following code:

    library(rvest)
    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
    webpage_nba <- read_html(url_nba)
    # Using CSS selectors to scrape the rankings section
    data_nba <- html_nodes(webpage_nba, '#standings-table')
    # Converting the ranking data to text
    data_nba <- html_text(data_nba)
    write.csv(data_nba, "web scraping test.csv")

From my understanding …
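html_text() flattens the table into one unstructured string; html_table() keeps the rows and columns. A sketch, assuming #standings-table is a regular HTML table in the static page (the date picker is JavaScript-driven, so read_html() only ever sees the default date):

    library(rvest)

    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
    standings <- read_html(url_nba) %>%
      html_node('#standings-table') %>%
      html_table(fill = TRUE)      # fill = TRUE tolerates irregular header rows
    write.csv(standings, "nba_standings.csv", row.names = FALSE)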

Scraping linked HTML webpages by looping the rvest::follow_link() function

Submitted by 前提是你 on 2019-12-03 08:45:43

How can I loop the rvest::follow_link() function to scrape linked webpages?

Use case: identify all Lego Movie cast members, follow each cast member's link, and grab a table of each movie (+ year) for every cast member. The selectors I need are below:

    library(rvest)
    lego_movie <- html("http://www.imdb.com/title/tt1490017/")
    lego_movie <- lego_movie %>%
      html_nodes(".itemprop , .character a") %>%
      html_text()
    # follow cast links: ".itemprop .itemprop"
    # grab tables of all movies and dates for each cast member: ".year_column , b a"

Desired output:

    castMember    movie    year
    Will Arnett   Lego …
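A sketch of the loop, using follow_link() to match each cast link by its visible text and binding the per-actor results together. The selectors come from the question; matching by link text can be ambiguous when names repeat, so treat this as a starting point:

    library(rvest)

    s <- html_session("http://www.imdb.com/title/tt1490017/")
    cast <- s %>% html_nodes(".itemprop .itemprop") %>% html_text()

    results <- lapply(cast, function(name) {
      actor <- follow_link(s, name)    # follows the link whose text matches the name
      films <- actor %>% html_nodes(".year_column , b a") %>% html_text()
      data.frame(castMember = name, film = trimws(films),
                 stringsAsFactors = FALSE)
    })
    all_films <- do.call(rbind, results)

Splitting films into separate movie and year columns is left as a post-processing step, since the two selectors return interleaved text.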

R Change IP Address programmatically

Submitted by 纵饮孤独 on 2019-12-03 08:02:04

Question: I'm currently changing the user agent by passing different strings to the html_session() method. Is there also a way to change your IP address on a timer when scraping a website?

Answer 1: You can use a proxy (which changes the IP the server sees) via use_proxy, as follows:

    html_session("your-url", use_proxy("proxy-ip", port))

For more details see ?httr::use_proxy. To check that it is working, compare the IP reported with and without the proxy:

    require(httr)
    content(GET("https://ifconfig.co/json"), "parsed")
    content(GET("https://ifconfig.co/json", use_proxy("proxy-ip", port)), "parsed")
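Building on that answer, a sketch that rotates through a pool of proxies on a timer; the hosts, ports, and URLs below are placeholders:

    library(httr)

    proxies <- data.frame(host = c("10.0.0.1", "10.0.0.2"),   # hypothetical proxies
                          port = c(8080L, 8080L),
                          stringsAsFactors = FALSE)
    urls <- c("https://example.com/p1", "https://example.com/p2")

    for (i in seq_along(urls)) {
      p <- proxies[(i - 1) %% nrow(proxies) + 1, ]    # cycle through the pool
      resp <- GET(urls[i], use_proxy(p$host, p$port))
      Sys.sleep(10)                                   # the "timer" between requests
    }

Note that use_proxy() changes the IP the target server sees only because the request is relayed through the proxy; it does not change your machine's actual IP address.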