rvest

R: Scrape a list of Google+ URLs using the purrr package

淺唱寂寞╮ submitted on 2019-12-13 02:54:34
Question: I am working on a web scraping project that aims to extract Google+ reviews for a set of children's hospitals. My methodology is as follows: 1) Define a list of Google+ URLs to navigate to for review scraping; the URLs are in a data frame along with other variables describing each hospital. 2) Scrape the reviews, number of stars, and post time for all reviews associated with a given URL. 3) Save these elements in a data frame, and name the data frame after another variable in the data frame
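For the iteration step, a minimal purrr/rvest sketch of that pattern; the hospital names, URLs, and CSS selectors below are placeholders, not taken from the question:

library(rvest)
library(purrr)

# Hypothetical input: one row per hospital, with its Google+ review URL.
hospitals <- tibble::tibble(
  hospital = c("Hospital A", "Hospital B"),
  url      = c("https://plus.google.com/a", "https://plus.google.com/b")
)

# Scrape one URL; the selectors are placeholders for the real review markup.
scrape_reviews <- function(url, hospital) {
  page <- read_html(url)
  tibble::tibble(
    hospital = hospital,
    review   = page %>% html_nodes(".review-text")  %>% html_text(trim = TRUE),
    stars    = page %>% html_nodes(".review-stars") %>% html_text(trim = TRUE),
    posted   = page %>% html_nodes(".review-date")  %>% html_text(trim = TRUE)
  )
}

# map2_dfr() walks the url/hospital pairs and row-binds the per-hospital
# results, giving a single data frame instead of one object per hospital.
all_reviews <- map2_dfr(hospitals$url, hospitals$hospital, scrape_reviews)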

Rvest: getting node text and not its children's text

风流意气都作罢 submitted on 2019-12-12 19:13:28
Question: The method html_text() (from the R package rvest) concatenates the text of a node and all of its children. I would like to extract only the parent's text. For the following example, html_text() gives HELLO GOODBYE; I want to get just GOODBYE. How can I do that? <div class="joke"> <div class="div_inside"> <div class="title_inside"> <a class="link" href="sompage.htm">HELLO</a> </div> </div> GOODBYE </div> Answer 1: Try to grab the main div tag with class "joke" without picking up its children, using
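The idea the (truncated) answer points at can be expressed with XPath: select only the text() nodes that are direct children of the target div, so text inside nested elements (HELLO) is skipped. A self-contained sketch:

library(rvest)

html <- '<div class="joke">
  <div class="div_inside">
    <div class="title_inside">
      <a class="link" href="sompage.htm">HELLO</a>
    </div>
  </div>
  GOODBYE
</div>'

doc <- read_html(html)

# Direct child text nodes of the "joke" div only.
txt <- doc %>%
  html_nodes(xpath = "//div[@class='joke']/text()") %>%
  html_text(trim = TRUE)

# Drop the whitespace-only text nodes that sit between the child elements.
txt[txt != ""]
# "GOODBYE"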

R: Web scraping a list of URLs to get a data frame

心不动则不痛 submitted on 2019-12-12 09:23:23
Question: I can see the correct data, but I cannot put it into a data frame (it appears as a list of elements). I think the problem is my understanding of the apply family of functions; any hint is welcome. Here is a similar question, but I think it is better to post mine as it contains more details: Webscraping content across multiple pages using rvest package. library(rvest) library(lubridate) library(dplyr) urls <- list("http://simple.ripley.com.pe/tv-y-video/televisores/ver-todo-tv", "http://simple.ripley
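One way to avoid ending up with a list of elements is to write a function that turns one page into a data frame and then row-bind the results over the URL vector. A sketch along those lines; the product selectors are assumptions, not taken from the question:

library(rvest)
library(dplyr)
library(purrr)

urls <- c("http://simple.ripley.com.pe/tv-y-video/televisores/ver-todo-tv")

# Parse one listing page into a data frame; the selectors below are
# stand-ins for whatever the real product markup uses.
scrape_page <- function(url) {
  page <- read_html(url)
  tibble(
    product = page %>% html_nodes(".product-name")  %>% html_text(trim = TRUE),
    price   = page %>% html_nodes(".product-price") %>% html_text(trim = TRUE),
    source  = url
  )
}

# map_dfr() (instead of sapply/lapply) row-binds the per-page data frames,
# so the result is one data frame rather than a list of elements.
catalog <- map_dfr(urls, scrape_page)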

How to pass multiple values in an rvest submission form

♀尐吖头ヾ submitted on 2019-12-12 05:43:26
Question: This is a follow-up to a prior thread. The code works fine for a single value, but when I try to pass more than one value I get the following length-related error: Error in vapply(elements, encode, character(1)) : values must be length 1, but FUN(X[1]) result is length 3. Here is a sample of the code; in most instances I have been able just to name an object and scrape that way. library(httr) library(rvest) library(dplyr) b<-c('48127','48180','49504')
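The form encoder only accepts one value per field, which is what the vapply() error is reporting. A common workaround is to submit the form once per value and collect the results. A sketch of that loop, assuming a hypothetical search page and a field named zip (the real URL and field name would come from the prior thread's code):

library(rvest)
library(purrr)

zips <- c("48127", "48180", "49504")

# Submit the form once per zip code instead of passing the whole vector.
scrape_zip <- function(zip) {
  session <- html_session("https://example.com/search")  # placeholder URL
  form    <- html_form(session)[[1]]
  form    <- set_values(form, zip = zip)                  # placeholder field name
  result  <- submit_form(session, form)
  result %>% html_nodes("table") %>% html_table(fill = TRUE)
}

results <- map(zips, scrape_zip)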

rvest error: “Error in class(out) <- ”XMLNodeSet“ : attempt to set an attribute on NULL”

筅森魡賤 submitted on 2019-12-12 04:51:57
Question: I'm trying to scrape a set of web pages with the new rvest package. It works for most of the pages, but when there are no tabular entries for a particular letter, an error is returned. # install the packages you need, as appropriate install.packages("devtools") library(devtools) install_github("hadley/rvest") library(rvest) This code works OK because there are entries for the letter E on the web page. # works OK url <- "https://www.propertytaxcard.com/ShopHillsborough/participants/alph/E"
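A typical way to keep a loop over letters going when one of them has no table is to wrap each request in tryCatch(). A sketch, assuming the same URL pattern as in the question:

library(rvest)

base <- "https://www.propertytaxcard.com/ShopHillsborough/participants/alph/"

# Letters with no tabular entries return NULL instead of stopping the loop
# with the "XMLNodeSet"/NULL error.
scrape_letter <- function(letter) {
  url <- paste0(base, letter)
  tryCatch(
    read_html(url) %>% html_nodes("table") %>% html_table(fill = TRUE),
    error = function(e) NULL
  )
}

results <- lapply(LETTERS, scrape_letter)
results <- Filter(Negate(is.null), results)  # drop the empty letters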

Scrape data from a Flash page using rvest

浪尽此生 submitted on 2019-12-12 04:30:44
Question: I am trying to scrape data from this page: http://www.atpworldtour.com/en/tournaments/brisbane-international-presented-by-suncorp/339/2016/match-stats/r975/f324/match-stats? If I try to scrape the names of the players using the CSS selector and the usual rvest syntax: names <- read_html("http://www.atpworldtour.com/en/tournaments/brisbane-international-presented-by-suncorp/339/2016/match-stats/r975/f324/match-stats?") %>% html_nodes(".scoring-player-name") %>% sapply(html_text) everything goes
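If the stats are rendered by JavaScript or Flash rather than shipped in the static HTML, rvest will not see the .scoring-player-name nodes at all. One common workaround (not from the question or its answers) is to render the page in a real browser with RSelenium and hand the rendered source to rvest; this sketch assumes RSelenium and a local webdriver are installed:

library(RSelenium)
library(rvest)

url <- paste0("http://www.atpworldtour.com/en/tournaments/",
              "brisbane-international-presented-by-suncorp/339/2016/",
              "match-stats/r975/f324/match-stats?")

# Start a browser, let the page's scripts run, then parse the rendered HTML.
driver <- rsDriver(browser = "firefox")
remDr  <- driver$client
remDr$navigate(url)
Sys.sleep(5)  # crude wait for client-side rendering to finish

page  <- read_html(remDr$getPageSource()[[1]])
names <- page %>% html_nodes(".scoring-player-name") %>% html_text(trim = TRUE)

remDr$close()
driver$server$stop()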

Why can't I scrape the following table?

只谈情不闲聊 submitted on 2019-12-12 04:26:00
Question: I'm using the code below to scrape a list from a web page, but the result I get is only the table's header. How can I scrape the whole table? library(rvest) theurl <- "http://www.forbes.com/midas/list/" file<-read_html(theurl) tables<-html_nodes(file, "table") table1 <- html_table(tables[2], fill = F) table1 c<-as.data.frame(lapply(table1, stack)) summary(c$values) head(c$values) Source: https://stackoverflow.com/questions/39033432/why-cant-i-scrape-the-following-table
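A likely explanation (an assumption, not stated in the question) is that the Forbes list fills in its rows client-side, so the static HTML contains only the table header. A quick diagnostic sketch:

library(rvest)

page <- read_html("http://www.forbes.com/midas/list/")

# If this returns 0, the rows are not in the static HTML: they are added by
# the page's JavaScript, and rendering the page (e.g. with RSelenium, as in
# the sketch above) or reading the site's underlying data feed is needed.
page %>% html_nodes("table tbody tr") %>% length()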

R: Download a .csv file tied to input boxes and a "click" button

独自空忆成欢 submitted on 2019-12-12 04:07:22
Question: I am attempting to download a .csv file from https://www.fantasysharks.com/apps/bert/forecasts/projections.php? that is tied directly to the input settings (it is not a static download link) and load it into R. After the drop-down boxes are filled in, you then have to click the "download .csv" button. I found the post Using R to "click" a download file button on a webpage, which explains a bit of how to do it using POST, but I am unable to get it to work with some modifications to that code. I have attempted this
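A sketch of the POST approach with httr; the form field names and values below are placeholders that would have to be copied from the request the download button actually sends (visible in the browser's network tab):

library(httr)
library(readr)

url <- "https://www.fantasysharks.com/apps/bert/forecasts/projections.php"

# Replicate the form submission the "download .csv" button performs.
# field1/field2/format are hypothetical names, not the site's real ones.
resp <- POST(
  url,
  body   = list(field1 = "value1", field2 = "value2", format = "csv"),
  encode = "form"
)

# Read the returned CSV text straight into a data frame.
projections <- read_csv(content(resp, as = "text"))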

Scraping with rvest: how to fill blank numbers in a row to transform into a data frame?

丶灬走出姿态 submitted on 2019-12-12 04:05:44
Question: I'm trying to build a data frame from two pieces of data I've scraped on IMDb: the first has 50 values and the second has only 29. Is there an easy way to ask R to automatically fill the 21 missing values with NA? My code: imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced") title <- html_nodes(imdb, '.lister-item-header a') title <- html_text(title) metascore <- html_nodes(imdb, '.ratings-metascore') metascore <-
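A common fix is to scrape per movie block instead of per field: html_node() (singular), applied to each block, returns NA where the block has no metascore, so the two vectors stay aligned. A sketch; the .lister-item-content container selector is an assumption about the page's markup:

library(rvest)

imdb <- read_html(paste0("http://www.imdb.com/search/title",
                         "?genres=horror&genres=mystery",
                         "&sort=moviemeter,asc&view=advanced"))

# One node per movie; fields are extracted within each block so a missing
# metascore becomes NA rather than shortening the vector.
items <- html_nodes(imdb, ".lister-item-content")

movies <- data.frame(
  title     = items %>% html_node(".lister-item-header a") %>% html_text(),
  metascore = items %>% html_node(".ratings-metascore") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)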

Trouble scraping a table into R

為{幸葍}努か submitted on 2019-12-12 03:56:23
Question: I am trying, and failing, to scrape the table of average IQs by country from this web page into R. I'm trying to follow the process described in this blog post, but I can't seem to find the right XPath. Here's the code I'm using, with a placeholder for the XPath to that table: library(dplyr) library(rvest) url <- "https://iq-research.info/en/page/average-iq-by-country" xpath <- [xpath copied from Chrome using SelectorGadget] IQ <- url %>% read_html() %>% html_nodes(xpath) %>% html_table() I
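A sketch of the same pipeline with the expression passed explicitly to the xpath argument (a bare string given to html_nodes() is treated as a CSS selector, which is one common reason nothing matches). The "//table" expression is a generic placeholder for whatever SelectorGadget reports, and the sketch assumes the table is present in the static HTML:

library(dplyr)
library(rvest)

url <- "https://iq-research.info/en/page/average-iq-by-country"

# Placeholder XPath; note it must go to the xpath argument, not the default
# CSS-selector argument.
xpath <- "//table"

IQ <- url %>%
  read_html() %>%
  html_nodes(xpath = xpath) %>%
  html_table(fill = TRUE) %>%
  .[[1]]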