purrr | 易学教程

piping a vector into all() to test equality

Question: I'm trying to pipe a vector into an all() call to check whether all elements are equal to a certain value. I figure I need to use the exposition pipe %$% since all() does not have a built-in data argument. My attempt leads to an error:

    library(tidyverse)
    library(magrittr)
    vec <- c("a", "b", "a")
    vec %>% keep(!grepl("b", .)) %$% all(. == "a")
    #> Error in eval(substitute(expr), data, enclos = parent.frame()): invalid 'envir' argument of type 'character'

If I break the pipe before all() and ...
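The error arises because %$% evaluates its right-hand side with the left-hand side as the data environment, which requires a list or data frame rather than a character vector. The sketch below shows two possible ways to stay within a regular %>% pipe; it is an illustration under that reading of the error, not necessarily the answer given in the original thread.

```r
library(purrr)
library(magrittr)

vec <- c("a", "b", "a")

# Option 1: wrap the call in braces so magrittr does not also insert
# the left-hand side as the first argument of all()
res1 <- vec %>%
  keep(~ !grepl("b", .x)) %>%
  { all(. == "a") }

# Option 2: do the comparison as its own pipe step, then reduce with all()
res2 <- vec %>%
  keep(~ !grepl("b", .x)) %>%
  magrittr::equals("a") %>%
  all()

print(res1)
print(res2)
```

Both variants drop the "b" element first, so the remaining elements are compared against "a" in a single logical vector before all() collapses it.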

Finding duplicate observations of selected variables in a tibble

Question: I have a rather large tibble (called df.tbl, with ~26k rows and 22 columns) and I want to find the "twins" of each object, i.e. each row that has the same values in columns 2:7 (date:Pos). If I use:

    inner_join(df.tbl, ~ df.tbl[i,], by = c("date", "forge", "serNum", "PinMain", "PinMainNumber", "Pos"))

with i being the row I want to check for "twins", everything works as expected, returning a 2 x 22 tibble, and I can expand this using:

    x <- NULL
    for (i in 1:nrow(df.tbl)) {
      x[[i]] <- as ...
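A row-by-row join is one way to find twins; a grouped filter is a possible alternative that avoids the loop. The sketch below uses a small hypothetical tibble, df_tbl, with only three of the key columns named in the question and made-up values. It is an assumption about what would suit the asker's data, not the thread's accepted answer.

```r
library(dplyr)

# Toy stand-in for df.tbl (hypothetical values, reduced set of key columns)
df_tbl <- tibble(
  date   = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"),
  forge  = c("A", "A", "B", "A"),
  serNum = c(1, 1, 2, 3),
  value  = c(10, 11, 12, 13)
)

# Keep every row whose key combination occurs more than once
twins <- df_tbl %>%
  group_by(date, forge, serNum) %>%
  filter(n() > 1) %>%
  ungroup()

print(twins)
```

For the full data, the group_by() call would list all six key columns from the question.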

R ranger confusion.matrix is larger than supposed when using expand.grid and purrr::pmap

Question: Sorry for all the purrr-related questions today; I'm still trying to figure out how to make efficient use of it. With some help from SO I managed to get a random forest ranger model running based on input values coming from a data.frame. This is accomplished using purrr::pmap. However, I don't understand how the return values are generated from the called function. Consider this example:

    library(ranger)
    data(iris)
    Input_list <- list(iris1 = iris, iris2 = iris) # let's assume these are different ...
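As a general illustration of how pmap() builds its return value, the sketch below iterates over a parameter grid and returns one list element per grid row. The function describe_fit and the column names num_trees and mtry are hypothetical placeholders; no ranger model is fitted, and the excerpt does not show enough to say whether this explains the asker's oversized confusion matrix.

```r
library(purrr)

# Hypothetical parameter grid: every combination becomes one row
grid <- expand.grid(
  num_trees = c(10, 50),
  mtry = c(1, 2),
  stringsAsFactors = FALSE
)

# Placeholder for a model-fitting function; arguments are matched
# to the grid's column names
describe_fit <- function(num_trees, mtry) {
  paste0("trees=", num_trees, ", mtry=", mtry)
}

# pmap() calls describe_fit once per row and collects the results in a list
results <- pmap(grid, describe_fit)

print(length(results))
print(results[[1]])
```

The result has as many elements as the grid has rows, so a grid built from every combination of inputs produces one return value per combination.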

r rvest error: “Error in doc_namespaces(doc) : external pointer is not valid”

Question: My question is similar to this one, but the latter did not receive an answer I can work with. I am scraping thousands of URLs with xml2::read_html. This works fine. But when I try to parse the resulting HTML documents using purrr::map_df and html_nodes, I get the following error:

    Error in doc_namespaces(doc) : external pointer is not valid

For some reason, I am unable to reproduce the error using examples. The example below is not good, because it works totally fine. But if someone could ...
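One possible cause, which the excerpt does not confirm, is that parsed xml2 documents are external pointers that become invalid once they are saved and reloaded or passed between R sessions. Under that assumption, a defensive pattern is to keep the raw HTML as character strings and parse each one in the same step that extracts from it. The inline HTML strings and the helper extract_paragraphs are hypothetical.

```r
library(xml2)
library(rvest)
library(purrr)

# Hypothetical raw pages kept as plain character strings
pages_raw <- c(
  "<html><body><p>first</p></body></html>",
  "<html><body><p>second</p><p>third</p></body></html>"
)

# Parse and extract in one step so no parsed document outlives the call
extract_paragraphs <- function(html_string) {
  doc <- read_html(html_string)
  html_text(html_nodes(doc, "p"))
}

result <- map(pages_raw, extract_paragraphs)

print(result)
```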

Adding column if it does not exist

Question: I have a bunch of data frames with different variables. I want to read them into R and add columns to those that are short of a few variables so that they all have a common set of standard variables, even if some are unobserved. In other words: is there a way to add columns of NA in the tidyverse when a column does not exist? My current attempt works for adding new variables where the column doesn't exist (top_speed) but fails when the column already exists (mpg) (it sets all ...
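A minimal sketch of one way to add only the missing columns, assuming a named list of defaults (called wanted here, a hypothetical name) and toy data: compute which names are absent, then splice just those into tibble::add_column().

```r
library(tibble)

# Toy data frame that already has mpg but lacks top_speed
df <- tibble(mpg = c(21, 22.8), cyl = c(6, 4))

# Standard set of columns with their NA defaults (hypothetical)
wanted <- list(mpg = NA_real_, top_speed = NA_real_)

# Only the names that are not yet present in df
missing_cols <- setdiff(names(wanted), names(df))

# Splice the missing defaults in; existing columns are left untouched
df_full <- add_column(df, !!!wanted[missing_cols])

print(df_full)
```

Because mpg is filtered out before the call, its existing values are not overwritten with NA.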

How to select elements with the same name from nested list with purrr?

Question:

    require(purrr)
    list <- list(
      node = list(a = list(y = 1, t = 1), b = list(y = 1, t = 2)),
      node = list(a = list(y = 1, t = 3), b = list(y = 1, t = 4)))

How to select all "t" values with purrr?

Answer 1: You can use modify_depth for this if you know what level you want to pull the information out of. You want to pull t out of the nested lists, which is level 2.

    modify_depth(list, 2, "t")
    $node
    $node$a
    [1] 1

    $node$b
    [1] 2


    $node
    $node$a
    [1] 3

    $node$b
    [1] 4

The purrr modify family of functions returns ...
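If a flat vector of the "t" values is wanted rather than a nested list, nested map calls are one alternative. This sketch renames the data to nested (a hypothetical name) to avoid masking base::list, and it assumes every inner element has a numeric t.

```r
library(purrr)

nested <- list(
  node = list(a = list(y = 1, t = 1), b = list(y = 1, t = 2)),
  node = list(a = list(y = 1, t = 3), b = list(y = 1, t = 4))
)

# For each node, pull "t" out of every child as a numeric vector
t_by_node <- map(nested, ~ map_dbl(.x, "t"))

# Flatten to a single unnamed vector
t_values <- unlist(t_by_node, use.names = FALSE)

print(t_values)
```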

Scraping pages with inconsistent lengths in dataframe

Question: I want to scrape all the names from this page, with the result being one tibble of three columns. My code only works if all the data is there, hence my error:

    Error: Tibble columns must have consistent lengths, only values of length one are recycled:
    * Length 20: Columns `huisarts`, `url`
    * Length 21: Column `praktijk`

How can I let my code run but fill the tibble with NAs if the data isn't there? My code for a pausing robot, later used in the scraper function:

    pauzing_robot <- function (periods = c ...
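One common way to keep columns aligned is to select the repeating parent nodes first and then call html_node() (singular) on them, which returns exactly one result per parent and yields NA when the child is absent. The inline HTML and the CSS classes .card, .huisarts and .praktijk below are hypothetical stand-ins; the real page's markup is not shown in the excerpt.

```r
library(rvest)
library(tibble)

# Hypothetical page: the second card has no "praktijk" element
page <- read_html(paste0(
  '<div class="card"><span class="huisarts">A</span>',
  '<span class="praktijk">P1</span></div>',
  '<div class="card"><span class="huisarts">B</span></div>'
))

# One parent node per record
cards <- html_nodes(page, ".card")

# html_node() returns one match per card, so both columns have equal length
result <- tibble(
  huisarts = html_text(html_node(cards, ".huisarts")),
  praktijk = html_text(html_node(cards, ".praktijk"))
)

print(result)
```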

Web-Scraping in R programming (rvest)

Question: I am trying to scrape all details (Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, Cabin Staff Service, Food & Beverages, Inflight Entertainment, Ground Service, Wifi & Connectivity, Value For Money), including the star rating, from the airline quality webpage https://www.airlinequality.com/airline-reviews/emirates/ but it is not working as expected:

    my_url <- c("https://www.airlinequality.com/airline-reviews/emirates/")
    review <- function(url){
      review <- read_html(url) %>% html_nodes(" ...

How can I speed up spatial operations in `dplyr::mutate()`?

Question: I am working on a spatial problem using the sf package in conjunction with dplyr and purrr. I would prefer to perform spatial operations inside a mutate call, like so:

    simple_feature %>% mutate(geometry_area = map_dbl(geometry, ~ as.double(st_area(.x))))

I like that this approach allows me to run a series of spatial operations using %>% and mutate. I dislike that this approach seems to significantly increase the run-time of the sf functions (sometimes prohibitively), and I would appreciate ...
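Since st_area() accepts a whole geometry column, one option worth trying is to call it once on the column instead of once per feature through map_dbl(). The sketch below uses two hypothetical square polygons without a coordinate reference system; whether this removes the slowdown on the asker's data is an assumption, not something the excerpt confirms.

```r
library(sf)
library(dplyr)

# Helper that builds an axis-aligned square with the given side length
square <- function(side) {
  st_polygon(list(rbind(
    c(0, 0), c(side, 0), c(side, side), c(0, side), c(0, 0)
  )))
}

# Hypothetical stand-in for simple_feature
simple_feature <- st_sf(
  id = 1:2,
  geometry = st_sfc(square(1), square(2))
)

# Vectorised call: st_area() runs once over the whole geometry column
result <- simple_feature %>%
  mutate(geometry_area = as.double(st_area(geometry)))

print(result$geometry_area)
```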