purrr | 易学教程

Saving deeply nested files to specific directories with specific filenames

Question: Given a three-level nested list:

    mylist <- list(
      "1000" = list("cars" = list("fast" = mtcars[1:10,], "slow" = mtcars[11:15,]),
                    "flower" = iris),
      "2000" = list("tooth" = ToothGrowth, "air" = airquality,
                    "cars" = list("cruiser" = mtcars[5:12,], "fast" = mtcars[1:3,],
                                  "mild" = mtcars[9:18,]))
    )

(i.e. mylist$`1000`$cars$fast, where fast is a data frame, and cars and 1000 are nested lists in mylist), I'd like to save each innermost data frame (e.g. fast) as a .csv with the data frame's name as its file name, e.g. fast.csv, and I want the

Web scraping the data behind every url from a list of urls

Question: I am trying to gather a dataset from a site called ICObench. I've managed to extract the names of each ICO across the 91 pages using rvest and purrr, but I'm confused about how to extract the data behind each name in the list. All the names are clickable links. This is the code so far:

    url_base <- "https://icobench.com/icos?page=%d&filterBonus=&filterBounty=&filterTeam=&filterExpert=&filterSort=&filterCategory=all&filterRating=any&filterStatus=ended&filterCountry=any&filterRegistration=0

How to name a dataframe so that I can look for it within a list

Question: I have a function that returns a data frame. I use this function with furrr::future_map2 so that I get a list of several data frames. What I want is to use the name input of the function to name each data frame, so that I can search the returned list by name. Example:

    test <- function(x, name){
      require(tidyverse)
      z <- data.frame(x+1) %>% stats::setNames(., "a")
      return(z)
    }
    furrr::future_map2(1:3, c("a", "b", "c"), ~test(.x, .y))

The first df within the list would be a , the second b and
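
One possible approach to the question above, as a minimal sketch rather than a definitive answer: attach the names to the returned list afterwards with purrr::set_names(). The sketch uses purrr::map2() as a stand-in, on the assumption that furrr::future_map2() accepts the same arguments; make_df and nms are hypothetical names introduced only for illustration.

```r
library(purrr)

# Hypothetical stand-in for the asker's function: returns a one-column data frame.
make_df <- function(x, name) {
  data.frame(a = x + 1)
}

nms <- c("a", "b", "c")

# map2() is used here in place of furrr::future_map2()
# (assumed to take the same arguments).
result <- map2(1:3, nms, ~ make_df(.x, .y))

# Attach the names afterwards so elements can be looked up by name.
result <- set_names(result, nms)

result[["b"]]
```

Naming the list after the call keeps the worker function unchanged; the name argument is then only needed if the function itself uses it.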

use pmap() to calculate row means of several columns

Question: I'm trying to better understand how pmap() works within data frames, and I get a surprising result when applying pmap() to compute means across several columns.

    mtcars %>%
      mutate(comp_var = pmap_dbl(list(vs, am, cyl), mean)) %>%
      select(comp_var, vs, am, cyl)

In the above example, comp_var is equal to the value of vs in its row, rather than the mean of the three variables in that row. I know that I could get accurate results for comp_var using ...

    mtcars %>% rowwise() %>% mutate(comp_var =
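
A likely explanation for the behaviour described above: pmap() passes each row's values to the function as separate arguments, so mean() receives vs as x, while am and cyl are matched positionally to trim and na.rm. The sketch below collects the values into one vector before averaging. It assumes dplyr and purrr are installed; row_means is a hypothetical name.

```r
library(dplyr)
library(purrr)

row_means <- mtcars %>%
  # Gather the per-row arguments into a single vector, then average it.
  mutate(comp_var = pmap_dbl(list(vs, am, cyl), function(...) mean(c(...)))) %>%
  select(comp_var, vs, am, cyl)

head(row_means)
```

For plain numeric columns, base R's rowMeans() on the selected columns gives the same values and avoids the per-row function call.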

R function using . and ~

Question: I'm trying to learn to use ~ and . in R. The code below shows the same function written with and without ~ and . . I don't understand what causes the error in the first function.

    #FIRST FUNCTION
    col_summary2 <- function(.x, .f, ...){
      .x <- purrr::keep(.x, is.numeric)
      purrr::map_dbl(.x, ~.f(., ...))
    }
    col_summary2(mtcars,mean)
    #Error in mean.default(., ...) : 'trim' must be numeric of length one

    #SECOND FUNCTION
    col_summary2 <- function(.x, .f, ...){
      .x <- purrr::keep(.x
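
A likely cause of the error above: inside a ~ lambda, ... refers to the lambda's own arguments (which include the current element), not to the ... of the enclosing function, so the column ends up being passed to mean() a second time as trim. A minimal sketch of one way around this is to hand .f and ... straight to map_dbl(); col_summary_direct is a hypothetical name used only for this illustration.

```r
library(purrr)

col_summary_direct <- function(.x, .f, ...) {
  .x <- keep(.x, is.numeric)
  # Pass .f and ... to map_dbl() directly instead of wrapping them in a ~ lambda,
  # so that ... is the enclosing function's ... .
  map_dbl(.x, .f, ...)
}

col_summary_direct(mtcars, mean)
col_summary_direct(mtcars, mean, trim = 0.1)
```

An explicit anonymous function, function(col) .f(col, ...), would also work, because an ordinary function without its own ... looks it up in the enclosing function.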

How do pipes work with purrr map() function and the “.” (dot) symbol

Question: When using both pipes and the map() function from purrr, I am confused about how data and variables are passed along. For instance, this code works as I expect:

    library(tidyverse)
    cars %>% select_if(is.numeric) %>% map(~hist(.))

Yet when I try something similar using ggplot, it behaves in a strange way.

    cars %>% select_if(is.numeric) %>% map(~ggplot(cars, aes(.)) + geom_histogram())

I'm guessing this is because the "." in this case is passing a vector to aes(), which is expecting a column

Double nesting in the tidyverse

Question: Using the examples from Wickham's introduction to purrr in R for Data Science, I am trying to create a doubly nested list.

    library(gapminder)
    library(dplyr)   # needed for group_by() and %>%
    library(purrr)
    library(tidyr)
    gapminder
    nest_data <- gapminder %>% group_by(continent) %>% nest(.key = by_continent)

How can I further nest the countries so that nest_data contains by_continent and a new level of nesting by_contry that ultimately includes the tibble by_year? Furthermore, after creating this data structure for the gapminder data - how

Use filter() (and other dplyr functions) inside nested data frames with map()

Question: I'm trying to use map() from the purrr package to apply the filter() function to the data stored in a nested data frame. "Why wouldn't you filter first, and then nest?" you might ask. That will work (and I'll show my desired outcome using that process), but I'm looking for ways to do it with purrr . I want to have just one data frame with two list-columns, both being nested data frames - one full and one filtered. I can achieve it now by performing nest() twice: once on all data, and second on

How to fork/parallelize process in purrr::pmap

Question: I have the following code that does serial processing with purrr::pmap

    library(tidyverse)
    set.seed(1)
    params <- tribble(
      ~mean, ~sd, ~n,
      5, 1, 1,
      10, 5, 3,
      -3, 10, 5
    )
    params %>% pmap(rnorm)
    #> [[1]]
    #> [1] 4.373546
    #>
    #> [[2]]
    #> [1] 10.918217 5.821857 17.976404
    #>
    #> [[3]]
    #> [1] 0.2950777 -11.2046838 1.8742905 4.3832471 2.7578135

How can I parallelize (fork) the process above so that it runs faster and produces identical results? Here, I use rnorm for illustration purposes, in reality I have