purrr

Row-wise iteration like apply with purrr

删除回忆录丶 submitted on 2019-12-04 07:46:30
Question: How do I achieve row-wise iteration using purrr::map? Here's how I'd do it with a standard row-wise apply: df <- data.frame(a = 1:10, b = 11:20, c = 21:30) lst_result <- apply(df, 1, function(x){ var1 <- (x[['a']] + x[['b']]) var2 <- x[['c']]/2 return(data.frame(var1 = var1, var2 = var2)) }) However, this is not too elegant, and I would rather do it with purrr. It may (or may not) be faster, too. Answer 1: You can use pmap for row-wise iteration. The columns are used as the arguments of whatever
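A minimal sketch of the pmap approach, reusing the df from the question. It assumes the function's argument names match the column names (a, b, c); combining the pieces with do.call(rbind, ...) is just one option, not necessarily what the original answer went on to suggest.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# pmap() calls the function once per row and passes each column
# value as an argument matched by name
lst_result <- pmap(df, function(a, b, c) {
  data.frame(var1 = a + b, var2 = c / 2)
})

# Optionally stack the one-row data frames into a single data frame
result <- do.call(rbind, lst_result)
```

Unlike apply(df, 1, ...), this does not coerce the data frame to a matrix first, so columns of mixed types keep their own types.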

Using purrr::map to iterate linear model over columns in data frame

一世执手 submitted on 2019-12-04 06:45:10
I am trying to do an exercise to become more familiar with how to use the map function in purrr. I am creating some random data (10 columns of 10 datapoints) and then I wanted to use map to perform a series of regressions (i.e. lm(y ~ x, data = )) over the resulting columns in the data frame. If I just repeatedly use the first column as 'y', I want to perform 10 regressions with each column from 1 to 10 as 'x'. Obviously the results are unimportant - it's just the method. I want to end up with a list of 10 linear model objects. list_of_vecs <- list() for (i in 1:10){ list_of_vecs[[paste('vec_'
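One possible sketch of what the question describes: a data frame is a list of columns, so map() can pass each column in turn as the predictor. The data construction here (replicate, set.seed, the vec_ column names) is an assumption made for illustration, since the original code is cut off.

```r
library(purrr)

set.seed(1)
# Hypothetical data: 10 columns of 10 random data points
df <- as.data.frame(replicate(10, rnorm(10)))
names(df) <- paste0("vec_", 1:10)

# Always use the first column as the response
y <- df$vec_1

# map() iterates over the columns of df; each call fits y against one column
models <- map(df, function(x) lm(y ~ x))
```

The result is a named list of 10 lm objects, one per column (including the trivial regression of vec_1 on itself).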

map a vector of characters to lm formula in r

谁说我不能喝 submitted on 2019-12-04 05:25:40
I'm trying to make a list of lm objects using purrr::map. Using mtcars as an example: vars <- c('hp', 'wt', 'disp') map(vars, ~lm(mpg~.x, data=mtcars)) This gives the error: Error in model.frame.default(formula = mpg ~ .x, data = mtcars, drop.unused.levels = TRUE) : variable lengths differ (found for '.x') I also tried: map(vars, function(x) {x=sym(x); lm(mpg~!!x, data=mtcars)}) and got the error message: Error in !x : invalid argument type Can anyone tell me what I did wrong? Thanks in advance. The usual way is to paste together formulas as strings, convert them by mapping as.formula (you can't make a vector of
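The string-pasting approach mentioned in the answer can be sketched roughly as follows. Naming the list with set_names() is an addition for convenience and not part of the original answer.

```r
library(purrr)

vars <- c("hp", "wt", "disp")

# For each predictor name, build a string such as "mpg ~ hp",
# convert it to a formula, and fit the model
models <- map(set_names(vars), function(v) {
  lm(as.formula(paste("mpg ~", v)), data = mtcars)
})

# Each element is an lm object named after its predictor
coef(models$wt)
```

The original attempts fail because .x inside mpg ~ .x is treated as a variable holding the string "hp" rather than as the column hp, and !! is only understood by functions that support tidy evaluation, which lm does not.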

Function for Tidy chisq.test Output for Visualizing or Filtering P-Values

旧时模样 submitted on 2019-12-04 02:11:22
Question: For data... library(productplots) library(ggmosaic) For code... library(tidyverse) library(broom) I'm trying to create tidy chisq.test output so that I can easily filter or visualize p-values. I'm using the "happy" dataset (which is included with either of the packages listed above). For this example, if I wanted to condition the "happy" variable on all other variables, I would isolate the categorical variables (I'm not going to create factor groupings out of age, year, etc, for this example),

Why is split inefficient on large data frames with many groups?

天涯浪子 submitted on 2019-12-04 01:11:25
Question: df %>% split(.$x) becomes slow for a large number of unique values of x. If we instead split the data frame manually into smaller subsets and then perform split on each subset, we reduce the time by at least an order of magnitude. library(dplyr) library(microbenchmark) library(caret) library(purrr) N <- 10^6 groups <- 10^5 df <- data.frame(x = sample(1:groups, N, replace = TRUE), y = sample(letters, N, replace = TRUE)) ids <- df$x %>% unique folds10 <- createFolds(ids, 10) folds100 <-

Advice on usage of dplyr::do vs purrr::map, tidyr::nest, for predictions

一世执手 submitted on 2019-12-03 21:29:55
I just came across the purrr package and I think it would help me out a bit in terms of what I want to do - I just can't put it together. I think this is going to be a long post, but it goes over a common use case that I think many others run into, so hopefully this is of use to them as well. This is what I'm aiming for: From one big dataset run multiple models on each of the different subgroups. Have these models readily available so I can examine them - for coefficients, accuracy, etc. From this saved model list for each of the different groupings, be able to apply the corresponding model to the
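A rough sketch of the workflow described above, using tidyr::nest with purrr::map. The dataset (mtcars), the grouping column (cyl), and the model formula (mpg ~ wt) are all assumptions chosen for illustration; the question's own data is not shown in this excerpt.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_group <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%      # one row per subgroup, with its rows in the list-column `data`
  ungroup() %>%
  mutate(
    # Fit one model per subgroup and keep it in a list-column
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Apply each subgroup's own model to that subgroup's data
    pred = map2(model, data, ~ predict(.x, newdata = .y))
  )

# The stored models stay available for inspection, e.g. coefficients
map(by_group$model, coef)
```

Keeping the models in a list-column next to their grouping key means each model can later be matched to new data from the same subgroup with a join followed by map2.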

Handling vectors of different lengths in purrr

我与影子孤独终老i submitted on 2019-12-03 17:30:52
I currently have the following R code that runs multiple regression models with different predictors, across different subsets, and returns tidied output using the broom package. library(dplyr) library(purrr) library(broom) cars <- mtcars preds<-c("disp", "drat", "wt") model_fits <- map_df(preds, function(pred) { model_formula <- sprintf("mpg ~ %s", pred) cars %>% group_by(cyl) %>% do(tidy(lm(model_formula, data = .), conf.int = T)) %>% filter(term == pred) %>% mutate(outcome = "mpg") %>% select(outcome, cyl:estimate, starts_with("conf.")) }) This results in the following data frame: > model

R - Parallelizing multiple model learning (with dplyr and purrr)

删除回忆录丶 submitted on 2019-12-03 17:02:20
Question: This is a follow-up to a previous question about learning multiple models. The use case is that I have multiple observations for each subject, and I want to train a model for each of them. See Hadley's excellent presentation on how to do this. In short, this is possible to do using dplyr and purrr like so: library(purrr) library(dplyr) library(fitdistrplus) dt %>% split(dt$subject_id) %>% map( ~ fitdist(.$observation, "norm")) So since the model building is an embarrassingly parallel task, I

why does map_if() not work within a list

纵饮孤独 submitted on 2019-12-03 14:54:24
Please help me with the following: 1) Why does map_if not work within a list? 2) Is there a way to make it work? 3) If not, what are the alternatives? Thanks in advance. library(dplyr) library(purrr) cyl <- split(mtcars, mtcars$cyl) # This works map_if(mtcars, is.numeric, mean) # This does not work map_if(cyl, is.numeric, mean) Because you need to map one level lower: the columns are at level 2. So you can do: map(cyl, ~map_if(., is.numeric, mean)) Or: map(cyl, map_if, is.numeric, mean) Without the if, one could do map_depth(cyl, 2, mean). You can try lapply: lapply(cyl, function(x) map_if(x, is.numeric,

Use filter() (and other dplyr functions) inside nested data frames with map()

独自空忆成欢 submitted on 2019-12-03 12:58:17
I'm trying to use map() from the purrr package to apply the filter() function to the data stored in a nested data frame. "Why wouldn't you filter first, and then nest?" you might ask. That will work (and I'll show my desired outcome using that process), but I'm looking for ways to do it with purrr. I want to have just one data frame, with two list-columns, both being nested data frames - one full and one filtered. I can achieve it now by performing nest() twice: once on all data, and a second time on filtered data: library(tidyverse) df <- tibble( a = sample(x = rep(c('x','y'),5), size = 10), b = sample(c(1
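One way this might look with a single nest() followed by map(). The data and the filter condition (b > 3) are hypothetical, since the question's own tibble and condition are cut off in this excerpt.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data standing in for the truncated example
df <- tibble(
  a = rep(c("x", "y"), each = 5),
  b = 1:10
)

nested <- df %>%
  group_by(a) %>%
  nest() %>%      # full data per group in the list-column `data`
  ungroup() %>%
  # Apply filter() to each nested data frame to build a second list-column
  mutate(filtered = map(data, ~ filter(.x, b > 3)))
```

Here nested has one row per value of a, with the full data in `data` and the filtered subset in `filtered`, so nest() only needs to be called once.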