purrr

Forwarding arguments in a function with purrr::map_df

一曲冷凌霜, submitted 2021-02-08 04:28:06
Question: I am trying to create a function that reads all sheets in an Excel workbook using readxl::read_excel, binds them into a single data frame, and lets me pass additional arguments through to read_excel. I can do the first part fine, but not the second.

    library(magrittr)
    # example excel workbook with multiple sheets
    path <- readxl::readxl_example("datasets.xlsx")
    # function with simple forwarding
    read_all <- function(path, ...) {
      path %>%
        readxl::excel_sheets() %>%
        rlang::set
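A minimal sketch of one way to forward the extra arguments, assuming the difficulty comes from using `...` inside a `~` lambda. purrr formula lambdas define their own `...`, so the outer function's dots are not visible there; an ordinary anonymous function keeps them in scope. The argument name `sheet`, the `.id` column name, and the `n_max = 3` call are illustrative choices, not taken from the question.

```r
# Example workbook shipped with readxl
path <- readxl::readxl_example("datasets.xlsx")

read_all <- function(path, ...) {
  # Name each sheet with itself so .id can record where a row came from
  sheets <- rlang::set_names(readxl::excel_sheets(path))
  purrr::map_df(
    sheets,
    # A plain function (not a ~ lambda) can see the `...` of read_all()
    function(sheet) readxl::read_excel(path, sheet = sheet, ...),
    .id = "sheet"  # hypothetical column name for the sheet label
  )
}

# Forward n_max to read_excel: at most three rows are read per sheet
first_rows <- read_all(path, n_max = 3)
```

An alternative under the same assumption is to hand the dots to the mapper directly, for example `purrr::map_df(sheets, readxl::read_excel, path = path, ...)`, since purrr passes extra arguments on to `.f`.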

tidyverse: Cross tables of one variable with all other variables in data.frame

独自空忆成欢, submitted 2021-02-07 19:06:56
Question: I want to make a cross table of one variable with all other variables in the data.frame.

    library(tidyverse)
    library(janitor)
    humans <- starwars %>% filter(species == "Human")
    humans %>% janitor::tabyl(gender, eye_color)

    gender blue blue-gray brown dark hazel yellow
    female    3         0     5    0     1      0
      male    9         1    12    1     1      2

    humans %>%
      dplyr::select_if(is.character) %>%
      dplyr::select(-name, -gender) %>%
      purrr::map(.f = ~janitor::tabyl(dat = humans, gender, .x))

    Error: Unknown columns `blond`, `none`, `brown`, `brown,
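One likely reading of the error is that mapping over a data frame hands each column's values (not its name) to the lambda, so the values end up being treated as column names. The sketch below sidesteps that by mapping over the column names instead. It swaps in base `table()` for `janitor::tabyl()` to keep the example dependency-light; that substitution is an assumption of this sketch, not the approach from the question.

```r
library(dplyr)
library(purrr)

humans <- starwars %>% filter(species == "Human")

# Names of the character columns to cross with gender
other_cols <- humans %>%
  select_if(is.character) %>%
  select(-name, -gender) %>%
  names()

# Map over names, then look each column up inside the lambda
cross_tabs <- other_cols %>%
  set_names() %>%
  map(~ table(humans$gender, humans[[.x]]))

cross_tabs$eye_color
```

Because the input vector is named with `set_names()`, each table in the result can be retrieved by the name of the variable it crosses with `gender`.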

Date column coerced to numeric when indexing dataframe with [[ and a vector

隐身守侯, submitted 2021-02-07 11:58:15
Question: I am creating a data.frame with a column of type Date. When indexing the data frame with [[ and a numeric vector, the Date becomes a number. This causes a problem when using purrr::pmap. Can anyone explain why this happens, and is there a workaround? Example:

    x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01")))
    class(x$d1)
    # [1] "Date"
    x[[1]]
    # [1] "2018-01-01" "2018-02-01"
    x[[c(1, 1)]]
    # [1] 17532

Answer 1: Overview After reading why does unlist() kill dates in R and the
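A small sketch of two possible workarounds, using base `as.Date()` in place of `lubridate::ymd()` (an assumption made only to keep the example self-contained). A vector index given to `[[` is applied recursively, so `x[[c(1, 1)]]` reaches into the first column and, as the output in the question shows, the Date class is lost along the way. Indexing in two steps keeps the class, and a bare number can be converted back with an explicit origin.

```r
x <- data.frame(d1 = as.Date(c("2018-01-01", "2018-02-01")))

# Recursive indexing: first column, then first element
raw_value <- x[[c(1, 1)]]

# Workaround 1: index in two explicit steps so the Date method is used
stepwise <- x[[1]][[1]]
class(stepwise)

# Workaround 2: rebuild the Date from the day count since 1970-01-01
restored <- as.Date(raw_value, origin = "1970-01-01")
```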

Efficiently transform XML to data frame

隐身守侯, submitted 2021-01-29 18:51:02
Question: I need to transform some vanilla XML into a data frame. The XML is a simple representation of rectangular data (see example below). I can achieve this pretty straightforwardly in R with xml2 and a couple of for loops. However, I'm sure there is a much better/faster way (purrr?). The XML files I will ultimately be working with are very large, so more efficient methods are preferred. I would be grateful for any advice from the community.

    library(tidyverse)
    library(xml2)
    demo_xml <- "<DEMO> <EPISODE>

Running multiple glm models on mixed data with purrr

亡梦爱人, submitted 2021-01-29 12:12:02
Question: Suppose we have a toy data set:

    library(tidyverse)
    library(purrr)
    tbl <- tibble(a = rep(c(0, 1), each = 5),
                  b = rep(c(0, 1), times = 5),
                  c = runif(10),
                  d = rexp(10)) %>%
      mutate_at(vars(1,2), as.factor)

where a is the dependent variable and b:d are independent variables. The idea is to run a glm model for each independent variable:

    glm(a ~ b, data = tbl, family = "binomial")
    glm(a ~ c, data = tbl, family = "binomial")
    glm(a ~ d, data = tbl, family = "binomial")

My initial attempt goes as follows:
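The original attempt is cut off, so the following is only a sketch of one common pattern, not the asker's code: build each formula from the predictor's name with `reformulate()` and map over the names. It uses a plain `data.frame()` and a fixed seed for reproducibility; both are choices made for this example. With only ten rows, `glm()` may emit convergence or separation warnings.

```r
library(purrr)

set.seed(1)
tbl <- data.frame(
  a = factor(rep(c(0, 1), each = 5)),
  b = factor(rep(c(0, 1), times = 5)),
  c = runif(10),
  d = rexp(10)
)

# Every column except the response is treated as a predictor
predictors <- setdiff(names(tbl), "a")

# One binomial glm per predictor; reformulate("b", response = "a") gives a ~ b
models <- map(
  set_names(predictors),
  ~ glm(reformulate(.x, response = "a"), data = tbl, family = "binomial")
)

# Coefficients of each fitted model, named by predictor
map(models, coef)
```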

how to print a list-column of ggplots to pdf?

跟風遠走, submitted 2021-01-29 06:47:01
Question: Consider this funny example

    mydata <- data_frame(group = c('a', 'a', 'a', 'b', 'b', 'b'),
                         x = c(1,2,3,5,6,7),
                         y = c(3,5,6,4,3,2))

    > mydata
    # A tibble: 6 x 3
      group     x     y
      <chr> <dbl> <dbl>
    1 a         1     3
    2 a         2     5
    3 a         3     6
    4 b         5     4
    5 b         6     3
    6 b         7     2

Here I can nest() by group and store a group-based ggplot in a list-column. Crazy stuff.

    > mydata %>% group_by(group) %>%
    +   nest() %>%
    +   mutate(myplot = map(data, ~ggplot(data = .x, aes(x = x, y = x)) + geom_point()))
    # A tibble: 2 x 3
      group data myplot
      <chr>
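A hedged sketch of one way to write such a list-column to a PDF: open a `pdf()` device, print each plot with `purrr::walk()` so that every plot lands on its own page, then close the device. The temporary output path is an assumption of this example, and the sketch maps `y` to the `y` column and uses `tibble()` in place of the deprecated `data_frame()`.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

mydata <- tibble(group = c("a", "a", "a", "b", "b", "b"),
                 x = c(1, 2, 3, 5, 6, 7),
                 y = c(3, 5, 6, 4, 3, 2))

# One ggplot per group, stored in the list-column `myplot`
plots <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(myplot = map(data, ~ ggplot(.x, aes(x = x, y = y)) + geom_point()))

# Hypothetical output location; replace with a real path as needed
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)               # open the PDF device
walk(plots$myplot, print)   # each print() call starts a new page
invisible(dev.off())        # close the device so the file is written
```

`walk()` is used instead of `map()` because the plots are printed only for their side effect.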

Make column of input items with purrr::map_df using .id without duplicating inputs for named vector

青春壹個敷衍的年華, submitted 2021-01-28 18:48:26
Question: I often want to map over a vector of column names in a data frame and keep track of the output using the .id argument. But writing the column name for each map iteration into that .id column seems to require doubling up the names in the input vector - in other words, naming each column name with its own name. If I don't name each element with its own name, then .id just stores the index of the iteration. This is expected behavior, per the purrr::map docs: .id Either a string or
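A short sketch of the usual way to avoid typing each name twice: `purrr::set_names()` called with no names argument names every element of a vector with its own value, and `.id` then records those names. The `mtcars` columns and the `column` label are illustrative choices, not from the question.

```r
library(purrr)

cols <- c("mpg", "hp")

summaries <- cols %>%
  set_names() %>%   # names the vector with itself: c(mpg = "mpg", hp = "hp")
  map_df(~ data.frame(mean = mean(mtcars[[.x]])), .id = "column")

summaries
```

Without the `set_names()` step, the `column` variable would hold the iteration index rather than the column name, which is the behavior the question describes.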

Creating new variables with purrr (how does one go about that?)

蹲街弑〆低调, submitted 2021-01-28 18:19:42
Question: I have a large data set with a bunch of columns that I want to run the same function on, based on either prefix or suffix, to create a new variable. What I would like to be able to do is provide a list to map and create the new variables.

    dataframe <- data_frame(x_1 = c(1,2,3,4,5,6),
                            x_2 = c(1,1,1,2,2,2),
                            y_1 = c(200,400,120,300,100,100),
                            y_2 = c(250,500,150,240,140,400))
    newframe <- dataframe %>%
      mutate(x_ratio = x_1/x_2,
             y_ratio = y_1/y_2)

In the past, I have written code in a string
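One possible approach, sketched under the assumption that every prefix has matching `_1` and `_2` columns: map over the prefixes, build the column names with `paste0()`, and bind the resulting ratio columns back onto the data. The `prefixes` vector is a hypothetical helper introduced for this example, and `data.frame()` stands in for the deprecated `data_frame()`.

```r
library(purrr)

dataframe <- data.frame(x_1 = c(1, 2, 3, 4, 5, 6),
                        x_2 = c(1, 1, 1, 2, 2, 2),
                        y_1 = c(200, 400, 120, 300, 100, 100),
                        y_2 = c(250, 500, 150, 240, 140, 400))

# Hypothetical list of prefixes to process
prefixes <- c("x", "y")

# Name each result "<prefix>_ratio" and compute <prefix>_1 / <prefix>_2
ratios <- map(
  set_names(prefixes, paste0(prefixes, "_ratio")),
  ~ dataframe[[paste0(.x, "_1")]] / dataframe[[paste0(.x, "_2")]]
)

# Attach the new columns to the original data
newframe <- cbind(dataframe, as.data.frame(ratios))
```

Adding another variable family then only requires extending `prefixes`, rather than writing another `mutate()` term by hand.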

Calculate change since base year?

心不动则不痛, submitted 2021-01-28 17:37:12
Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
                      time = rep(c(0,2:4), 2),
                      y1 = rnorm(8),
                      y2 = rnorm(8))

For each of the y variables, I want to calculate the change since time==0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
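A sketch of a grouped alternative to the per-id helper, assuming each `id` has exactly one row with `time == 0`: group by `id` and subtract the baseline value from every `y` column with `dplyr::across()`. The `D_` prefix for the new columns and the fixed seed are choices made for this example; the result stays in long format rather than producing separate `D2y1`, `D3y1` variables.

```r
library(dplyr)

set.seed(42)
df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                  time = rep(c(0, 2:4), 2),
                  y1 = rnorm(8),
                  y2 = rnorm(8))

changes <- df1 %>%
  group_by(id) %>%
  # For each y column, subtract that id's value at time == 0
  mutate(across(c(y1, y2), ~ .x - .x[time == 0], .names = "D_{.col}")) %>%
  ungroup()

changes
```

Rows with `time == 0` get a change of zero by construction, which gives a quick sanity check on the result.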
