purrr | 易学教程

Forwarding arguments in a function with purrr::map_df

Question: I am trying to create a function that reads in all sheets in an Excel workbook using readxl::read_excel, binds them into a single data frame, and allows me to pass additional arguments through to read_excel. I can do the first part fine, but not the second part.

    library(magrittr)

    # example excel workbook with multiple sheets
    path <- readxl::readxl_example("datasets.xlsx")

    # function with simple forwarding
    read_all <- function(path, ...) {
      path %>%
        readxl::excel_sheets() %>%
        rlang::set

tidyverse: Cross tables of one variable with all other variables in data.frame

Question: I want to make a cross table of one variable with all other variables in the data.frame.

    library(tidyverse)
    library(janitor)

    humans <- starwars %>% filter(species == "Human")

    humans %>% janitor::tabyl(gender, eye_color)

     gender blue blue-gray brown dark hazel yellow
     female    3         0     5    0     1      0
       male    9         1    12    1     1      2

    humans %>%
      dplyr::select_if(is.character) %>%
      dplyr::select(-name, -gender) %>%
      purrr::map(.f = ~janitor::tabyl(dat = humans, gender, .x))

    Error: Unknown columns `blond`, `none`, `brown`, `brown,

Date column coerced to numeric when indexing dataframe with [[ and a vector

Question: I am creating a data.frame with a column of type Date. When indexing the data frame with [[ and a numeric vector, the Date becomes a number. This is causing a problem when using purrr::pmap. Can anyone explain why this is happening, and is there a workaround? Example:

    x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01")))
    class(x$d1)
    # [1] "Date"
    x[[1]]
    # [1] "2018-01-01" "2018-02-01"
    x[[c(1, 1)]]
    # [1] 17532

Answer 1: Overview. After reading why does unlist() kill dates in R and the
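
A minimal base R sketch of one possible workaround, not necessarily the one the answer goes on to give. It assumes as.Date can stand in for lubridate::ymd, purely to keep the example dependency-free. The idea is to index in two steps, so that the second [[ dispatches on the Date column instead of recursing into its underlying numbers.

```r
# Same data as in the question, built with base R's as.Date (an assumption:
# it stands in for lubridate::ymd so that no extra package is needed)
x <- data.frame(d1 = as.Date(c("2018-01-01", "2018-02-01")))

# Recursive indexing with a vector; per the question, this comes back as a bare number
recursive_value <- x[[c(1, 1)]]

# Two-step indexing: first pull out the column, then its first element
first_date <- x[[1]][[1]]
class(first_date)
```

Here first_date keeps its Date class, because the element is extracted from the Date vector itself rather than through the data frame's recursive lookup.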

Efficiently transform XML to data frame

Question: I need to transform some vanilla XML into a data frame. The XML is a simple representation of rectangular data (see example below). I can achieve this pretty straightforwardly in R with xml2 and a couple of for loops. However, I'm sure there is a much better/faster way (purrr?). The XML files I will ultimately be working with are very large, so more efficient methods are preferred. I would be grateful for any advice from the community.

    library(tidyverse)
    library(xml2)
    demo_xml <- "<DEMO> <EPISODE>

Running multiple glm models on mixed data with purrr

Question: Suppose we have a toy data set:

    library(tidyverse)
    library(purrr)
    tbl <- tibble(a = rep(c(0, 1), each = 5),
                  b = rep(c(0, 1), times = 5),
                  c = runif(10),
                  d = rexp(10)) %>%
      mutate_at(vars(1,2), as.factor)

where a is the dependent variable and b:d are independent variables. The idea is to run a glm model for each independent variable:

    glm(a ~ b, data = tbl, family = "binomial")
    glm(a ~ c, data = tbl, family = "binomial")
    glm(a ~ d, data = tbl, family = "binomial")

My initial attempt goes as follows:

how to print a list-column of ggplots to pdf?

Question: Consider this funny example

    mydata <- data_frame(group = c('a', 'a', 'a', 'b', 'b', 'b'),
                         x = c(1,2,3,5,6,7),
                         y = c(3,5,6,4,3,2))

    > mydata
    # A tibble: 6 x 3
      group     x     y
      <chr> <dbl> <dbl>
    1 a         1     3
    2 a         2     5
    3 a         3     6
    4 b         5     4
    5 b         6     3
    6 b         7     2

Here I can nest() by group and store a group-based ggplot in a list-column. Crazy stuff.

    > mydata %>% group_by(group) %>%
    +   nest() %>%
    +   mutate(myplot = map(data, ~ggplot(data = .x, aes(x = x, y = x)) + geom_point()))
    # A tibble: 2 x 3
      group data myplot
      <chr>

Make column of input items with purrr::map_df using .id without duplicating inputs for named vector

Question: I often want to map over a vector of column names in a data frame and keep track of the output using the .id argument. But writing the column name for each map iteration into that .id column seems to require doubling up the names in the input vector; in other words, naming each column name with its own name. If I don't name each element with its own name, then .id just stores the index of the iteration. This is expected behavior, per the purrr::map docs:

    .id Either a string or
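
A small sketch of the naming behaviour described above. It assumes the purrr and dplyr packages are installed (map_df needs dplyr for row binding) and uses the built-in mtcars data purely for illustration. Calling purrr::set_names() with no second argument names each element after itself, which is one way to avoid typing every column name twice.

```r
# Column names to iterate over (mtcars is used only as illustrative data)
cols <- c("mpg", "cyl")

# set_names() with a single argument uses the vector's own values as its names
named_cols <- purrr::set_names(cols)

# With a named input, .id stores the element names rather than the iteration index
result <- purrr::map_df(
  named_cols,
  ~ data.frame(mean = mean(mtcars[[.x]])),
  .id = "column"
)
result
```

The column called column in result then holds "mpg" and "cyl" instead of 1 and 2.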

Creating new variables with purrr (how does one go about that?)

Question: I have a large data set with a bunch of columns that I want to run the same function on, based on either prefix or suffix, to create a new variable. What I would like to be able to do is provide a list to map and create new variables.

    dataframe <- data_frame(x_1 = c(1,2,3,4,5,6),
                            x_2 = c(1,1,1,2,2,2),
                            y_1 = c(200,400,120,300,100,100),
                            y_2 = c(250,500,150,240,140,400))

    newframe <- dataframe %>%
      mutate(x_ratio = x_1/x_2,
             y_ratio = y_1/y_2)

In the past, I have written code in a string

Calculate change since base year?

Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
                      time = rep(c(0,2:4), 2),
                      y1 = rnorm(8),
                      y2 = rnorm(8))

For each of the y variables, I want to calculate the change since time==0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
