R - how to filter data with a list of arguments to produce multiple data frames and graphs

自闭症网瘾萝莉.ら posted on 2021-02-10 18:30:43

Question


I am looking for a way to use a list of filter arguments to produce different objects. I have a data set for which I want to make several graphs. However, I want all these graphs based on subsets of the dataset. For illustrative purposes I have made the following data.

df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c("15", "10", "5", "0"),
                 temperature = c("2", "21", "26", "13"),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

Also, I have a list of filter arguments.

filters <- c("brand=='b1'",
             "profit",
             "Season=='Summer'",
             "profit==FALSE",
             "yield >= 10",
             "")

What I would want is that I could use a for loop to have all these filters produce objects with the filtered data, and subsequently plot graphs. I have tried this in the following way.

for(i in 1:length(filters)){
  assign(paste0("df", i), filter(df, factor(filters[i])))
  assign(paste0("plot", i), ggplot(database, aes(x = temperature, y = yield)) + geom_point())
}

However, this did not work because the filter() function accepts neither a <fct> nor a <chr> (e.g., "brand=='b1'") as a condition. What I need is the unquoted expression brand=='b1', so that filter() accepts it as an argument. Does anybody know how to do this?

Also, as an additional question, I would like to automate the whole process and end with a combined graph, so grid.arrange() at the end. Of course I could automate ncol and nrow with some division of length(filters). But how do I get all the produced plots into grid.arrange()? This should probably be outside the for loop, right? Any ideas here?


Answer 1:


You can do it by using eval and parse.

Also, an lapply over a custom function sounds more reasonable than a for loop with assign. The result is a list of ggplot objects.

To put all the charts together, grid.arrange from the gridExtra package works fine. You just need to pass the list of your charts to the argument called grobs.

library(dplyr)
library(ggplot2)

df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c("2", "21", "26", "13"),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

filters <- list("type=='b1'",
                "profit",
                "Season=='Summer'",
                "profit==FALSE",
                "yield >= 10",
                "TRUE")


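myfun <- function(fltr, df){

  # parse the filter string into an expression and evaluate it inside filter()
  df <- filter(df, eval(parse(text = fltr)))
  # scatter plot of the rows that passed the filter
  ggplot(df, aes(x = temperature, y = yield)) + geom_point()

}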


ggs <- lapply(filters, myfun, df = df)

gridExtra::grid.arrange(grobs = ggs)

I made a couple of changes to your data: yield must be numeric, since yield >= 10 only gives a meaningful result on numeric vectors, and the last filter (which was empty) is now "TRUE" (I assumed you wanted to keep every row in that case).
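On the ncol/nrow part of the question: grid.arrange accepts nrow and ncol alongside grobs, so the layout can be derived from the number of filters if you want to control its shape yourself. A minimal base R sketch follows; the helper name grid_dims is made up for illustration and is not part of gridExtra.

```r
# Hypothetical helper (not from gridExtra): pick a near-square layout
# for n plots, filling rows first.
grid_dims <- function(n) {
  ncol <- ceiling(sqrt(n))   # columns: square root, rounded up
  nrow <- ceiling(n / ncol)  # rows: just enough to hold all n plots
  list(nrow = nrow, ncol = ncol)
}

dims <- grid_dims(6)  # six filters, as in the question

# With the list of plots `ggs` built above, the call would look like:
# gridExtra::grid.arrange(grobs = ggs, nrow = dims$nrow, ncol = dims$ncol)
```

For six plots this gives a layout of 2 rows by 3 columns.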




Answer 2:


Rather than storing your filters as character strings, it would be better to store them as quosures. For example:

library(rlang)
filters <- quos(type=='b1',
             profit,
             Season=='Summer',
             profit==FALSE,
             yield >= 10,
             TRUE)

Then you can fairly easily map over these with purrr::map

library(dplyr)
library(purrr)
library(ggplot2)
# each .x is a single quosure, so unquote it with !! (not the splice operator !!!)
map(filters, ~df %>% filter(!!.x) %>% 
      ggplot(aes(x = temperature, y = yield)) + geom_point())
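If you would rather not depend on rlang, the same idea (store unevaluated expressions instead of strings) can be sketched in base R with alist() and eval(). This is only an illustration under a few assumptions: df is the version of the data with numeric yield, the name subsets is made up here, and the plotting step is left out to keep the example free of package dependencies.

```r
# Assumed data: the question's data frame with yield stored as numeric.
df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c(2, 21, 26, 13),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

# alist() keeps each condition as an unevaluated expression.
filters <- alist(type == "b1",
                 profit,
                 Season == "Summer",
                 !profit,
                 yield >= 10,
                 TRUE)

# Evaluate each expression with the columns of df in scope and keep
# the rows where it is TRUE (treating NA as FALSE).
subsets <- lapply(filters, function(f) {
  keep <- eval(f, df)
  df[keep & !is.na(keep), , drop = FALSE]
})

# Label each subset with the text of its filter.
names(subsets) <- vapply(filters, deparse, character(1))
```

Each element of subsets is an ordinary data frame, so it can be handed to a plotting function in a second lapply.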



Answer 3:


Assume the input data shown in the Note at the end, which fixes some inconsistencies in the data shown in the question, makes temperature and yield numeric, and simplifies profit == FALSE to just !profit. Define a function Plot which takes a filter, subsets df and plots the result. Then apply it to each filter and use grid.arrange. This uses ggplot2 and gridExtra but no additional packages and does not use eval explicitly.

(An alternative to the grid.arrange line would be cowplot::plot_grid(plotlist=plots) which gives a slightly different layout.)

library(ggplot2)
library(gridExtra)

Plot <- function(x) {
  data <- do.call("subset", list(df, parse(text = x)))
  ggplot(data, aes(temperature, yield)) + geom_line() + geom_point() + ggtitle(x)
}

plots <- Map(Plot, filters)
do.call("grid.arrange", plots)

Note

df <- data.frame(brand = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c(2, 21, 26, 13),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

filters <- c("brand=='b1'",
             "profit",
             "Season=='Summer'",
             "!profit",
             "yield >= 10",
             TRUE)
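
One of the inconsistencies mentioned above is that the question's filters refer to brand while its data frame has a column called type. A quick check along the following lines could catch that before any filtering is attempted. It is only a sketch: missing_columns is a hypothetical helper, and it will also flag variables that are meant to come from the calling environment rather than from the data.

```r
# Hypothetical helper: for each filter string, list the variables it
# uses that are not columns of `data`; filters with none are dropped.
missing_columns <- function(filters, data) {
  out <- lapply(filters, function(f) {
    vars <- all.vars(parse(text = f))  # variable names used in the filter
    setdiff(vars, names(data))
  })
  names(out) <- filters
  out[lengths(out) > 0]
}

# The data as given in the question (column is `type`, not `brand`).
question_df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                          yield = c(15, 10, 5, 0),
                          profit = c(TRUE, TRUE, FALSE, FALSE))

question_filters <- c("brand=='b1'", "profit", "yield >= 10", "TRUE")

problems <- missing_columns(question_filters, question_df)
```

Here problems contains a single entry, named brand=='b1', whose value is "brand".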


Source: https://stackoverflow.com/questions/59273929/r-how-to-filter-data-with-a-list-of-arguments-to-produce-multiple-data-frames
