ggplots stored in plot list to respect variable values at time of plot generation within for loop

Posted by 笑着哭i on 2020-06-22 01:34:10

Question


I have an elaborate plot routine that generates box plots with additional layers of scatter and adds them to a plot list.

The routine generates correct plots if they are created during the for loop directly via print(current_plot_complete).

However, if they are added to a plot list during the for loop which is printed only at the end, then the plots are incorrect: the final indices are used to generate all plots (instead of the current index at the time the plot is generated). This seems to be default ggplot2 behavior and I am looking for a solution to circumvent it in the current use case.

The issue seems to be within y = eval(parse(text=(paste0(COL_i)))) where the global environment is used (and thus the final index value) instead of the current values at the time of loop execution.

I tried various approaches to make eval() use the correct variable values, e.g. local(…) or specifying the environment – but without success.

A very simplified MWE is provided below.

MWE

The original routine is much more elaborate than this MWE, so the for loop cannot easily be replaced with members of the apply family.

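library(dplyr)    # %>%, filter(), select()
library(tidyr)    # gather()
library(ggplot2)  # ggplot(), aes(), geom_boxplot(), geom_point()

# create some random data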
data_temp <- data.frame(
"a" = sample(x = 1:100, size  = 50),
"b" = rnorm(n = 50, mean = 45, sd = 1),
"c" = sample(x = 20:70, size  = 50), 
"d" = rnorm(n = 50, mean = 40, sd = 15),
"e" = rnorm(n = 50, mean = 50, sd = 10),
"f" = rnorm(n = 50, mean = 45, sd = 1),
"g" = sample(x = 20:70, size  = 50)
)
COLs_current <- c("a", "b", "c", "d", "e") # define COLs of data to include in box plots
choice_COLs <- c("a", "d")      # define COLs of data to add scatter to

plot_list <- list()   # start with an empty list
plot_index <- 1

for (COL_i in choice_COLs) {

  COL_i_index <- which(COL_i == COLs_current)

  # Generate "basis boxplot" (to plot scatterplot on top)
  boxplot_scores <- data_temp %>% 
    gather(COL, score, all_of(COLs_current)) %>%
    ggplot(aes(x = COL, y = score)) +
    geom_boxplot() 

  # Get relevant data of COL_i for scattering: data of 4th quartile
  quartile_values <- quantile(data_temp[[COL_i]])
  threshold <- quartile_values["75%"]           # threshold = 3rd quartile value
  data_temp_filtered <- data_temp %>%
    filter(data_temp[[COL_i]] > threshold) %>%  # keep only rows in the 4th quartile
    dplyr::select(all_of(COLs_current))         # all_of() avoids the ambiguous-selection warning

  # Create layer of scatter for 4th quartile of COL_i
  scatter_COL_i <- geom_point(data=data_temp_filtered, mapping = aes(x = COL_i_index, y = eval(parse(text=(paste0(COL_i))))), color= "orange")

  # add geom objects to create final plot for COL_i
  current_plot_complete <- boxplot_scores + scatter_COL_i 

  print(current_plot_complete)

  plot_list[[plot_index]] <- current_plot_complete 
  plot_index <- plot_index + 1
}

plot_list

Answer 1:


I propose the following solution, although it does not explain why your approach fails:

temporary_function <- function(COL_i){
    COL_i_index <- which(COL_i == COLs_current)

    # Generate "basis boxplot" (to plot scatterplot on top)
    boxplot_scores <- data_temp %>% 
        gather(COL, score, all_of(COLs_current)) %>%
        ggplot(aes(x = COL, y = score)) +
        geom_boxplot() 

    # Get relevant data of COL_i for scattering: data of 4th quartile
    quartile_values <- quantile(data_temp[[COL_i]])
    threshold <- quartile_values["75%"]           # threshold = 3rd quartile value
    data_temp_filtered <- data_temp %>%
        filter(data_temp[[COL_i]] > threshold) %>%  # keep only rows in the 4th quartile
        dplyr::select(all_of(COLs_current))

    # Create layer of scatter for 4th quartile of COL_i
    scatter <- geom_point(data=data_temp_filtered,
                          mapping = aes(x = COL_i_index,
                                        y = eval(parse(text=(paste0(COL_i))))),
                          color= "orange")

    # add geom objects to create final plot for COL_i
    current_plot_complete <-  boxplot_scores + scatter

    return(current_plot_complete)
}

# Call the function only after it has been defined
l <- lapply(choice_COLs, temporary_function)

With lapply the problem does not occur. The approach is inspired by this post.
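A likely reason the lapply version behaves differently is that every function call gets its own environment, whereas all iterations of a for loop share one. The sketch below illustrates this with plain closures in base R, without ggplot2; the variable names are made up for the illustration, and the closures only stand in for the expressions that aes() stores for later evaluation.

```r
# for loop: every closure looks up `i` in the same environment,
# so all of them see the value `i` has after the loop has finished
closures_loop <- list()
for (i in 1:3) {
  closures_loop[[i]] <- function() i
}
loop_results <- vapply(closures_loop, function(f) f(), integer(1))

# lapply: each call of the anonymous function has its own environment
# holding its own `i`
closures_apply <- lapply(1:3, function(i) function() i)
apply_results <- vapply(closures_apply, function(f) f(), integer(1))

# local(): creates a fresh environment inside the for loop and copies
# the current value of `i` into it
closures_local <- list()
for (i in 1:3) {
  closures_local[[i]] <- local({
    j <- i
    function() j
  })
}
local_results <- vapply(closures_local, function(f) f(), integer(1))

print(loop_results)
print(apply_results)
print(local_results)
```

Wrapping the loop body in a function, as done above with temporary_function, has the same effect as the lapply and local() variants: the values are looked up in an environment that later iterations do not overwrite.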




Answer 2:


I think the problem is that ggplot uses lazy evaluation. When the list is rendered, the loop index has its final value, and that is the one used to render all the plots in the list.

This post is relevant.
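If the for loop has to stay, one option is to force the loop variable to be evaluated when the plot is created rather than when it is rendered. The following is a minimal sketch, not the original routine: it assumes ggplot2 3.0.0 or later (where aes() supports the `!!` operator), and the data frame `df` and the lists `plots_lazy` and `plots_forced` are hypothetical names used only for this illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(id = 1:20, a = rnorm(20), b = rnorm(20, mean = 5))

plots_lazy   <- list()
plots_forced <- list()

for (col in c("a", "b")) {
  # `col` is looked up only when the plot is rendered,
  # i.e. after the loop has finished
  plots_lazy[[col]] <- ggplot(df, aes(x = id, y = .data[[col]])) +
    geom_point()

  # `!!col` inserts the current string into the mapping right away,
  # so the stored plot no longer depends on the loop variable
  plots_forced[[col]] <- ggplot(df, aes(x = id, y = .data[[!!col]])) +
    geom_point()
}

# Rendering after the loop: the first plot of each list can now be compared
print(plots_lazy[["a"]])
print(plots_forced[["a"]])
```

Applied to the loop in the question, the corresponding mapping would presumably be aes(x = !!COL_i_index, y = .data[[!!COL_i]]), which also removes the need for eval(parse(...)).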



Source: https://stackoverflow.com/questions/62423707/ggplots-stored-in-plot-list-to-respect-variable-values-at-time-of-plot-generatio
