for loop with ggplots produces graphs with identical values but different headings

情到浓时终转凉″ submitted on 2019-11-29 11:22:59
baptiste

There are two standard ways to deal with this problem:

1- Work with a long-format data.frame

2- Use aes_string to refer to variable names in the wide format data.frame

Here's an illustration of possible strategies.

```r
library(ggplot2)
library(gridExtra)

# data from other answer
df <- data.frame(group=c(rep("A", 4), rep("B", 4)),
                 a=sample(1:100, 8),
                 b=sample(100:200, 8),
                 c=sample(300:400, 8))

## first method: long format
m <- reshape2::melt(df, id = "group")
p <- ggplot(m, aes(x=group, y=value)) +
    geom_boxplot()

# one plot per variable: swap in each subset as the plot's data
pl <- plyr::dlply(m, "variable", function(.d) p %+% .d + ggtitle(as.character(unique(.d$variable))))
grid.arrange(grobs=pl)

## second method: keep wide format
# aes_string() receives the column name as a string, so no loop variable is captured
one_plot <- function(col = "a") ggplot(df, aes_string(x="group", y=col)) + geom_boxplot() + ggtitle(col)
pl <- plyr::llply(colnames(df)[-1], one_plot)
grid.arrange(grobs=pl)

## third method: more explicit looping
pl <- vector("list", length = ncol(df)-1)
for(ii in seq_along(pl)){
  .col <- colnames(df)[-1][ii]
  .p <- ggplot(df, aes_string(x="group", y=.col)) + geom_boxplot() + ggtitle(.col)
  pl[[ii]] <- .p
}

grid.arrange(grobs=pl)
```
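A variation on the first method, sketched under the assumption that one figure with a panel per column is acceptable: instead of building separate plots and arranging them, facet a single long-format plot. This sketch reshapes with base R's stack() rather than reshape2::melt(), and the fixed seed is an addition for reproducibility.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the sampled data are reproducible
df <- data.frame(group = c(rep("A", 4), rep("B", 4)),
                 a = sample(1:100, 8),
                 b = sample(100:200, 8),
                 c = sample(300:400, 8))

# Long format with base R: stack() puts the value columns end to end
# (as columns `values` and `ind`), so `group` is repeated once per column.
long <- data.frame(group = rep(df$group, times = ncol(df) - 1),
                   stack(df[-1]))

# One plot, one panel per original column; the strip labels replace ggtitle().
p <- ggplot(long, aes(x = group, y = values)) +
  geom_boxplot() +
  facet_wrap(~ ind, scales = "free_y")
p
```

`scales = "free_y"` matters here because the columns were sampled from different ranges; with a shared y axis the boxes for `a` would be compressed.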

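Sometimes, when wrapping a ggplot call inside a function or for loop, one runs into problems with local variables, because aes() stores unevaluated expressions that are only evaluated when the plot is drawn (not the case here if aes_string is used, since it receives the column name as a string). In such cases one can define a local environment.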
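The local-environment trick can be seen without ggplot2 at all. In this minimal base-R sketch, stored closures stand in for stored plots (an illustrative assumption): a function created in a loop looks up `i` when it is called, not when it is created, whereas `local()` gives each iteration its own copy.

```r
# Without local(): every closure looks up `i` in the enclosing environment
# when it is *called*, i.e. after the loop has finished.
late <- vector("list", 3)
for (i in 1:3) {
  late[[i]] <- function() i
}
sapply(late, function(f) f())   # 3 3 3

# With local(): each iteration gets a fresh environment holding its own `i`.
early <- vector("list", 3)
for (i in 1:3) {
  local({
    i <- i                        # copy the current loop value into the local environment
    early[[i]] <<- function() i   # <<- updates `early` in the enclosing environment
  })
}
sapply(early, function(f) f())  # 1 2 3
```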

Note that using a construct like aes(y=df[,i]) may appear to work, but can produce very wrong results. Consider a facetted plot: the data.frame will be split into different groups for each panel, and this subsetting can fail miserably to group the right data if numeric values are passed directly to aes() instead of variable names.
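Recent ggplot2 releases deprecate aes_string(); tidy evaluation keeps the same "pass the variable name" approach. This sketch assumes ggplot2 3.0.0 or later, where the `.data` pronoun is available. The helper `one_plot()` mirrors the second method above, and the seed and the name `value_cols` are illustrative additions. Because the name is looked up inside the plot's own data, the values stay tied to their rows when ggplot2 splits the data into groups or panels.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the sampled data are reproducible
df <- data.frame(group = c(rep("A", 4), rep("B", 4)),
                 a = sample(1:100, 8),
                 b = sample(100:200, 8),
                 c = sample(300:400, 8))

# Pass the column *name*; .data[[col]] finds that column in the plot's data.
one_plot <- function(col) {
  ggplot(df, aes(x = group, y = .data[[col]])) +
    geom_boxplot() +
    ggtitle(col)
}

value_cols <- setdiff(names(df), "group")  # every column except the grouping one
pl <- lapply(value_cols, one_plot)
names(pl) <- value_cols
```

Wrapping the aes() call in a function matters: each call has its own `col`. Writing `aes(y = .data[[col]])` directly in a for loop would look up `col` only when the plots are drawn and would hit the same late-binding problem.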

I have cleaned up how you generated your sample data frame.

```r
library(ggplot2)
library(cowplot)

df <- data.frame(group=c(rep("A", 4), rep("B", 4)),
                 a=sample(1:100, 8),
                 b=sample(100:200, 8),
                 c=sample(300:400, 8)) #make data frame
```

Just using data.frame() will suffice. This makes your code clearer and avoids the need for all that post-processing in your 'for loop' to convert your data frame to numeric and to remove the factors generated. Note that cbind() on plain vectors returns a matrix, which can hold only one type, so your numbers are converted to character; as.data.frame() then turned those strings into factors unless you set 'stringsAsFactors = FALSE' (that default only changed to FALSE in R 4.0.0). The numeric-to-character conversion can be avoided by using cbind.data.frame() rather than cbind().
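The type conversion described above is easy to check in base R. A small sketch with two columns named after the example data:

```r
group <- c(rep("A", 4), rep("B", 4))
a <- sample(1:100, 8)

m <- cbind(group, a)  # a matrix holds one type, so `a` is coerced to character
class(m[, "a"])       # "character"

d1 <- data.frame(group = group, a = a)  # each column keeps its own type
class(d1$a)                             # "integer"

d2 <- cbind.data.frame(group, a)  # the data.frame method also keeps `a` numeric
class(d2$a)                       # "integer"
```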

I have also refactored the 'for loop' that generates your plots. You create a vector of integers called 'cols' (cols <- 2:4), which you then iterate over to plot each column of data. This is unnecessary; we can put the range directly in the for statement, 'for (i in 2:ncol(df))', which iterates from 2 to 4 (4 being the number of columns in your data frame). Starting from 2 is required to skip column 1, which contains metadata (the group labels). This is preferable because:

i) When reviewing your code, the range being looped over is immediately apparent without searching through the rest of your code

ii) R has a number of functions/parameters with names similar to your variable 'cols', and it is best to avoid confusion.
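One caveat with `2:ncol(df)`: if the data frame ever contains only the group column, `2:1` counts down to `c(2, 1)` instead of giving an empty range. A sketch of a safer range, assuming the same layout (group column first), drops the first index from `seq_along()`; looping over the column names is another option. `only_group` is a hypothetical edge case for illustration.

```r
df <- data.frame(group = c(rep("A", 4), rep("B", 4)),
                 a = sample(1:100, 8))
only_group <- df["group"]  # hypothetical edge case: no value columns left

2:ncol(only_group)         # 2 1: the loop body would still run twice
seq_along(only_group)[-1]  # integer(0): the loop body is skipped

# Looping over names keeps the column label at hand for titles.
for (col in names(df)[-1]) {
  message("plotting column ", col)
}
```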

With the code cleaned up we can now try to locate the cause of the bug:

```r
library(ggplot2)
library(cowplot)

df <- data.frame(group=c(rep("A", 4), rep("B", 4)),
                 a=sample(1:100, 8),
                 b=sample(100:200, 8),
                 c=sample(300:400, 8)) #make data frame

for (i in 2:ncol(df)){

  g <- ggplot(df, aes(x=group, y=df[,i])) +
    geom_boxplot() +
    ggtitle(colnames(df)[i])

  print(g)
  assign(colnames(df)[i], g) #generate an object for each plot
}
```

It's not immediately obvious why your code doesn't work. The suggestion by Imo has merit: saving your plots to a list would keep your environment from getting cluttered with objects. However, it would not solve this bug. The cause is unintuitive: aes() does not evaluate df[,i] when the plot object is created. It stores the expression and evaluates it only when the plot is printed or built. print(g) inside the loop therefore shows the correct data, but by the time the saved objects are drawn the loop has finished and i holds its last value, so every plot shows the last column under its own title. See the answer provided here by Konrad Rudolph.

The following should work and retains the style of your original code. As Konrad suggests in his answer, it might be more "R"-like to use lapply. Note that the body of the for loop now runs inside local(), which gives each iteration its own environment, and that we re-define i inside it, so each plot keeps its own copy of i. Previously the last value of i generated in the loop was used by every object created via the assign() function. Note also the use of <<- to assign g to the global environment.
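The explanation can be checked directly. The sketch below assumes a ggplot2 version that provides `layer_data()`; `drawn_medians()` is a hypothetical helper, and the seed is added for reproducibility. It builds the plots from the original loop and reads back the group medians each one actually draws, then does the same with an lapply version. It keeps `aes(y = df[, i])` only to expose the mechanism; the caveat about passing vectors to aes() still applies.

```r
library(ggplot2)

set.seed(42)  # fixed seed so the comparison is reproducible
df <- data.frame(group = c(rep("A", 4), rep("B", 4)),
                 a = sample(1:100, 8),
                 b = sample(100:200, 8),
                 c = sample(300:400, 8))

# Hypothetical helper: the group medians a boxplot actually draws.
drawn_medians <- function(p) layer_data(p)$middle

# Original loop: aes() keeps the expression df[, i], not its values.
loop_plots <- list()
for (i in 2:ncol(df)) {
  loop_plots[[colnames(df)[i]]] <- ggplot(df, aes(x = group, y = df[, i])) +
    geom_boxplot() +
    ggtitle(colnames(df)[i])
}
# Built after the loop, so all three use the final i (column "c").
sapply(loop_plots, drawn_medians)

# lapply version: each call has its own environment with its own i.
lapply_plots <- lapply(2:ncol(df), function(i) {
  ggplot(df, aes(x = group, y = df[, i])) +
    geom_boxplot() +
    ggtitle(colnames(df)[i])
})
names(lapply_plots) <- colnames(df)[-1]
sapply(lapply_plots, drawn_medians)
```

The list returned by lapply can then be passed to cowplot as `plot_grid(plotlist = lapply_plots)` instead of creating one global object per plot.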

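```r
for (i in 2:ncol(df))
  local({
    i <- i  # copy the current loop value into this iteration's own environment
    g <<- ggplot(df, aes(x=group, y=df[,i])) +
      geom_boxplot() +
      ggtitle(colnames(df)[i])
    print(i)
    print(g)
    assign(colnames(df)[i], g, pos = 1) # generate an object for each plot (pos = 1 is the global environment)
  })

plot_grid(a, b, c)
```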

You owe me a drink.
