Question
I have a data frame df with many columns ...
I'd like to plot a subset of those columns, where c is a list of the columns I'd like to plot.
I'm currently doing the following:
df <-structure(list(Image.Name = structure(1:5, .Label = c("D1C1", "D2C2", "D4C1", "D5C3", "D6C2"), class = "factor"), Experiment = structure(1:5, .Label = c("020718 perfusion EPC_BC_HCT115_Day 5", "020718 perfusion EPC_BC_HCT115_Day 6", "020718 perfusion EPC_BC_HCT115_Day 7", "020718 perfusion EPC_BC_HCT115_Day 8", "020718 perfusion EPC_BC_HCT115_Day 9"), class = "factor"), Type = structure(c(2L, 1L, 1L, 2L, 1L), .Label = c("VMO", "VMT"), class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "18-Apr-18", class = "factor"), Time = structure(1:5, .Label = c("12:42:02 PM", "12:42:29 PM", "12:42:53 PM", "12:43:44 PM", "12:44:23 PM"), class = "factor"), Low.Threshold = c(10L, 10L, 10L, 10L, 10L), High.Threshold = c(255L, 255L, 255L, 255L, 255L), Vessel.Thickness = c(7L, 7L, 7L, 7L, 7L), Small.Particles = c(0L, 0L, 0L, 0L, 0L), Fill.Holes = c(0L, 0L, 0L, 0L, 0L), Scaling.factor = c(0.001333333, 0.001333333, 0.001333333, 0.001333333, 0.001333333), X = c(NA, NA, NA, NA, NA), Explant.area = c(1.465629333, 1.093447111, 1.014612444, 1.166950222, 1.262710222), Vessels.area = c(0.255562667, 0.185208889, 0.195792, 0.153907556, 0.227996444), Vessels.percentage.area = c(17.43706003, 16.93807474, 19.29722044, 13.18887067, 18.05611774), Total.Number.of.Junctions = c(56L, 32L, 39L, 18L, 46L), Junctions.density = c(38.20884225, 29.26524719, 38.43832215, 15.42482246, 36.42957758), Total.Vessels.Length = c(12.19494843, 9.545333135, 10.2007416, 7.686755647, 11.94211976), Average.Vessels.Length = c(0.182014156, 0.153956986, 0.188902622, 0.08938088, 0.183724919), Total.Number.of.End.Points = c(187L, 153L, 145L, 188L, 167L), Average.Lacunarity = c(0.722820111, 0.919723402, 0.86403871, 1.115896082, 0.821753818)), .Names = c("Image.Name", "Experiment", "Type", "Date", "Time", "Low.Threshold", "High.Threshold", "Vessel.Thickness", "Small.Particles", "Fill.Holes", "Scaling.factor", "X", "Explant.area", "Vessels.area", "Vessels.percentage.area", "Total.Number.of.Junctions", 
"Junctions.density", "Total.Vessels.Length", "Average.Vessels.Length", "Total.Number.of.End.Points", "Average.Lacunarity"), row.names = c(NA, -5L), class = "data.frame")
doBarPlot <- function(x) {
  p <- ggplot(x, aes_string(x="Type", y=colnames(x), fill="Type") ) +
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width=0.5, na.rm = TRUE) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5) )
  print(p)
  ggsave(sprintf("plots/%s_bars.pdf", colnames(x) ) )
  return(p)
}
c = c('Total.Vessels.Length', 'Total.Number.of.Junctions', 'Total.Number.of.End.Points', 'Average.Lacunarity')
p[c] <- lapply(df[c], doBarPlot)
However, this yields the following error:
Error: ggplot2 doesn't know how to deal with data of class numeric
Debugging shows that x inside doBarPlot is of type numeric rather than data.frame, so ggplot errors. However, test <- df2[c] yields a variable of type data.frame.
Why is x a numeric?
What's the best way to apply doBarPlot without resorting to a loop?
Answer 1:
As others have noted, the problem with your initial approach is that when you use lapply on a data frame, the elements that you are iterating over will be the column vectors, rather than 1-column data frames. However, even if you did iterate over 1-column data frames, your function would fail: the data frame supplied to the ggplot call wouldn't contain the Type column that you use in the plot.
Instead, you could modify the function to take two arguments: the full data frame, and the name of the column that you want to use on the y-axis.
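doBarPlot <- function(data, y) {
  # y is a column name passed as a string, hence aes_string() for the mapping
  p <- ggplot(data, aes_string(x = "Type", y = y, fill = "Type")) +
    # fun.y is deprecated as of ggplot2 3.3.0; newer versions name this argument fun
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    # mean_cl_normal requires the Hmisc package to be installed
    stat_summary(
      fun.data = "mean_cl_normal",
      geom = "errorbar",
      width = 0.5,
      na.rm = TRUE
    ) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
  # The plots/ directory must already exist; pass the plot explicitly
  # rather than relying on the last plot displayed
  ggsave(sprintf("plots/%s_bars.pdf", y), plot = p)
  return(p)
}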
Then, you can use lapply to iterate over the character vector of columns you want to plot, while supplying the data frame via the ... as a fixed argument to your plotting function:
library(ggplot2)
cols <- c('Total.Vessels.Length', 'Total.Number.of.Junctions',
'Total.Number.of.End.Points', 'Average.Lacunarity')
p <- lapply(cols, doBarPlot, data = df)
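The argument forwarding used here can be tried out without ggplot2. The following is a minimal sketch in base R, assuming the built-in mtcars data set; col_mean is a hypothetical stand-in with the same (data, y) signature as doBarPlot, not part of the original answer. It also names the resulting list so that each result can be looked up by column name, which is roughly what the p[c] assignment in the question was after.

```r
# Hypothetical stand-in for doBarPlot(): same (data, y) signature
col_mean <- function(data, y) mean(data[[y]], na.rm = TRUE)

cols <- c("mpg", "hp")

# data = mtcars is matched by name and forwarded unchanged on every call;
# each element of cols then fills the remaining argument, y
res <- lapply(cols, col_mean, data = mtcars)

# lapply() over an unnamed character vector returns an unnamed list,
# so attach the column names to allow lookup by name
names(res) <- cols
res[["hp"]]
```

With the real plotting function, the same pattern would be names(p) <- cols after the lapply call.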
Further, if you don't mind having all of the plots in one file, you could also use tidyr::gather to reshape your data into long form, and use facet_wrap in your plot (as suggested by @RichardTelford in his comment), avoiding the iteration and the need for a function altogether:
library(tidyverse)
df %>%
  # all_of() makes it explicit that cols is an external character vector
  gather(variable, value, all_of(cols)) %>%
  ggplot(aes(x = Type, y = value, fill = Type)) +
  facet_wrap(~ variable, scales = "free_y") +
  # fun.y is deprecated as of ggplot2 3.3.0; newer versions name this argument fun
  stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
  stat_summary(
    fun.data = "mean_cl_normal",
    geom = "errorbar",
    width = 0.5,
    na.rm = TRUE
  ) +
  ggtitle("VMO vs. VMT") +
  theme(plot.title = element_text(hjust = 0.5))
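To see what the long form looks like before it reaches ggplot, here is a small sketch using tidyr::pivot_longer, which the tidyr documentation recommends over gather in new code. It assumes tidyr 1.0.0 or later; toy and y_cols are illustrative names, and the values are rounded from the first four rows of the question's df.

```r
library(tidyr)

# Toy stand-in for df: values rounded from the first four rows of the question's data
toy <- data.frame(
  Type = c("VMT", "VMO", "VMO", "VMT"),
  Total.Vessels.Length = c(12.19, 9.55, 10.20, 7.69),
  Average.Lacunarity = c(0.72, 0.92, 0.86, 1.12)
)
y_cols <- c("Total.Vessels.Length", "Average.Lacunarity")

# One row per (original row, measured column) pair; Type is repeated alongside
long <- pivot_longer(
  toy,
  cols = all_of(y_cols),
  names_to = "variable",
  values_to = "value"
)
print(long)
```

The variable column is what facet_wrap splits on, so each measured column ends up in its own panel.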

Answer 2:
The apply family of functions iterates over the elements of the object passed to it, and for a data frame those elements are its columns. A simple example to illustrate this:
lapply(mtcars, function(x) print(x))
With your code, you are passing each column of df, as a vector, to the function doBarPlot. The ggplot2 package works with data frames, not lists or vectors, and therefore you get the error.
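The difference between what lapply hands to the function and what column subsetting returns can be checked directly. This is a minimal sketch in base R, assuming the built-in mtcars data set in place of df; the variable names are illustrative.

```r
cols <- c("mpg", "hp")

# Each element lapply() passes to the function is a bare column vector
elem_classes <- vapply(mtcars[cols], function(x) class(x)[1], character(1))
print(elem_classes)

# Subsetting with a character vector, by contrast, keeps the data frame structure
is_df <- is.data.frame(mtcars[cols])
print(is_df)

# To iterate over one-column data frames instead, index by name with drop = FALSE
one_col <- lapply(cols, function(nm) mtcars[, nm, drop = FALSE])
```

Even with one-column data frames, a plot that also maps a grouping column such as Type would still need that column to be present in the data passed to ggplot.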
If you want to use your function, apply it directly to the subsetted df:
doBarPlot(df[ , c])
If you have a bunch of data frames and you want to subset by the columns in c, check out this answer:
How to apply same function to every specified column in a data.table
Alternatively, look into dplyr::select().
Source: https://stackoverflow.com/questions/50227769/using-apply-functions-with-ggplot-to-plot-a-subset-of-dataframe-columns