I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:
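library(reshape2)  # provides melt()
library(ggplot2)
x <- seq(1, 10, by = 1)
y1 <- rnorm(10, mean = 3)
y2 <- rnorm(10, mean = 10)
y3 <- rnorm(10, mean = 1)
y4 <- rnorm(10, mean = 8)
y5 <- rnorm(10, mean = 12)
df <- data.frame(x, y1, y2, y3, y4, y5)
df.m <- melt(df, id = "x")  # long format: columns x, variable, value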
My code to create the x-axis data as a factor:
df.m$x <- as.factor(df.m$x)
My ggplot:
ggplot(df.m, aes(x = x, y = value)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.1) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The resulting plot: (image not shown)
The problem is that I cannot use numeric spacing on the x-axis, because the x-axis is a factor and a factor axis is equally spaced. I want to be able to use something like scale_x_continuous to control the axis breaks and spacing (say, an interval of 2 rather than a boxplot every 1), but when I try to plot the data with the x-axis converted with as.numeric, I just get one boxplot of all of the data.
Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!
Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.
ggplot(df, aes(x = CH, y = value, group = CH)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.2) +
  scale_x_log10()
So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).
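Applied to the reproducible data from the question, that advice might look like the sketch below. It assumes the ggplot2 and reshape2 packages are installed; the seed and the choice of breaks are arbitrary and only there for illustration. The x column stays numeric, `group = x` asks for one box per distinct x value, and `scale_x_continuous` then controls the tick positions.

```r
library(ggplot2)
library(reshape2)  # provides melt()

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- seq(1, 10, by = 1)
df <- data.frame(x,
                 y1 = rnorm(10, mean = 3),
                 y2 = rnorm(10, mean = 10),
                 y3 = rnorm(10, mean = 1),
                 y4 = rnorm(10, mean = 8),
                 y5 = rnorm(10, mean = 12))
df.m <- melt(df, id = "x")  # x is left numeric, not converted to a factor

# group = x gives one box per distinct x value on a continuous axis
p <- ggplot(df.m, aes(x = x, y = value, group = x)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))  # ticks every 2 units
p
```

Note that the breaks only change where ticks and labels appear; a box is still drawn at every x value present in the data.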
With your real data there is another problem though. Your CH values are more or less logarithmically spaced, so there are about as many points below 1 as there are between 1 and 10, and so on. ggplot wants to make the boxes all the same width, so with a linear x-axis the box width is smaller than the line width, and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.
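The box-width effect can be reproduced with made-up data. The sketch below does not use the asker's real data, which is not shown here; the `CH` values, the group size, and the `value` distribution are all invented stand-ins, chosen only so that `CH` is evenly spaced on a log10 axis. It builds the same plot twice, once on a linear axis and once with `scale_x_log10()`.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
# Hypothetical stand-in data: 13 CH values evenly spaced in log10,
# with 20 made-up observations per CH value
CH <- 10^seq(-1, 2, by = 0.25)
df <- data.frame(CH = rep(CH, each = 20))
df$value <- rnorm(nrow(df), mean = log10(df$CH))

base <- ggplot(df, aes(x = CH, y = value, group = CH)) +
  geom_boxplot(outlier.shape = NA, fill = "red", alpha = 0.2)

p_linear <- base                    # linear axis: boxes are extremely narrow
p_log    <- base + scale_x_log10()  # log axis: boxes have a visible width

p_linear
p_log
```

On the linear axis the common box width is derived from the smallest gap between CH values, which is tiny compared with the full axis range. On the log axis the gaps are all equal, so the boxes take up a sensible share of the plot.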
Don't make x a factor. You need to map the group aesthetic to a factor that determines which box each value is associated with. Luckily, after melting, this is what your variable column is:
ggplot(df.m, aes(x = x, y = value, group = variable)) +
  geom_boxplot()
As x is still numeric, you can give it whatever values you want within a specific variable level and the boxplot will show up at that spot. Or you could transform the x axis, etc.
Source: https://stackoverflow.com/questions/27050802/how-to-create-geom-boxplot-with-large-amount-of-continuous-x-variables