axis.break and ggplot2 or gap.plot? plot may be too complex

Submitted by 牧云@^-^@ on 2019-12-20 01:43:16

Question


I created a plot with ggplot2 showing milk protein content. I have two groups and four treatments, and I want to show the interaction between group and treatment with means and error bars. The protein content starts at 2.6%. At the moment my y-axis starts there without a gap, but my supervisor wants one. I tried axis.break() from the plotrix package, but nothing happened. I also tried to rebuild the graphic with gap.plot, without success; I must admit that I'm no R hero.

Here's the code for my graphic:

library(ggplot2)

# D is the asker's data frame (not shown); it has the columns
# treat, group, Prot and Prot_SD.
Protein <- ggplot(data = D, aes(x = treat, y = Prot, group = group, shape = group)) +
  geom_line(aes(linetype = group), size = 1, position = position_dodge(0.2)) +
  geom_point(size = 3, position = position_dodge(0.2)) +
  geom_errorbar(aes(ymin = Prot - Prot_SD, ymax = Prot + Prot_SD), width = .2,
                position = position_dodge(0.2)) +
  scale_shape_discrete(name = 'group\n',
                       labels = c('1\n(n = 22,19,16,20)\n', '2\n(n = 15,12,14,12)')) +
  scale_linetype_discrete(name = "group\n",
                          labels = c('control\n(n = 22,19,16,20)\n', 'free-contact\n(n = 15,12,14,12)')) +
  scale_x_discrete(labels = c('0', '1', '2', '3')) +
  labs(x = '\ntreatment', y = 'protein content (%)\n')
# Letters marking the statistical comparisons, placed above each treatment
ProtStar <- Protein + annotate("text", x = c(1, 2, 3, 4), y = c(3.25, 3.25, 3.25, 3.25),
                               label = c("Aa", "Aa", "Ab", "Ba"), size = 4)
plot(ProtStar)

Unfortunately I do not have enough reputation to post images, but you might see from the code that the graphic is complex.

It would be fantastic if you had any useful suggestions. Thanks a lot!


Answer 1:


TL;DR: Look at the bottom.

Consider these figures:

ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic()

This is your basic plot. Now you have to consider the Y-axis.


ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic() +
  scale_y_continuous(limits = c(0,NA), expand = c(0,0))

This is the least misleading way of emphasizing that there is a zero floor to the data, even if there are no actual points below a certain value. Percent milk protein is a good example: negative values are impossible and you may want to emphasize that, even though no observations are near zero.

Including zero also compresses the part of the Y axis that the data actually occupy, so the differences between observations look smaller. If this is something you want to emphasize, that can be good. But if the natural range of some data is narrow, including the zero (and the resulting empty space) is misleading. For example, if milk protein is always between 2.6% and 2.7%, then the zero value is not a true floor for the data, but just as impossible as -50%.
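To put a number on that compression, here is a small sketch in base R using the same iris column as the figures. The variable names are only illustrative; the point is the ratio between the span of the data and the span of an axis that starts at zero.

```r
# Share of the y axis that the data occupy when the axis starts at zero.
y <- iris$Sepal.Length

data_span <- diff(range(y))   # distance between smallest and largest value
zero_span <- max(y) - 0       # length of an axis running from 0 to the maximum

share_with_zero <- data_span / zero_span
round(share_with_zero, 2)     # fraction of the axis actually covered by data
```

With these data a bit less than half of the axis carries information once zero is included; the rest is empty space below the smallest observation.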


ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic() +
  scale_y_continuous(limits = c(0,NA), expand = c(0,0)) + 
  theme(axis.line.y = element_blank()) +
  annotate(geom = "segment", x = -Inf, xend = -Inf, y = -Inf, yend = Inf) 

There are many reasons not to include a broken Y axis. It's perceived by many as being unethical or misleading to include one inside ranges of data. But this particular case is at the outer limit, beyond the actual data. I think the rules can be bent a bit for that.

The first step is to remove the automatic Y axis line and draw it in "by hand" using annotate. Notice that the figure looks identical to the previous one. If your theme of choice uses non-default sizes or colours for its axis lines, you will have to match them by hand in annotate, and you're gonna have a bad time.


ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic() + 
  scale_y_continuous(limits = c(3.5,NA), expand = c(0,0), 
                     breaks = c(3.5, 4:7)) + 
  theme(axis.line.y = element_blank()) +
  annotate(geom = "segment", x = -Inf, xend = -Inf, y = -Inf, yend = Inf)

Now you can consider where the actual data begin and where a good spot for the break is. You have to check by hand, e.g. with min(iris$Sepal.Length), and consider where the tick marks will go. This is a personal judgment call.

I found that the lowest value was at 4.3. I knew I wanted the break to be below the minimum, and I wanted the break to be about 0.5 units long. So I chose to put a tick mark at 3.5, and then each integer afterwards with breaks = c(3.5, 4:7).
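The same choice can be written down in a few lines of base R. This is only a sketch of the reasoning above: the 0.5-unit gap is the judgment call described in the text, and the names lowest, first_tick and fake_zero are made up for illustration.

```r
# Work out where the break and the tick marks go for iris$Sepal.Length.
y <- iris$Sepal.Length

lowest     <- min(y)            # smallest observation
first_tick <- floor(lowest)     # first "real" tick at or below the minimum
gap        <- 0.5               # chosen length of the break (a judgment call)
fake_zero  <- first_tick - gap  # position of the tick that will stand in for zero

# Fake zero first, then every integer up to the last one below the maximum
axis_breaks <- c(fake_zero, first_tick:floor(max(y)))
axis_breaks
```

The resulting vector is what gets passed to the breaks argument of scale_y_continuous; for other data the gap and the tick spacing would need to be chosen afresh.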


ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic() + 
  scale_y_continuous(limits = c(3.5,NA), expand = c(0,0), 
                     breaks = c(3.5, 4:7), labels = c(0, 4:7)) + 
  theme(axis.line.y = element_blank()) +
  annotate(geom = "segment", x = -Inf, xend = -Inf, y = -Inf, yend = Inf)

Now we need to relabel the 3.5 tick to be a fake zero with labels = c(0, 4:7).


ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + 
  theme_classic() + 
  scale_y_continuous(limits = c(3.5,NA), expand = c(0,0), 
                     breaks = c(3.5, 4:7), labels = c(0, 4:7)) + 
  theme(axis.line.y = element_blank()) +
  annotate(geom = "segment", x = -Inf, xend = -Inf, y = -Inf, yend = Inf) +
  annotate(geom = "segment", x = -Inf, xend = -Inf, y =  3.5, yend = 4,
           linetype = "dashed", color = "white")

Now we draw a white dashed line over the manually-drawn axis line, going from our fake zero (y=3.5) to the lowest true tick mark (y=4).
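If you need this on more than one plot, the steps above can be bundled into a small helper. This is a sketch only: broken_y_axis is a hypothetical function name, not part of ggplot2, and it assumes a white plot background and default axis-line styling, as in theme_classic.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): returns the pieces of the
# broken-axis recipe as a list that can be added to a plot with `+`.
broken_y_axis <- function(fake_zero, ticks, background = "white") {
  list(
    # Axis starts at the fake zero; that tick is relabelled as 0
    scale_y_continuous(limits = c(fake_zero, NA), expand = c(0, 0),
                       breaks = c(fake_zero, ticks), labels = c(0, ticks)),
    # Remove the automatic axis line and redraw it by hand
    theme(axis.line.y = element_blank()),
    annotate(geom = "segment", x = -Inf, xend = -Inf, y = -Inf, yend = Inf),
    # Dashed segment in the background colour marks the break
    annotate(geom = "segment", x = -Inf, xend = -Inf,
             y = fake_zero, yend = min(ticks),
             linetype = "dashed", color = background)
  )
}

p <- ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() +
  theme_classic() +
  broken_y_axis(fake_zero = 3.5, ticks = 4:7)
```

Add the helper after the theme call, since a complete theme added later would restore the automatic axis line. The values for fake_zero and ticks have to be picked from your own data, as described above.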

Consider that the grammar of graphics is a mature philosophy; that is to say, each element has thoughtful reasoning behind it. The fact that this is finicky to do is for good reasons, and you need to consider whether your own reasons are sufficient weight on the other side.



Source: https://stackoverflow.com/questions/46403240/axis-break-and-ggplot2-or-gap-plot-plot-may-be-too-complexe
