Problems adapting the y-axis to 2x2 ANOVA bargraph using R and ggplot

Submitted by 早过忘川 on 2021-02-08 07:40:29

Question


I am not a pro R user, but I have already tried multiple things and can't find a solution to this problem.

I created a bar graph for a 2x2 ANOVA, including error bars, an APA theme and custom colors, based on this website: https://sakaluk.wordpress.com/2015/08/27/6-make-it-pretty-plotting-2-way-interactions-with-ggplot2/. It works nicely, but the y-axis starts at 0 although my scale only ranges from 1 to 7. I am trying to adapt the axis, but I get strange errors.

This is what I did:

# see https://sakaluk.wordpress.com/2015/08/27/6-make-it-pretty-plotting-2-way-interactions-with-ggplot2/

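library(phia)   # interactionMeans() comes from the phia package
interactionMeans(anova.2)
plot(interactionMeans(anova.2))

# using ggplot
# install.packages("ggplot2")   # only needed once
library(ggplot2)
library(psych)  # describeBy() comes from the psych package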

# create factors from the scenario codes

GIFTSTUDY1DATA$PRICE <- ifelse(GIFTSTUDY1DATA$Scenario == 3 | GIFTSTUDY1DATA$Scenario == 4, 1, -1)
table(GIFTSTUDY1DATA$PRICE)
GIFTSTUDY1DATA$PRICE <- factor(GIFTSTUDY1DATA$PRICE, levels = c(-1, +1),
                               labels = c("2 expensive", "1 inexpensive"))

GIFTSTUDY1DATA$AFFECT <- ifelse(GIFTSTUDY1DATA$Scenario == 1 | GIFTSTUDY1DATA$Scenario == 3, -1, +1)
table(GIFTSTUDY1DATA$AFFECT)
GIFTSTUDY1DATA$AFFECT <- factor(GIFTSTUDY1DATA$AFFECT,
                                levels = c(-1, 1),
                                labels = c("poor", "rich"))
# get descriptives per PRICE x AFFECT cell

dat2 <- describeBy(GIFTSTUDY1DATA$EVALUATION,
                   list(GIFTSTUDY1DATA$PRICE, GIFTSTUDY1DATA$AFFECT),
                   mat = TRUE, digits = 2)
dat2

names(dat2)[names(dat2) == 'group1'] = 'Price'
names(dat2)[names(dat2) == 'group2'] = 'Affect'

dat2$se = dat2$sd/sqrt(dat2$n)
# error bars +/- 1 SE
limits = aes(ymax = mean + se, ymin=mean - se)
dodge = position_dodge(width=0.9)

# set layout

apatheme=theme_light()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Arial'))

#plot

p=ggplot(dat2, aes(x = Affect, y = mean, fill = Price))+
  geom_bar(stat='identity', position=dodge)+
  geom_errorbar(limits, position=dodge, width=0.15)+
  apatheme+
  ylab('mean gift evaluation')+
  scale_fill_manual(values=c("yellowgreen","skyblue4"))
p

Which gives me this figure:

https://i.stack.imgur.com/MwdVo.png

Now, if I try to change the y-axis using ylim or scale_y_continuous:

p + ylim(1,7)
p + scale_y_continuous(limits = c(1,7))

I get a graph with the y-axis as wanted, but no bars, and a warning message stating:

Removed 4 rows containing missing values (geom_bar).

https://i.stack.imgur.com/p66H8.png

Using

p + expand_limits(y=c(1,7))
p 

changes the upper end of the y-axis but still includes the zero!

What am I doing wrong? Do I have to start all over without using geom_bar? Thanks in advance.


Answer 1:


I encountered a similar problem, which was solved by replacing

scale_y_continuous(limits = c(...)) with coord_cartesian(ylim = c(...))

I think this might work for you.

Here is an example:

library(tidyverse)

ggplot(mtcars,aes(factor(am),hp)) + 
   geom_bar(stat = "identity") + 
   coord_cartesian(ylim = c(1000,3000))

Also see link: Google R Discussion
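Applied to the question's setup, a minimal sketch might look like the following. The real GIFTSTUDY1DATA is not available here, so dat_demo and its means and SEs are hypothetical placeholders shaped like the dat2 table from the question; geom_col() stands in for geom_bar(stat = 'identity'). The point is only that coord_cartesian(ylim = c(1, 7)) keeps the bars and error bars instead of dropping them.

```r
library(ggplot2)

# Hypothetical summary table standing in for dat2; the means and SEs are
# made-up illustration values, not results from the original study.
dat_demo <- data.frame(
  Affect = factor(c("poor", "poor", "rich", "rich")),
  Price  = factor(c("2 expensive", "1 inexpensive", "2 expensive", "1 inexpensive")),
  mean   = c(4.2, 3.8, 5.1, 4.6),
  se     = c(0.20, 0.25, 0.18, 0.22)
)

dodge <- position_dodge(width = 0.9)

p_demo <- ggplot(dat_demo, aes(x = Affect, y = mean, fill = Price)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                position = dodge, width = 0.15) +
  ylab("mean gift evaluation") +
  scale_fill_manual(values = c("yellowgreen", "skyblue4")) +
  # Zoom the visible y range to the 1-7 rating scale without touching the data;
  # the bars still start at 0 and are simply clipped at the panel edge.
  coord_cartesian(ylim = c(1, 7))

p_demo
```

Note that coord_cartesian() still pads the visible range slightly by default; adding expand = FALSE should make the axis start exactly at 1.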




Answer 2:


While Magnus Nordmo's answer is helpful, I would like to add the reason why ggplot2 behaves this way.

Consider the following plot (friendly reminder that geom_col() is shorthand for geom_bar(stat = "identity")):

df <- data.frame(x = letters[1:7],
                 y = 1:7)

g <- ggplot(df, aes(x, y)) +
  geom_col()
g

You can clearly see that the bars look like rectangles. Checking the underlying plot data confirms that the bars are parameterised as rectangles with xmin/xmax/ymin/ymax:

> layer_data(g)
  x y PANEL group ymin ymax xmin xmax colour   fill size linetype alpha
1 1 1     1     1    0    1 0.55 1.45     NA grey35  0.5        1    NA
2 2 2     1     2    0    2 1.55 2.45     NA grey35  0.5        1    NA
3 3 3     1     3    0    3 2.55 3.45     NA grey35  0.5        1    NA
4 4 4     1     4    0    4 3.55 4.45     NA grey35  0.5        1    NA
5 5 5     1     5    0    5 4.55 5.45     NA grey35  0.5        1    NA
6 6 6     1     6    0    6 5.55 6.45     NA grey35  0.5        1    NA
7 7 7     1     7    0    7 6.55 7.45     NA grey35  0.5        1    NA

Now consider the following plot:

g2 <- ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7))

This one is empty and reflects the case you posted. Inspecting the underlying data yields the following:

> layer_data(g2)
  y x PANEL group ymin ymax xmin xmax colour   fill size linetype alpha
1 1 1     1     1   NA    1 0.55 1.45     NA grey35  0.5        1    NA
2 2 2     1     2   NA    2 1.55 2.45     NA grey35  0.5        1    NA
3 3 3     1     3   NA    3 2.55 3.45     NA grey35  0.5        1    NA
4 4 4     1     4   NA    4 3.55 4.45     NA grey35  0.5        1    NA
5 5 5     1     5   NA    5 4.55 5.45     NA grey35  0.5        1    NA
6 6 6     1     6   NA    6 5.55 6.45     NA grey35  0.5        1    NA
7 7 7     1     7   NA    7 6.55 7.45     NA grey35  0.5        1    NA

You can see that the ymin column is replaced by NAs. This behaviour depends on the oob (out-of-bounds) argument of scale_y_continuous(), which defaults to the scales::censor() function. It censors (replaces with NA) any values outside the axis limits, and that includes the 0 that should be in the ymin column. As a consequence, the rectangles can't be drawn, and ggplot2 drops them with the "Removed ... rows containing missing values" warning you saw.
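The censoring step can be reproduced directly with the scales package. This is a small sketch; bar_edges is a made-up vector standing in for the bars' ymin/ymax values, with 0 playing the role of a bar's ymin.

```r
library(scales)

# Hypothetical values: 0 acts as a bar's ymin, 8 lies above the limits.
bar_edges <- c(0, 1, 4, 7, 8)

# censor() replaces anything outside the range with NA; as the default oob
# function of continuous scales, this is what wipes out the bars' ymin of 0.
censored <- censor(bar_edges, range = c(1, 7))
print(censored)  # NA 1 4 7 NA
```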

There are two main ways to work around this. The first, as Magnus suggested, is to use the ylim argument of the coord_cartesian() function:

ggplot(df, aes(x, y)) +
  geom_col() +
  coord_cartesian(ylim = c(1, 7))

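Specifying the limits inside a coord_* function zooms in without changing the underlying data: the bars keep their ymin of 0, and the part below the visible range is simply clipped at the panel border. You can see this in action when you turn the clipping off: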

ggplot(df, aes(x, y)) +
  geom_col() +
  coord_cartesian(ylim = c(1, 7), clip = "off")
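To check that coord_cartesian() really leaves the data alone (in contrast to the layer_data(g2) output above), one can inspect the built layer data of the zoomed plot. This sketch reuses the df from this answer; g_coord and built are just illustrative names.

```r
library(ggplot2)

df <- data.frame(x = letters[1:7], y = 1:7)

g_coord <- ggplot(df, aes(x, y)) +
  geom_col() +
  coord_cartesian(ylim = c(1, 7))

# The coordinate system only changes what is visible: the built layer data
# still has ymin = 0 for every bar, so nothing is censored or dropped.
built <- layer_data(g_coord)
print(built[, c("x", "ymin", "ymax")])
```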

The other option is to use an alternative oob argument in scale_y_continuous(), for example scales::squish:

g3 <- ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7), 
                     oob = scales::squish)
g3

This replaces any value outside the limits with the nearest limit; for example, the ymin of 0 becomes 1:

> layer_data(g3)
  y x PANEL group ymin ymax xmin xmax colour   fill size linetype alpha
1 1 1     1     1    1    1 0.55 1.45     NA grey35  0.5        1    NA
2 2 2     1     2    1    2 1.55 2.45     NA grey35  0.5        1    NA
3 3 3     1     3    1    3 2.55 3.45     NA grey35  0.5        1    NA
4 4 4     1     4    1    4 3.55 4.45     NA grey35  0.5        1    NA
5 5 5     1     5    1    5 4.55 5.45     NA grey35  0.5        1    NA
6 6 6     1     6    1    6 5.55 6.45     NA grey35  0.5        1    NA
7 7 7     1     7    1    7 6.55 7.45     NA grey35  0.5        1    NA
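The same vector as in a plain censoring check shows what squish() does on its own. This is a sketch with made-up values (bar_edges is hypothetical), calling scales::squish() directly rather than through a plot.

```r
library(scales)

# Hypothetical values: 0 acts as a bar's ymin, 8 lies above the limits.
bar_edges <- c(0, 1, 4, 7, 8)

# squish() moves out-of-range values onto the nearest limit instead of
# replacing them with NA, so a ymin of 0 becomes 1 and the bar survives.
squished <- squish(bar_edges, range = c(1, 7))
print(squished)  # 1 1 4 7 7
```

One consequence worth keeping in mind: values above the upper limit are squished too, so an error bar reaching past 7 would be drawn as ending at exactly 7.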

Another option is to pass the oob argument a custom function that simply returns its input. Since clipping is on by default, this reproduces the coord_cartesian(ylim = c(1, 7)) case:

ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7), 
                     oob = function(x, ...){x})
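A quick way to confirm that the identity oob function keeps the bars' ymin of 0 is again layer_data(). The second plot assumes scales >= 1.1.0, which (to my knowledge) exports oob_keep() with the same keep-everything behaviour; if your installed version lacks it, the anonymous function above does the same job.

```r
library(ggplot2)
library(scales)

df <- data.frame(x = letters[1:7], y = 1:7)

# Identity oob function from the answer: values outside the limits are kept.
g_keep <- ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7), oob = function(x, ...) x)

kept <- layer_data(g_keep)
print(kept[, c("x", "ymin", "ymax")])

# Assumption: scales >= 1.1.0 provides oob_keep(), a named identity oob.
g_keep2 <- ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7), oob = oob_keep)
```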

I hope this clarified what is going on here.



Source: https://stackoverflow.com/questions/58954151/problems-adapting-the-y-axis-to-2x2-anova-bargraph-using-r-and-ggplot
