R ggplot2: boxplots with Wilcoxon significance levels and facets. Show only significant comparisons with asterisks

南旧 2020-12-20 18:36

Following up on this question and for the sake of completeness, I modified the accepted answer and customized the resulting plot, but I am still facing some important proble

2 Answers
  • 2020-12-20 18:51

    You can try the following. Your code is quite busy and, for me, too complicated to follow, so I suggest a different approach. I tried to avoid loops and to use the tidyverse as much as possible. First I recreate your data. Then I calculate the Kruskal-Wallis tests separately, since that is not possible within ggsignif. Next I plot all pairwise p-values using geom_signif. Finally, I remove the non-significant ones and add a step increase.

    1- Make coloring work: done

    2- Show asterisks instead of numbers: done

    ...and for the win:

    3- Make a common legend: done

    4- Place Kruskal-Wallis line on top: done, although I placed the values at the bottom

    5- Change the size (and alignment) of the title and y axis text: done

    library(tidyverse)
    library(ggsignif)
    
    # 1. your data
    set.seed(2)
    df <- as_tibble(iris) %>% 
      mutate(treatment=rep(c("A","B"), length(iris$Species)/2)) %>% 
      gather(key, value, -Species, -treatment) %>% 
      mutate(value=rnorm(n())) %>% 
      mutate(key=factor(key, levels=unique(key))) %>% 
      mutate(both=interaction(treatment, key, sep = " "))
    
    # 2. Kruskal test
    KW <- df %>% 
      group_by(Species) %>%
      summarise(p=round(kruskal.test(value ~ both)$p.value,2),
                y=min(value),
                x=1) %>% 
      mutate(y=min(y))
    
    # 3. Plot  
    P <- df %>% 
    ggplot(aes(x=both, y=value)) + 
      geom_boxplot(aes(fill=Species)) + 
      facet_grid(~Species) +
      ylim(-3,7)+
      theme(axis.text.x = element_text(angle=45, hjust=1)) +
      geom_signif(comparisons = combn(levels(df$both), 2, simplify = FALSE),
                  map_signif_level = TRUE) +
      stat_summary(fun = mean, geom = "point", shape = 5, size = 4) +  # group means as diamonds (fun.y on ggplot2 < 3.3.0)
      xlab("") +
      geom_text(data=KW,aes(x, y=y, label=paste0("KW p=",p)),hjust=0) +
      ggtitle("Plot") + ylab("This is my own y-lab")
    
    # 4. remove not significant values and add step increase
    P_new <- ggplot_build(P)
    P_new$data[[2]] <- P_new$data[[2]] %>% 
      filter(annotation != "NS.") %>% 
      group_by(PANEL) %>%
      mutate(index=(as.numeric(group[drop=T])-1)*0.5) %>% 
      mutate(y=y+index,
             yend=yend+index) %>% 
      select(-index) %>% 
      as.data.frame()
    # the final plot  
    plot(ggplot_gtable(P_new))
    

    A similar approach using two facets:

    # --------------------
    # 5. Kruskal
    KW <- df %>% 
      group_by(Species, treatment) %>%
      summarise(p=round(kruskal.test(value ~ both)$p.value,2),
                y=min(value),
                x=1) %>% 
      ungroup() %>% 
      mutate(y=min(y))
    
    
    # 6. Plot with two facets  
    P <- df %>% 
      ggplot(aes(x=key, y=value)) + 
      geom_boxplot(aes(fill=Species)) + 
      facet_grid(treatment~Species) +
      ylim(-5,7)+
      theme(axis.text.x = element_text(angle=45, hjust=1)) +
      geom_signif(comparisons = combn(levels(df$key), 2, simplify = FALSE),
                  map_signif_level = TRUE) +
      stat_summary(fun = mean, geom = "point", shape = 5, size = 4) +  # group means as diamonds (fun.y on ggplot2 < 3.3.0)
      xlab("") +
      geom_text(data=KW,aes(x, y=y, label=paste0("KW p=",p)),hjust=0) +
      ggtitle("Plot") + ylab("This is my own y-lab")
    
    # 7. remove not significant values and add step increase
    P_new <- ggplot_build(P)
    P_new$data[[2]] <- P_new$data[[2]] %>% 
      filter(annotation != "NS.") %>% 
      group_by(PANEL) %>%
      mutate(index=(as.numeric(group[drop=T])-1)*0.5) %>% 
      mutate(y=y+index,
             yend=yend+index) %>% 
      select(-index) %>% 
      as.data.frame()
    # the final plot  
    plot(ggplot_gtable(P_new))
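
    Both post-processing steps above index the built plot with `P_new$data[[2]]`, which only works while geom_signif is the second layer added to the plot. A minimal sketch of looking the layer up instead, assuming (as the `annotation` filter above already does) that the ggsignif layer is the only one whose built data has an `annotation` column. The toy plot and the name `signif_layer` are made up for illustration:

    ```r
    library(ggplot2)
    library(ggsignif)

    # Toy plot: layer 1 = boxplot, layer 2 = points, layer 3 = significance brackets
    p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
      geom_boxplot() +
      geom_point() +
      geom_signif(comparisons = list(c("setosa", "versicolor")),
                  map_signif_level = TRUE)

    built <- ggplot_build(p)

    # Index of the layer(s) whose computed data carries an `annotation` column
    signif_layer <- which(vapply(built$data,
                                 function(d) "annotation" %in% names(d),
                                 logical(1)))
    signif_layer
    ```

    With that index in hand, `built$data[[signif_layer]]` can be filtered in the same way as `P_new$data[[2]]` above, and the code keeps working if layers are added or reordered.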
    

    Edit.

    Regarding your p.adjust needs, you can write your own test function and call it directly within geom_signif().

    wilcox.test.BH.adjusted <- function(x,y,n){
      tmp <- wilcox.test(x,y)
      tmp$p.value <- p.adjust(tmp$p.value, n = n,method = "BH")
      tmp
    }  
    
    geom_signif(comparisons = combn(levels(df$both), 2, simplify = FALSE),
                map_signif_level = TRUE, test = "wilcox.test.BH.adjusted",
                test.args = list(n = 8))
    

    The challenge is knowing how many independent tests you will have in the end, so that you can set n yourself. Here I used 8, but that may be wrong.
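
    One thing to keep in mind when choosing n: the function above adjusts each p-value on its own, and for a single p-value `p.adjust(p, method = "BH", n = n)` simply returns `min(1, n * p)`, which is the same as a Bonferroni correction. A short base-R sketch with made-up p-values (not taken from the data above), comparing that with adjusting all p-values jointly:

    ```r
    # Made-up p-values standing in for 8 pairwise Wilcoxon tests (illustration only)
    p_raw <- c(0.001, 0.004, 0.01, 0.02, 0.04, 0.2, 0.5, 0.9)
    n <- length(p_raw)

    # Adjusted one at a time, as wilcox.test.BH.adjusted() does
    p_single <- vapply(p_raw,
                       function(p) p.adjust(p, method = "BH", n = n),
                       numeric(1))

    # Bonferroni on the whole vector, for comparison with p_single
    p_bonf <- p.adjust(p_raw, method = "bonferroni")

    # Benjamini-Hochberg applied jointly to all p-values
    p_joint <- p.adjust(p_raw, method = "BH")

    data.frame(p_raw, p_single, p_bonf, p_joint)
    ```

    If the jointly adjusted values are what you are after, one option is to run the pairwise tests yourself, adjust the p-values together, and then pass the resulting labels to geom_signif() through its `annotations` argument.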

  • 2020-12-20 19:02

    Constructing ggplots in a loop is well known to produce confusing results; for an explanation of point 1 I'll refer to this question and many others. There's also a hint there about evaluating the ggplot object on the spot, e.g. via print. Regarding point 2, you were close; a bit of debugging by trial and error helped. Here's the complete code for plot.list:

    plot.list=function(mydf, pv.final, addkw, a, myPal){
        mylist <- list()
        i <- 0
        for (sp in unique(mydf$Species)){
            i <- i+1
            mydf0 <- subset(mydf, Species==sp)
            addkw0 <- subset(addkw, Species==sp)
            pv.final0 <- pv.final[grep(sp, pv.final$group1), ]
            num.signif <- sum(pv.final0$p.value <= 0.05)
            P <- ggplot(mydf0,aes(x=both, y=value)) +
                geom_boxplot(aes(fill=Species)) +
                stat_summary(fun=mean, geom="point", shape=5, size=4) +
                facet_grid(~Species, scales="free", space="free_x") +
                scale_fill_manual(values=myPal[i]) +
                geom_text(data=addkw0, hjust=0, size=4.5, aes(x=0, y=round(max(mydf0$value, na.rm=TRUE)+0.5), label=paste0("KW p=",p.value))) +
                geom_signif(test="wilcox.test", comparisons = a[which(pv.final0$p.value<=0.05)],  # "a" can be used here
                            map_signif_level = FALSE,
                            vjust=0,
                            textsize=4,
                            size=0.5,
                            step_increase = 0.05)
            if (i==1){
                P <- P + theme(legend.position="none",
                               axis.text.x=element_text(size=20, angle=90, hjust=1),
                               axis.text.y=element_text(size=20),
                               axis.title=element_blank(),
                               strip.text.x=element_text(size=20,face="bold"),
                               strip.text.y=element_text(size=20,face="bold"))
            } else{
                P <- P + theme(legend.position="none",
                               axis.text.x=element_text(size=20, angle=90, hjust=1),
                               axis.text.y=element_blank(),
                               axis.ticks.y=element_blank(),
                               axis.title=element_blank(),
                               strip.text.x=element_text(size=20,face="bold"),
                               strip.text.y=element_text(size=20,face="bold"))
            }
            P2 <- ggplot_build(P)
            P2$data[[4]]$annotation <- rep(subset(pv.final0, p.value<=0.05)$map.signif, each=3)
            P <- ggplot_gtable(P2)
            mylist[[sp]] <- list(num.signif, P)
        }
        return(mylist)
    }
    

    Note that we can no longer modify the plot via ggplot semantics, since we already applied ggplot_build/ggplot_gtable, so scale modification is no longer possible. If you want to preserve it, move it inside the plot.list function. So, changing to

    grid.arrange(grobs=lapply(p.list, function(x) x[[2]]), 
                 ncol=length(unique(mydf$Species)), top="Random title", left="Value")
    

    yields one panel per species arranged side by side, with a common title and y-axis label.

    That's not a complete solution, of course, but I hope that helps.
