Significance level added to matrix correlation heatmap using ggplot2

野趣味 2020-12-12 17:37

I wonder how one can add another layer of important and needed complexity to a matrix correlation heatmap, for example the p-value, after the manner of the significance l

3 answers
  •  情话喂你
    2020-12-12 18:19

    To show significance alongside the estimated correlation coefficients, you could vary the amount of coloring, either by using alpha or by filling only a subset of each tile:

    # install.packages("fdrtool")
    # install.packages("data.table")
    library(ggplot2)
    library(data.table)
    
    #download dataset
    nba <- as.matrix(read.csv("http://datasets.flowingdata.com/ppg2008.csv")[-1])
    m <- ncol(nba)
    # compute correlation and p-values for all combinations of columns
    dt <- CJ(i=seq_len(m), j=seq_len(m))[i

    # use area: the side length sqrt(1 - lfdr) makes the filled area equal to 1 - lfdr
    ggplot(dt, aes(x=i,y=j, fill=corr,  height=sqrt(1-lfdr),  width=sqrt(1-lfdr))) + 
      geom_tile()+
      scale_fill_distiller(palette = "RdYlGn", direction=1, limits=c(-1,1),name="Correlation") +
      scale_color_distiller(palette = "RdYlGn", direction=1, limits=c(-1,1),name="Correlation") +
      scale_x_continuous("variable", breaks = seq_len(m), labels = colnames(nba)) +
      scale_y_continuous("variable", breaks = seq_len(m), labels = colnames(nba), trans="reverse") +
      coord_fixed() +
      theme(axis.text.x=element_text(angle=90, vjust=0.5),
            panel.background=element_blank(),
            panel.grid.minor=element_blank(),
            panel.grid.major=element_blank()
      )
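
    The area-based plot above has an alpha-based counterpart. What follows is a minimal sketch of that variant, not code from the original answer. It assumes the built-in `mtcars` data as a stand-in for `nba` so that it runs offline, keeps only the pairs with `i < j`, and uses a hypothetical column `weight` (here simply `1 - p.value`) where the answer would use `1 - lfdr`.

    ```r
    library(ggplot2)

    # Stand-in data (assumption): six mtcars columns instead of the nba matrix
    x <- as.matrix(mtcars[, 1:6])
    m <- ncol(x)

    # One row per pair of distinct columns (upper triangle only)
    grid <- subset(expand.grid(i = seq_len(m), j = seq_len(m)), i < j)
    grid$corr <- mapply(function(i, j) cor(x[, i], x[, j]), grid$i, grid$j)
    grid$p.value <- mapply(function(i, j) cor.test(x[, i], x[, j])$p.value,
                           grid$i, grid$j)

    # Hypothetical weight: 1 - p.value stands in for 1 - lfdr
    grid$weight <- 1 - grid$p.value

    # Same scales as the area version, but significance is mapped to alpha
    p_alpha <- ggplot(grid, aes(x = i, y = j, fill = corr, alpha = weight)) +
      geom_tile() +
      scale_fill_distiller(palette = "RdYlGn", direction = 1,
                           limits = c(-1, 1), name = "Correlation") +
      scale_alpha_continuous(range = c(0.1, 1), limits = c(0, 1),
                             name = "1 - p") +
      scale_x_continuous("variable", breaks = seq_len(m), labels = colnames(x)) +
      scale_y_continuous("variable", breaks = seq_len(m), labels = colnames(x),
                         trans = "reverse") +
      coord_fixed() +
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
            panel.background = element_blank(),
            panel.grid = element_blank())

    print(p_alpha)
    ```

    The lower bound of `0.1` in `range` keeps weakly supported tiles faintly visible instead of dropping them entirely; that value is a matter of taste.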
    

    Key here is the scaling of the p-values: to obtain easy-to-interpret values that vary strongly only in the relevant regions, I use estimates of an upper bound for the local false discovery rate (lfdr), provided by fdrtool, instead. I.e., the alpha value of a tile is likely smaller than or equal to the probability that the correlation is different from 0.
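
    As a rough illustration of this scaling step, p-values might be turned into tile sizes along the following lines. This is a sketch under assumptions, not the answer's code: `mtcars` replaces `nba`, and the names `pair_tab`, `fdr_est` and `side` are made up for illustration. The call `fdrtool::fdrtool(..., statistic = "pvalue")` is used only if the package is installed; otherwise Benjamini-Hochberg adjusted p-values serve as a plainly different stand-in that is not an lfdr estimate.

    ```r
    # Stand-in data (assumption): mtcars instead of the nba matrix
    x <- as.matrix(mtcars)
    m <- ncol(x)

    # Correlation and p-value for every pair of distinct columns
    pair_tab <- subset(expand.grid(i = seq_len(m), j = seq_len(m)), i < j)
    pair_tab$corr <- mapply(function(i, j) cor(x[, i], x[, j]),
                            pair_tab$i, pair_tab$j)
    pair_tab$p.value <- mapply(function(i, j) cor.test(x[, i], x[, j])$p.value,
                               pair_tab$i, pair_tab$j)

    if (requireNamespace("fdrtool", quietly = TRUE)) {
      # Local false discovery rate estimated from the p-values;
      # warnings about the small number of tests are suppressed here
      fit <- suppressWarnings(
        fdrtool::fdrtool(pair_tab$p.value, statistic = "pvalue",
                         plot = FALSE, verbose = FALSE)
      )
      pair_tab$fdr_est <- fit$lfdr
    } else {
      # Stand-in only: BH-adjusted p-values, not a local false discovery rate
      pair_tab$fdr_est <- p.adjust(pair_tab$p.value, method = "BH")
    }

    # Side length for height and width, so the tile area equals 1 - fdr_est
    pair_tab$side <- sqrt(1 - pair_tab$fdr_est)

    head(pair_tab[order(pair_tab$fdr_est), ])
    ```

    Taking the square root for both height and width makes the filled area of a tile, rather than its side length, proportional to `1 - fdr_est`, which matches the `sqrt(1-lfdr)` aesthetics in the plot above.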
