Significance level added to matrix correlation heatmap using ggplot2

野趣味 2020-12-12 17:37

I wonder how one can add another layer of important and needed complexity to a matrix correlation heatmap like for example the p value after the manner of the significance l

3 answers

情话喂你 (original poster)

2020-12-12 18:19

To signify significance alongside the estimated correlation coefficients, you could vary the amount of coloring, either using alpha or by filling only a subset of each tile:

# install.packages("fdrtool")
# install.packages("data.table")
library(ggplot2)
library(data.table)

# download dataset (first column holds the player names and is dropped)
nba <- as.matrix(read.csv("http://datasets.flowingdata.com/ppg2008.csv")[-1])
m <- ncol(nba)
# compute correlation and p-values for all combinations of columns
dt <- CJ(i=seq_len(m), j=seq_len(m))[i





# use area: tile height and width are sqrt(1 - lfdr), so the tile area is 1 - lfdr
ggplot(dt, aes(x = i, y = j, fill = corr, height = sqrt(1 - lfdr), width = sqrt(1 - lfdr))) +
  geom_tile() +
  scale_fill_distiller(palette = "RdYlGn", direction = 1, limits = c(-1, 1), name = "Correlation") +
  scale_x_continuous("variable", breaks = seq_len(m), labels = colnames(nba)) +
  scale_y_continuous("variable", breaks = seq_len(m), labels = colnames(nba), trans = "reverse") +
  coord_fixed() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())




The key here is the scaling of the p-values: to obtain easy-to-interpret values that show large variation only in the relevant regions, I use estimates of an upper bound for the local false discovery rate (lfdr) provided by fdrtool instead.
I.e., the alpha value of a tile (or, in the plot above, its area, since height and width are both sqrt(1 - lfdr)) is likely smaller than or equal to the probability that the correlation differs from 0.
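
The part of the snippet that builds dt is cut off above, so here is a minimal sketch of how a table with the columns i, j, corr and lfdr might be computed and then drawn with the alpha variant. Everything in it is an assumption rather than the original code: the simulated matrix x stands in for nba so that the example runs without a download, and the alpha range of 0.1 to 1 is an arbitrary choice. It needs data.table, fdrtool and ggplot2 to be installed.

```r
library(data.table)
library(ggplot2)

# Assumption: simulated data replaces the downloaded nba matrix.
set.seed(1)
n <- 50
m <- 21
x <- matrix(rnorm(n * m), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(m))))
# Add one shared component to the first five columns so that
# a few pairs are genuinely correlated.
x[, 1:5] <- x[, 1:5] + rnorm(n)

# One row per unordered pair of distinct columns.
dt <- CJ(i = seq_len(m), j = seq_len(m))[i < j]

# Pearson correlation and its p-value for every pair.
dt[, c("corr", "p.value") := {
  ct <- cor.test(x[, i], x[, j])
  list(unname(ct$estimate), ct$p.value)
}, by = .(i, j)]

# Local false discovery rate estimated from the p-values.
dt[, lfdr := fdrtool::fdrtool(p.value, statistic = "pvalue",
                              plot = FALSE, verbose = FALSE)$lfdr]

# Mirror the pairs so that both triangles of the heatmap are filled.
dt <- rbind(dt, dt[, .(i = j, j = i, corr, p.value, lfdr)],
            use.names = TRUE)

# Alpha variant: tiles fade out as the lfdr approaches 1.
p <- ggplot(dt, aes(x = i, y = j, fill = corr, alpha = 1 - lfdr)) +
  geom_tile() +
  scale_fill_distiller(palette = "RdYlGn", direction = 1,
                       limits = c(-1, 1), name = "Correlation") +
  scale_alpha_continuous(range = c(0.1, 1), name = "1 - lfdr") +
  scale_x_continuous("variable", breaks = seq_len(m), labels = colnames(x)) +
  scale_y_reverse("variable", breaks = seq_len(m), labels = colnames(x)) +
  coord_fixed() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())
print(p)
```

The diagonal is left empty on purpose, because a variable's correlation with itself carries no test. Swapping the alpha mapping for height = sqrt(1 - lfdr) and width = sqrt(1 - lfdr) gives the area variant shown earlier.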