Heatmap of categorical variable counts

Submitted by 谁说我不能喝 on 2019-12-24 14:05:45

Question


I have a data frame of items, and each has multiple classifier columns that are categorical variables.

ID    test1    test2     test3
1     A        B         A
2     B        A         C
3     C        C         C
4     A        A         B
5     B        B         B
6     B        A         C

I want to generate a heatmap for each pair of test columns (test1 vs. test2, test1 vs. test3, etc.) using ggplot2. Each heatmap should have the levels of one test (here A, B, C) on the x-axis and the levels of the other test on the y-axis, and each tile should be colored by the number of IDs that have that combination of classifications.

For example, with the input above, in the heatmap of test1 against test2 the tile at the intersection of B for test1 and A for test2 would be the brightest, since two IDs have that combination. I hope to use these heatmaps to see which tests agree most closely on this data set; I can't use Pearson's r because the variables are categorical.
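As a quick check of the counts described above, here is a minimal base R sketch that cross-tabulates test1 against test2. The names `lv` and `tab` are only illustrative, and the levels are fixed by hand on the assumption that A, B, and C are the only categories.

```r
# Example data copied from the table in the question
dat <- data.frame(
  ID    = 1:6,
  test1 = c("A", "B", "C", "A", "B", "B"),
  test2 = c("B", "A", "C", "A", "B", "A"),
  test3 = c("A", "C", "C", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Assumption: A, B, C are the only levels; fixing them keeps zero-count cells
lv <- c("A", "B", "C")

# Rows are test1 levels, columns are test2 levels, cells are ID counts
tab <- table(test1 = factor(dat$test1, levels = lv),
             test2 = factor(dat$test2, levels = lv))
print(tab)
```

The cell in row B, column A holds the count for the combination mentioned in the example.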

I am familiar with ggplot2, which is why I prefer that package, but if this is easier in pheatmap, I am happy to learn that instead.


Answer 1:


It took me some time to work out how to do this, and I am still not sure it is the best way.

Data:
dat = structure(list(ID = 1:6, 
                     test1 = c("A", "B", "C", "A", "B", "B"), 
                     test2 = c("B", "A", "C", "A", "B", "A"), 
                     test3 = c("A", "C", "C", "B", "B", "C")
                     ), 
                .Names = c("ID", "test1", "test2", "test3"), 
                 class = "data.frame", row.names = c(NA, -6L)
                )
Libraries
library(tidyverse)
library(ggthemes)
library(gridExtra)
Create all combinations of factor levels (and of tests) taken 2 at a time
fcombs <- expand.grid(LETTERS[1:3], LETTERS[1:3], stringsAsFactors = FALSE)
tcombs <- as.data.frame(combn(colnames(dat[,-1]), 2), stringsAsFactors = FALSE)
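lapply over the test combinations, full_join with the level grid, and count the IDs in each group, excluding NAs
dtl <- lapply(tcombs, function(i){
        # i holds the two test column names for this pair
        select(dat, ID, all_of(i)) %>%
        # fcombs is x, so level pairs with no matching ID are kept with ID = NA
        full_join(x = fcombs, by = c("Var1" = i[1], "Var2" = i[2])) %>%
        group_by(Var1, Var2) %>%
        mutate(N = sum(!is.na(ID)), ID = NULL) %>%
        ungroup()
  }
)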
Create a list of plots
pl <- lapply(seq_along(tcombs), function(i){
        gtitle = paste(tcombs[[i]], collapse = " ~ ")
        dtl[[i]] %>%
        ggplot(aes(x = Var1, y = Var2, fill = N)) +
        geom_tile() +
        theme_tufte() +
        theme(axis.title = element_blank()) +
        ggtitle(gtitle)
        }
      )
Create list of tables (tableGrob objects)
tbl <- lapply(tcombs, function(i) tableGrob(select(dat, ID, all_of(i)),
                                            theme = ttheme_minimal()))
Put everything into the resulting list and plot
resl <- c(pl, tbl)[c(1, 4, 2, 5, 3, 6)]

grid.arrange(grobs = resl, ncol = 2, nrow = 3)
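A more compact variant of the same idea, offered only as a sketch and not as part of the original answer: count each pair of tests with base R's `table()` and draw all pairs in one faceted ggplot2 figure. The names `lv`, `pairs`, and `counts` are illustrative, and the levels A, B, C are assumed to be known in advance.

```r
library(ggplot2)

# Example data copied from the question
dat <- data.frame(
  ID    = 1:6,
  test1 = c("A", "B", "C", "A", "B", "B"),
  test2 = c("B", "A", "C", "A", "B", "A"),
  test3 = c("A", "C", "C", "B", "B", "C"),
  stringsAsFactors = FALSE
)

lv    <- c("A", "B", "C")                         # assumed complete set of levels
pairs <- combn(c("test1", "test2", "test3"), 2, simplify = FALSE)

# One long data frame: a row per (pair of tests, level of x, level of y)
counts <- do.call(rbind, lapply(pairs, function(p) {
  tab <- table(x = factor(dat[[p[1]]], levels = lv),
               y = factor(dat[[p[2]]], levels = lv))
  df <- as.data.frame(tab, responseName = "N")    # zero-count cells are kept
  df$pair <- paste(p, collapse = " ~ ")
  df
}))

# One panel per pair; the first test of each pair is on the x-axis
p <- ggplot(counts, aes(x = x, y = y, fill = N)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  facet_wrap(~ pair) +
  labs(x = NULL, y = NULL, fill = "count")
print(p)
```

Because `table()` is given factors with fixed levels, combinations that never occur appear as tiles with a count of 0 instead of being dropped, so no join against a grid of levels is needed.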




Answer 2:


Your question is a bit unclear, but I think you are looking for something like this. I am not a ggplot2 person, so I will let someone else provide that code.

x <- read.table(text="ID    test1    test2     test3
                1     A        B         A
                2     B        A         C
                3     C        C         C
                4     A        A         B
                5     B        B         B
                6     B        A         C", stringsAsFactors=FALSE, header=T)

xl <- reshape2::melt(data = x, id.vars="ID", variable.name = "Test", value.name="Grade")
xl$Test_Gr <- apply(xl[,2:3], 1, paste0, collapse="_")

xw <- reshape2::dcast(xl, ID ~ Test_Gr, fun.aggregate = length)
xwm <- as.matrix(xw[,-1])
xc <- t(xwm) %*% xwm
colnames(xc) <- colnames(xw)[-1]
rownames(xc) <- colnames(xw)[-1]
gplots::heatmap.2(xc, trace="none", col = rev(heat.colors(15)))
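For the ggplot2 version that this answer leaves open, one possible sketch follows. It rebuilds the same kind of co-occurrence matrix in base R instead of reshape2, so `tests`, `ind`, and `long` are illustrative names and not part of the code above, and there is no clustering or dendrogram as in `heatmap.2`.

```r
library(ggplot2)

# Example data copied from the question
x <- data.frame(
  ID    = 1:6,
  test1 = c("A", "B", "C", "A", "B", "B"),
  test2 = c("B", "A", "C", "A", "B", "A"),
  test3 = c("A", "C", "C", "B", "B", "C"),
  stringsAsFactors = FALSE
)
tests <- c("test1", "test2", "test3")

# Indicator matrix: one 0/1 column per test/grade pair, e.g. "test1_A"
ind <- do.call(cbind, lapply(tests, function(t) {
  m <- sapply(sort(unique(x[[t]])), function(g) as.integer(x[[t]] == g))
  colnames(m) <- paste(t, colnames(m), sep = "_")
  m
}))

# Co-occurrence counts, equivalent to t(ind) %*% ind
xc <- crossprod(ind)

# Long format for ggplot2: columns Var1, Var2, N
long <- as.data.frame(as.table(xc), responseName = "N")

p <- ggplot(long, aes(x = Var1, y = Var2, fill = N)) +
  geom_tile() +
  labs(x = NULL, y = NULL, fill = "count") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

The diagonal of `xc` holds how often each grade occurs within its own test, so only the off-diagonal blocks compare two different tests.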



Source: https://stackoverflow.com/questions/51028547/heatmap-of-categorical-variable-counts
