Dendextend: Regarding how to color a dendrogram’s labels according to defined groups

Submitted by 六月ゝ 毕业季﹏ on 2019-12-10 18:22:50

Question


I'm trying to use an awesome R package named dendextend to plot a dendrogram and color its branches and labels according to a set of previously defined groups. I've read your answers on Stack Overflow and the FAQ in the dendextend vignette, but I'm still not sure how to achieve my goal.

Let's imagine I have a data frame whose first column holds the names of the individuals to cluster, followed by several columns with the factors to be analyzed, and a last column with the group of each individual (see the following table).

individual  282856  282960  283275  283503  283572  283614  284015  group
pat15612    0   0   0   0   0   0   0   g2
pat38736    0   0   0   0   0   0   0   g2
pat38740    0   0   0   0   0   1   0   g2
pat38742    0   0   0   0   0   1   0   g4
pat38743    0   0   1   0   0   1   0   g3
pat38745    0   0   1   0   1   0   0   g4
pat38750    0   0   0   1   0   1   0   g4
pat38753    0   0   0   1   0   0   0   g3
pat40120    0   0   0   0   1   0   0   g4
pat40124    0   0   0   0   1   0   0   g4
pat40125    0   0   0   0   1   1   0   g4
pat40126    0   0   0   1   0   0   0   g4
pat40137    1   0   0   0   0   0   0   g4
pat40142    0   1   0   0   0   0   0   g5
pat46903    0   0   0   0   0   1   0   g1
pat67612    1   0   0   0   1   0   0   g1
pat67621    0   0   0   0   0   0   0   g2
pat67630    0   0   1   0   0   0   0   g2
pat67634    0   0   0   0   0   0   0   g5
pat67657    0   1   0   1   0   0   0   g5
pat67680    0   0   0   0   0   1   0   g5
pat67683    0   0   1   1   0   0   0   g6

How can I color the branches and labels representing each individual based on the group it belongs to, even though members of the same group may cluster in different blocks?

In case this can be achieved, is there a way to define the colors assigned to each group?


Answer 1:


I'm glad you solved this on your own. The simpler solution is to use the order_value = TRUE argument in the set function. For example:

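library(dendextend)
iris2 <- iris[, -5]
# Repeat the species name three times in each label so the colors are easy to check
rownames(iris2) <- paste(iris[, 5], iris[, 5], iris[, 5], rownames(iris2))
dend <- iris2 %>% dist %>% hclust %>% as.dendrogram
# as.numeric() on the Species factor gives the color codes 1 to 3;
# order_value = TRUE reorders them to match the leaf order of the dendrogram
dend <- dend %>% set("labels_colors", as.numeric(iris[, 5]), order_value = TRUE) %>%
        set("labels_cex", .5)
par(mar = c(4, 1, 0, 8))
plot(dend, horiz = TRUE)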

This results in a dendrogram whose label colors are based on the other variable, "Species", in the iris dataset.

(p.s.: I tripled the number of times a species appears in order to make it easier to see how the color relates to the length of the label)
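
The question also asks whether the color assigned to each group can be chosen. A minimal sketch of one way to do that, assuming a named vector of colors indexed by group name is acceptable; the name group_colors and the specific colors are made up for illustration, and iris stands in for the asker's data:

```r
library(dendextend)

# Hypothetical palette: one color per group; names must match the group values
group_colors <- c(setosa = "forestgreen",
                  versicolor = "goldenrod",
                  virginica = "purple")

iris2 <- iris[, -5]
dend <- as.dendrogram(hclust(dist(iris2)))

# Color of each observation, in the original row order of the data
cols_by_row <- unname(group_colors[as.character(iris$Species)])

# order_value = TRUE says the values follow the data order,
# so they are rearranged to match the leaf order of the tree
dend <- set(dend, "labels_colors", cols_by_row, order_value = TRUE)

plot(dend)
legend("topright", legend = names(group_colors), fill = group_colors)
```

With the asker's data, the palette names would be g1 to g6 and the lookup would use dataset$group instead of iris$Species.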




Answer 2:


I was able to do it using another package called "sparcl". I did it based on a previous post (How to colour the labels of a dendrogram by an additional factor variable in R).

Here is my code:

#load the dataset.....
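#calculate distances
# note: base R's dist() has no "Jaccard" method, so this call assumes a package
# such as proxy is loaded; with base R, method="binary" gives the Jaccard
# distance for 0/1 data
d <- dist(dataset2, method="Jaccard")
## Hierarchical cluster the data
hc <- hclust(d)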
#create labels
labs=dataset$individual
#format to dendrogram
hcd = as.dendrogram(hc)                             
plot(hcd, cex=0.6)
# factor variable for colours                                  
Var = dataset$group   
# convert group names to colours
varCol = gsub("g1.*","green",Var)                        
varCol = gsub("g2.*","gold",varCol)
varCol = gsub("g3.*","pink",varCol)                        
varCol = gsub("g4.*","purple",varCol)
varCol = gsub("g5.*","blue",varCol)                        
varCol = gsub("g6.*","red",varCol)
#colour-code dendrogram branches by a factor 
library(sparcl)
ColorDendrogram(hc, y=varCol, branchlength=0.9, labels=labs,
            xlab="", ylab="", sub="")  
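
The chain of gsub() calls above maps each group name to a color one pattern at a time. A minimal sketch of the same mapping as a single named lookup vector, which keeps the group-to-color table in one place; Var here is a short made-up vector standing in for dataset$group, and palette_by_group is a hypothetical name:

```r
# Made-up group vector standing in for dataset$group
Var <- c("g2", "g4", "g3", "g1", "g5", "g6", "g2")

# One colour per group, written once as a named vector
palette_by_group <- c(g1 = "green", g2 = "gold", g3 = "pink",
                      g4 = "purple", g5 = "blue", g6 = "red")

# Index the palette by group name to get one colour per individual
varCol <- unname(palette_by_group[as.character(Var)])
varCol
```

The resulting varCol can be passed to ColorDendrogram() as y in the same way as before.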

Finally, I managed to work out a solution with the "dendextend" package, based on the example in this post (How to colour the labels of a dendrogram by an additional factor variable in R):

# install.packages("dendextend")
library(dendextend)

#load the dataset.....
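# same dataset as in the example; columns 1:7 are assumed to be the marker columns
# (if "individual" is stored as column 1 rather than as row names, use 2:8 instead)
dataset2 <- dataset[,1:7]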

#calculate the dendrogram
dend <- as.dendrogram(hclust(dist(dataset2)))

#capture the colors from the "group" column
# (as.factor() so this also works when "group" is read in as character)
colors_to_use <- as.numeric(as.factor(dataset$group))
colors_to_use

# sort the colors based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use

#Apply colors 
labels_colors(dend) <- colors_to_use

# Patient labels have a color based on their group
labels_colors(dend) 
plot(dend, main = "Color in labels")
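
The reordering step colors_to_use[order.dendrogram(dend)] is the key to the code above: the leaves of a dendrogram are generally not in the same order as the rows of the data. A small base R sketch of what order.dendrogram() returns, using a made-up five-point data set and hypothetical groups:

```r
# Small made-up data set: 5 points on a line, with hypothetical groups
x <- matrix(c(1, 10, 2, 11, 20), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))
group <- factor(c("g1", "g2", "g1", "g2", "g3"))

dend <- as.dendrogram(hclust(dist(x)))

# For each leaf, from left to right, the row index of that
# observation in the original data
leaf_rows <- order.dendrogram(dend)

# Indexing by leaf_rows puts any per-row vector into leaf order,
# here the row names (which equal the leaf labels) and the groups
stopifnot(identical(rownames(x)[leaf_rows], labels(dend)))
groups_in_leaf_order <- group[leaf_rows]
groups_in_leaf_order
```

Any per-individual vector (colors, groups, label sizes) needs this same indexing before it is assigned with labels_colors(dend) <- ..., unless order_value = TRUE is used with set().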


Source: https://stackoverflow.com/questions/45217384/dendextend-regarding-how-to-color-a-dendrogram-s-labels-according-to-defined-gr
