Color branches of dendrogram using an existing column

Submitted by 柔情痞子 on 2020-01-21 08:55:17

Question


I have a data frame that I am trying to cluster, currently with hclust. The data frame has a FLAG column, and I would like to color the dendrogram by it so that the resulting picture shows how similar the FLAG categories are. My data frame looks something like this:

FLAG    ColA    ColB    ColC    ColD

I am clustering on ColA, ColB, ColC and ColD, and I would like to color the result according to the FLAG category: for example, red if FLAG is 1 and blue if it is 0 (I have only two categories). Right now I am using the plain version of cluster plotting:

hc <- hclust(dist(data[2:5]), method = 'complete')
plot(hc)

Any help in this regard would be highly appreciated.


Answer 1:


If you want to color the branches of a dendrogram based on a certain variable then the following code (largely taken from the help for the dendrapply function) should give the desired result:

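# Toy data: a 10 x 10 matrix and one color per row.
x <- 1:100
dim(x) <- c(10, 10)
groups <- sample(c("red", "blue"), 10, replace = TRUE)

x.clust <- as.dendrogram(hclust(dist(x)))

local({
  # colLab is called once per node; i counts the leaves visited so far.
  colLab <<- function(n) {
    if (is.leaf(n)) {
      a <- attributes(n)
      i <<- i + 1
      attr(n, "edgePar") <-
        c(a$nodePar, list(col = mycols[i], lab.font = i %% 3))
    }
    n
  }
  # Leaves are visited in plot order, so put the row colors in that order.
  mycols <- groups[order.dendrogram(x.clust)]
  i <- 0
})

x.clust.dend <- dendrapply(x.clust, colLab)
plot(x.clust.dend)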
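
To connect this back to the data frame in the question, here is a minimal sketch that colors each leaf by looking up its own label instead of relying on a counter. The data frame below is a made-up stand-in (random values, hypothetical row names "obs1" to "obs10") with the same column layout as in the question; flag_cols and color_leaf are illustrative names, not part of any package.

```r
# Hypothetical stand-in for the asker's data: FLAG plus four numeric columns.
set.seed(42)
data <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10), ColC = rnorm(10), ColD = rnorm(10)
)
rownames(data) <- paste0("obs", seq_len(nrow(data)))

# Cluster on the four numeric columns only (FLAG is column 1).
hc <- hclust(dist(data[2:5]), method = "complete")
dend <- as.dendrogram(hc)

# Named lookup: row name -> color (red if FLAG is 1, blue if 0).
flag_cols <- setNames(ifelse(data$FLAG == 1, "red", "blue"), rownames(data))

# Color the edge leading to each leaf by the leaf's own label,
# so the result does not depend on the order in which leaves are visited.
color_leaf <- function(n) {
  if (is.leaf(n)) {
    col <- flag_cols[[attr(n, "label")]]
    attr(n, "edgePar") <- c(attr(n, "edgePar"), list(col = col))
  }
  n
}

dend_col <- dendrapply(dend, color_leaf)
plot(dend_col)
```

Looking colors up by label assumes the row names are unique, which is the case for a data frame.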



Answer 2:


I think Arhopala's answer is good. I took the liberty of going a step further and added the function assign_values_to_leaves_edgePar to the dendextend package (starting from version 0.17.2, which is now on GitHub). This version of the function is a bit more robust and flexible than Arhopala's answer because:

  1. It is a general function that works across different problems and settings.
  2. The function can handle other edgePar parameters (col, lwd, lty).
  3. The function recycles partial vectors and issues warning messages when needed.

To install the dendextend package you can use install.packages('dendextend'), but for the latest version, use the following code:

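# Load a package by name, installing it first if it is missing.
require2 <- function (package, ...) {
    # character.only = TRUE makes require/library use the value of `package`
    if (!require(package, character.only = TRUE)) install.packages(package)
    library(package, character.only = TRUE)
}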

## require2('installr')
## install.Rtools() # run this if you are using Windows and don't have Rtools installed (you must have it for devtools)

# Load devtools:
require2("devtools")
devtools::install_github('talgalili/dendextend')

Now that we have dendextend installed, here is a second take on Arhopala's answer:

library(dendextend)

x <- 1:100
dim(x) <- c(10, 10)
set.seed(1)
groups <- sample(c("red", "blue"), 10, replace = TRUE)
x.clust <- as.dendrogram(hclust(dist(x)))

x.clust.dend <- x.clust
x.clust.dend <- assign_values_to_leaves_edgePar(x.clust.dend, value = groups, edgePar = "col") # add the colors.
x.clust.dend <- assign_values_to_leaves_edgePar(x.clust.dend, value = 3, edgePar = "lwd") # make the lines thick
plot(x.clust.dend)

Here is the result: [figure: the dendrogram with thick leaf edges colored red or blue according to groups]

p.s.: I personally prefer using pipes for this type of coding (which will give the same result as above, but is easier to read):

x.clust <- x %>% dist %>% hclust %>% as.dendrogram
x.clust.dend <- x.clust %>% 
   assign_values_to_leaves_edgePar(value = groups, edgePar = "col") %>% # add the colors.
   assign_values_to_leaves_edgePar(value = 3, edgePar = "lwd") # make the lines thick
plot(x.clust.dend)
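
For the data frame in the original question, the same idea might look like the sketch below. The data are a made-up stand-in with the question's column layout, and row_cols and leaf_cols are illustrative names. It assumes, as I understand the function, that assign_values_to_leaves_edgePar hands out values in left-to-right leaf order, so the per-row colors are reordered with order.dendrogram first.

```r
library(dendextend)

# Hypothetical stand-in for the asker's data: FLAG plus four numeric columns.
set.seed(42)
data <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10), ColC = rnorm(10), ColD = rnorm(10)
)

# Cluster on the four numeric columns only (FLAG is column 1).
dend <- as.dendrogram(hclust(dist(data[2:5]), method = "complete"))

# One color per row, in the original row order: red if FLAG is 1, blue if 0.
row_cols <- ifelse(data$FLAG == 1, "red", "blue")

# Assumption: values are matched to leaves from left to right,
# so reorder the row colors into leaf order before assigning them.
leaf_cols <- row_cols[order.dendrogram(dend)]

dend <- assign_values_to_leaves_edgePar(dend, value = leaf_cols, edgePar = "col")
dend <- assign_values_to_leaves_edgePar(dend, value = 3, edgePar = "lwd")
plot(dend)
```

If the colors in the plot do not match the labels, the ordering assumption is the first thing to check against the installed dendextend version.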


Source: https://stackoverflow.com/questions/23328663/color-branches-of-dendrogram-using-an-existing-column
