Cutting dendrogram into n trees with minimum cluster size in R

泄露秘密 提交于 2019-12-03 15:04:55

1- Figure out the most appropriate dissimilarity measure (e.g., "euclidean", "maximum", "manhattan", "canberra", "binary", or "minkowski") and linkage method (e.g., "ward", "single", "complete", "average", "mcquitty", "median", or "centroid") based on the nature of your data and the objective(s) of clustering. See ?dist and ?hclust for more details.

2- Plot the dendogram tree before starting the cutting step. See ?hclust for more details.

3- Use the hybrid adaptive tree cut method in dynamicTreeCut package, and tune the shape parameters (maxCoreScatter and minGap / maxAbsCoreScatter and minAbsGap). See Langfelder et al. 2009 (http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/BranchCutting/Supplement.pdf).


For your example,

1- Change "euclidean" and/or "complete" methods as appropriate,

orangeClust <- hclust(dist(orangeCorr, method="euclidean"), method="complete")

2- Plot dendogram,

plot(orangeClust)

3- Use the hybrid tree cut method and tune shape parameters,

dynamicCut <- cutreeDynamic(orangeClust, minClusterSize=3, method="hybrid", distM=as.matrix(dist(orangeCorr, method="euclidean")), deepSplit=4, maxCoreScatter=NULL, minGap=NULL, maxAbsCoreScatter=NULL, minAbsGap=NULL)
dynamicCut
 ..cutHeight not given, setting it to 1.8  ===>  99% of the (truncated) height range in dendro.
 ..done.
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0

As a guide for tuning the shape parameters, the default values are

deepSplit=0: maxCoreScatter = 0.64 & minGap = (1 - maxCoreScatter) * 3/4
deepSplit=1: maxCoreScatter = 0.73 & minGap = (1 - maxCoreScatter) * 3/4
deepSplit=2: maxCoreScatter = 0.82 & minGap = (1 - maxCoreScatter) * 3/4
deepSplit=3: maxCoreScatter = 0.91 & minGap = (1 - maxCoreScatter) * 3/4
deepSplit=4: maxCoreScatter = 0.95 & minGap = (1 - maxCoreScatter) * 3/4

As you can see, both maxCoreScatter and minGap should be between 0 and 1, and increasing maxCoreScatter (decreasing minGap) increases the number of clusters (with smaller sizes). The meaning of these parameters is described in Langfelder et al. 2009.

For example, to get more smaller clusters

maxCoreScatter <- 0.99
minGap <- (1 - maxCoreScatter) * 3/4
dynamicCut <- cutreeDynamic(orangeClust, minClusterSize=3, method="hybrid", distM=as.matrix(dist(orangeCorr, method="euclidean")), deepSplit=4, maxCoreScatter=maxCoreScatter, minGap=minGap, maxAbsCoreScatter=NULL, minAbsGap=NULL)
dynamicCut
 ..cutHeight not given, setting it to 1.8  ===>  99% of the (truncated) height range in dendro.
 ..done.
 2 3 2 2 2 3 3 2 2 3 3 2 2 2 1 2 1 1 1 2 2 1 1 2 2 1 1 1 0 0

Finally, your clustering constraints (size, height, number, ... etc) should be reasonable and interpretable, and the generated clusters should agree with the data. This guides you to the important step of clustering validation and interpretation.


Good Luck!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!