hclust

Is it possible to run a clustering algorithm with chunked distance matrices?

痞子三分冷 submitted on 2021-02-08 08:20:21
Question: I have a distance/dissimilarity matrix (30K rows by 30K columns) that is calculated in a loop and stored on disk. I would like to cluster the matrix. I import and cluster it as follows:

    Mydata <- read.csv("Mydata.csv")
    Mydata <- as.dist(Mydata)
    Results <- hclust(Mydata)

But when I convert the matrix to a dist object, I get a RAM limitation error. How can I handle this? Can I run the hclust algorithm in a loop, i.e. divide the distance matrix into chunks and process them one chunk at a time?

Answer 1: You may …
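
As far as I know, hclust() has no chunked mode: it needs the complete dissimilarity object in memory at once. What can be avoided is the extra copies. For 30,000 objects a dist object holds 30,000 × 29,999 / 2 ≈ 4.5 × 10^8 doubles (about 3.6 GB), while read.csv() on the full square matrix already holds 9 × 10^8 values (about 7.2 GB) before as.dist() makes another copy. A minimal sketch, assuming the distances can be written and re-read in blocks of columns (for example one file per block from the loop that computes them): fill the dist vector directly, block by block, using the lower-triangle indexing documented in ?dist. The names dist_from_column_chunks() and read_cols() are hypothetical, and hclust() will likely still need working memory on top of the dist object, so this lowers peak memory rather than removing the limit.

```r
# Sketch: assemble a "dist" object block by block, so the full n x n square
# matrix (and a data.frame copy of it) is never held in RAM.
# `read_cols` is a hypothetical reader you supply: given column indices, it
# must return the n x length(idx) block of the full distance matrix.
dist_from_column_chunks <- function(n, read_cols, chunk_size = 1000,
                                    labels = as.character(seq_len(n))) {
  # A dist object stores only the lower triangle, column by column.
  v <- numeric(n * (n - 1) / 2)
  for (start in seq(1, n, by = chunk_size)) {
    cols <- start:min(start + chunk_size - 1, n)
    block <- read_cols(cols)
    for (k in seq_along(cols)) {
      j <- cols[k]
      if (j == n) next  # the last column has no entries below the diagonal
      # Position of element (j + 1, j), per the indexing formula in ?dist.
      first <- n * (j - 1) - j * (j - 1) / 2 + 1
      v[first:(first + n - j - 1)] <- block[(j + 1):n, k]
    }
  }
  structure(v, Size = n, Labels = labels, Diag = FALSE, Upper = FALSE,
            method = "precomputed", class = "dist")
}

# Toy check: "read" the column blocks from a small in-memory matrix.
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(40), nrow = 10)))
d <- dist_from_column_chunks(10, function(idx) m[, idx, drop = FALSE],
                             chunk_size = 3)
hc <- hclust(d)
```

With chunk_size = 1000 and n = 30,000, each block is 3 × 10^7 doubles (about 240 MB), so the block size can be tuned to the RAM left over after the dist vector is allocated.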

How to convert data.frame into distance matrix for hierarchical clustering?

偶尔善良 submitted on 2021-01-28 06:08:35
Question: I have a data frame in the format of a distance matrix:

    > df
         DA   DB   DC  DD
    DB 0.39   NA   NA  NA
    DC 0.44 0.35   NA  NA
    DD 0.30 0.48 0.32  NA
    DE 0.50 0.80 0.91 0.7

I want to use it as the distance matrix in the hclust function. But when I try to convert it to a dist object, it changes:

    > as.dist(df)
         DB   DC   DD
    DC 0.44
    DD 0.30 0.48
    DE 0.50 0.80 0.91

You can see that DA is no longer part of the matrix. If I try to use df directly in hclust, it does not work:

    > hclust(d = df)
    Error in if (is.na(n) || n > 65536L)
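
as.dist() reads the lower triangle of a square matrix. The printed df is 4 × 4 with rows DB to DE and columns DA to DD, so it actually describes five objects; read as a 4 × 4 matrix, the DA–DB distance (0.39) sits on the diagonal and is dropped, and the remaining values are labelled by row name, so the 0.44 printed under DB is really the DC–DA distance. hclust(d = df) presumably fails because it expects a "dist" object and reads the number of objects from its "Size" attribute, which a data frame lacks. A minimal sketch that rebuilds the full 5 × 5 matrix first; lower_df_to_dist() is a hypothetical helper name.

```r
# Rebuild the asker's data frame (values copied from the printout).
df <- data.frame(
  DA = c(0.39, 0.44, 0.30, 0.50),
  DB = c(NA, 0.35, 0.48, 0.80),
  DC = c(NA, NA, 0.32, 0.91),
  DD = c(NA, NA, NA, 0.7),
  row.names = c("DB", "DC", "DD", "DE")
)

# The table is the lower triangle of a 5 x 5 matrix whose first row (DA)
# and last column (DE) were left out. Put them back, then convert.
lower_df_to_dist <- function(df) {
  labs <- union(colnames(df), rownames(df))  # DA, DB, DC, DD, DE
  m <- matrix(NA_real_, length(labs), length(labs),
              dimnames = list(labs, labs))
  m[rownames(df), colnames(df)] <- as.matrix(df)
  as.dist(m)  # only the (now complete) lower triangle is used
}

d <- lower_df_to_dist(df)
hc <- hclust(d)
```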

Plotting hclust only to the cut clusters, not every leaf

守給你的承諾、 submitted on 2021-01-28 00:08:00
Question: I have an hclust tree with nearly 2000 samples. I have cut it into an appropriate number of clusters and would like to plot the dendrogram so that it ends at the height where I cut the clusters, rather than extending all the way down to every individual leaf. Every plotting guide is about coloring the leaves by cluster or drawing a box around each cluster, but nothing seems to simply leave out everything below the cut line. My full dendrogram looks like the following: I would like to plot it as if it stops where I've drawn …
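
One way to get this with base R only is cut() on a dendrogram, which returns the part of the tree above the cut height ($upper) and the list of subtrees below it ($lower). Plotting $upper draws the tree down to the cut, with each cut-off subtree collapsed into a single leaf labelled "Branch 1", "Branch 2", and so on. A minimal sketch; USArrests and h = 60 are illustrative stand-ins for the asker's ~2000-sample tree and cut height.

```r
# Stand-in data: the asker's tree is not available.
hc <- hclust(dist(USArrests), method = "average")
h <- 60  # illustrative cut height

# $upper: the dendrogram above h, one leaf per cut-off subtree.
# $lower: the subtrees below h, one per cluster.
parts <- cut(as.dendrogram(hc), h = h)
plot(parts$upper, main = paste("Dendrogram cut at h =", h))

# Each branch corresponds to one cluster that cutree() gives at the same h.
length(parts$lower)
```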

R cut dendrogram into groups with minimum size

无人久伴 submitted on 2020-01-01 03:22:30
Question: Is there an easy way to calculate the lowest value of h in cut() that produces groups of a given minimum size? In this example, if I wanted clusters with at least ten members each, I should go with h = 3.80:

    # using iris data simply for a reproducible example
    data(iris)
    d <- data.frame(scale(iris[, 1:4]))
    hc <- hclust(dist(d))
    plot(hc)
    cut(as.dendrogram(hc), h = 3.79)  # produces 5 groups; group 4 has 7 members
    cut(as.dendrogram(hc), h = 3.80)  # produces 4 groups; no group has <10 members

Since the …
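
Cluster memberships only change at merge heights, so it is enough to try the merge heights in hc$height from lowest to highest and stop at the first one where every cluster has at least the required size. A minimal sketch; min_h_for_min_size() is a hypothetical helper, and it assumes a monotone tree (for example complete, average, or single linkage), because cutree() refuses h for trees with non-increasing heights such as centroid or median linkage.

```r
# Asker's reproducible example.
data(iris)
d <- data.frame(scale(iris[, 1:4]))
hc <- hclust(dist(d))

# Hypothetical helper: lowest merge height at which every cluster returned
# by cutree() has at least `min_size` members.
min_h_for_min_size <- function(hc, min_size) {
  for (h in sort(unique(hc$height))) {
    if (min(table(cutree(hc, h = h))) >= min_size) return(h)
  }
  NA_real_  # only reached if min_size exceeds the number of observations
}

h <- min_h_for_min_size(hc, 10)
table(cutree(hc, h = h))
```

The returned value is the exact merge height; any h from there up to (but not including) the next merge height gives the same grouping.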

hclust() with cutree(): how to plot each cutree() cluster as its own hclust() tree

冷暖自知 submitted on 2019-12-24 15:21:40
Question: I split my hclust() tree into several groups with cutree(). Now I want to run hclust() on the members of each group separately. For example, I cut one tree into 168 groups and I want 168 hclust() trees. My data is a 1600 × 1600 matrix, which is too big to post, so here is a small example:

    m <- matrix(1:1600, nrow = 40)
    # m <- as.matrix(m)  # I know it isn't necessary here
    m_dist <- as.dist(m, diag = FALSE)
    m_hclust <- hclust(m_dist, method = "average")
    plot(m_hclust)
    groups <- cutree(m_hclust, k = 18)

Now I …
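
A minimal sketch: split the observation indices by the cutree() group, restrict the original dissimilarities to each group, and call hclust() once per group. hclust_per_group() is a hypothetical helper name; groups with a single member are skipped because hclust() needs at least two objects. For average linkage each resulting tree should match the corresponding subtree of the full tree (up to ties), so re-clustering mainly buys a separate hclust object per group.

```r
# Asker's toy example.
m <- matrix(1:1600, nrow = 40)
m_dist <- as.dist(m, diag = FALSE)
m_hclust <- hclust(m_dist, method = "average")
groups <- cutree(m_hclust, k = 18)

# Hypothetical helper: re-run hclust() on the members of each group,
# using the original dissimilarities restricted to that group.
hclust_per_group <- function(d, groups, method = "average") {
  full <- as.matrix(d)                       # n x n; fine for n = 1600
  members <- split(seq_along(groups), groups)
  members <- members[lengths(members) >= 2]  # hclust() needs >= 2 objects
  lapply(members, function(idx)
    hclust(as.dist(full[idx, idx, drop = FALSE]), method = method))
}

trees <- hclust_per_group(m_dist, groups)
for (g in names(trees)) plot(trees[[g]], main = paste("Group", g))
```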

R - Isolate clusters with specific characteristics in hclust

橙三吉。 submitted on 2019-12-24 12:11:36
Question: I've used hclust to generate a cluster dendrogram of some data, but I need to isolate all the paired clusters, i.e. all the clusters that comprise just two observations (the first ones to be clustered together), even if they are later joined with other data on a higher branch. Does anyone know how I can do that? I've highlighted the clusters I want to isolate in the attached image, which hopefully explains it better. I'd like to be able to isolate all the paired data in those clusters in …
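
These pairs can be read straight from the merge matrix of the hclust object: in hc$merge, a negative entry -j means observation j on its own, and a positive entry refers to a cluster formed at an earlier step. A merge row whose two entries are both negative is exactly a pair of observations joined to each other before anything else joined either of them. A minimal sketch with USArrests as stand-in data; if the tree has no labels, use the indices -hc$merge[pair_rows, ] directly.

```r
# Stand-in data; any hclust object works the same way.
hc <- hclust(dist(USArrests), method = "average")

# Rows of hc$merge where both entries are single observations.
pair_rows <- which(hc$merge[, 1] < 0 & hc$merge[, 2] < 0)

pairs <- data.frame(
  first  = hc$labels[-hc$merge[pair_rows, 1]],
  second = hc$labels[-hc$merge[pair_rows, 2]],
  height = hc$height[pair_rows],  # height at which each pair was joined
  stringsAsFactors = FALSE
)
pairs
```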

r: error for NbClust() call when deploying it within for() loop - “Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + :”

徘徊边缘 submitted on 2019-12-23 05:27:36
Question: I want to call the NbClust() function on several data frames. I do so by passing them all through a for loop that contains the NbClust() call. The code looks like this:

    # combos of all columns from df
    variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify = FALSE)), recursive = FALSE)
    for (i in 1:length(variations)) {
      df = data.frame(variations[i])
      nc = NbClust(scale(df), distance = "euclidean", min.nc = 2, max.nc = 10, method = "complete")
    }

Unfortunately it always …
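
The excerpt is cut off before the full error, so the cause is not shown. Two things in the loop are worth noting regardless: variations[i] is a one-element list (variations[[i]] is the data frame itself), and reassigning df inside the loop overwrites the original data. One plausible suspect, and only an assumption here, is the one-column subsets produced by combn(df, 1), since some clustering indices may not be defined for a single variable. A minimal sketch that skips single-column subsets and wraps each call in tryCatch() so the loop records which subsets fail instead of stopping; run_on_column_subsets() is a hypothetical helper, and the NbClust call is shown only as a commented usage example.

```r
# Hypothetical helper: apply `fun` to every subset of at least `min_cols`
# columns of `df`, keeping error objects instead of stopping at the first one.
run_on_column_subsets <- function(df, fun, min_cols = 2) {
  stopifnot(ncol(df) >= min_cols)
  subsets <- unlist(lapply(seq(min_cols, ncol(df)), function(k)
    combn(names(df), k, simplify = FALSE)), recursive = FALSE)
  results <- lapply(subsets, function(cols)
    tryCatch(fun(df[, cols, drop = FALSE]), error = function(e) e))
  names(results) <- vapply(subsets, paste, character(1), collapse = "+")
  results
}

# Possible use with NbClust (requires the NbClust package; not run here):
# res <- run_on_column_subsets(df, function(x)
#   NbClust::NbClust(scale(x), distance = "euclidean",
#                    min.nc = 2, max.nc = 10, method = "complete"))
# failed <- names(res)[vapply(res, inherits, logical(1), "error")]

# Toy demonstration with a function that fails on 3-column subsets.
demo <- run_on_column_subsets(mtcars[, 1:3], function(x)
  if (ncol(x) == 3) stop("toy failure") else ncol(x))
```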

Subsets of a dataset as separate dendrograms, but in the same plot

有些话、适合烂在心里 submitted on 2019-12-23 03:37:16
Question: I know I can plot a dendrogram as follows:

    library(cluster)
    d <- mtcars
    d[, 8:11] <- lapply(d[, 8:11], as.factor)
    gdist <- daisy(d, metric = c("gower"), stand = FALSE)
    dendro <- hclust(gdist, method = "average")
    plot(as.dendrogram(dendro))

However, I have some groups identified (e.g. by an iterative classification method), given as the last column in d:

    G <- c(1,2,3,3,4,4,5,5,5,5,1,2,1,1,2,4,1,3,4,5,1,7,4,3,3,2,1,1,1,3,5,6)
    d$Group <- G
    head(d)
    mpg cyl disp hp drat wt qsec vs am gear carb Group
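
A minimal sketch of one way to do this in base graphics: restrict the full dissimilarity object to each group, cluster each group separately, and draw the dendrograms side by side with layout(), giving each panel a width proportional to its group size. plot_group_dendrograms() is a hypothetical helper; singleton groups (here groups 6 and 7) are skipped because hclust() needs at least two objects. The example uses a Euclidean distance as a stand-in so it runs without extra packages; the asker's gdist from cluster::daisy() can be passed in the same way, as long as it was computed before the Group column was added.

```r
# Hypothetical helper: cluster each group separately from the full
# dissimilarity object `d` and draw all dendrograms in one figure.
plot_group_dendrograms <- function(d, groups, method = "average") {
  full <- as.matrix(d)
  members <- split(seq_along(groups), groups)
  members <- members[lengths(members) >= 2]  # singleton groups are skipped
  trees <- lapply(members, function(idx)
    hclust(as.dist(full[idx, idx, drop = FALSE]), method = method))
  # One row of panels, widths proportional to the number of members.
  layout(matrix(seq_along(trees), nrow = 1), widths = lengths(members))
  on.exit(layout(1))
  for (g in names(trees)) {
    plot(as.dendrogram(trees[[g]]), main = paste("Group", g))
  }
  invisible(trees)
}

# Stand-in distance; gdist from the question works the same way.
G <- c(1,2,3,3,4,4,5,5,5,5,1,2,1,1,2,4,1,3,4,5,1,7,4,3,3,2,1,1,1,3,5,6)
trees <- plot_group_dendrograms(dist(scale(mtcars)), G)
```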

horizontal dendrogram in R with labels

对着背影说爱祢 submitted on 2019-12-17 10:29:14
Question: I am trying to draw a dendrogram from the output of the hclust function. I would like the dendrogram to be arranged horizontally instead of the default layout, which can be obtained with (for example):

    require(graphics)
    hc <- hclust(dist(USArrests), "ave")
    plot(hc)

I tried to use the as.dendrogram() function, as in

    plot(as.dendrogram(hc.poi), horiz = TRUE)

but the result has no meaningful labels. If I use plot(hc.poi, labels = c(...)) without as.dendrogram(), I can pass the labels= argument, but now the dendrogram …
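
hc.poi is not shown in the excerpt, so this is a guess at the cause: as.dendrogram() takes its leaf labels from the labels stored in the hclust object, so labels passed later to plot() never reach the dendrogram. Setting the desired labels on the hclust object before converting it, and leaving enough margin for them, should give a labelled horizontal plot. A minimal sketch using the question's USArrests example; the "state (Murder rate)" labels are purely illustrative.

```r
require(graphics)
hc <- hclust(dist(USArrests), "ave")

# Set custom labels on the hclust object *before* converting it.
hc$labels <- paste0(rownames(USArrests), " (", USArrests$Murder, ")")
dend <- as.dendrogram(hc)

op <- par(mar = c(4, 1, 1, 10))  # wide right margin so the labels fit
plot(dend, horiz = TRUE)
par(op)
```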