cluster-analysis | 易学教程

How to create a heat map matrix and generate regions based on 'heat' in Python?

Question: Given a set of points (x, y, 'heat'):

    In [15]: df.head()
    Out[15]:
              x         y      heat
    0  0.660055  0.395942  2.368304
    1  0.126268  0.187978  6.760261
    2  0.174857  0.637188  1.025078
    3  0.460085  0.759171  2.635334
    4  0.689242  0.173868  4.845778

How can I generate a heat map matrix and delimit (hard) heat regions, in such a way that, given a point, it is possible to get all points within the same region?

PS: From "Generate a heatmap in MatPlotLib using a scatter data set", I know how to generate graphs of regions, but not
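A minimal pure-Python sketch of one way to do this, assuming the points lie in the unit square like the sample above: sum the heat of the points into a grid (the heat map matrix), mark cells whose total reaches a threshold as hot, and give each 4-connected group of hot cells one region label, so a point's region is the label of its cell. The grid size, the threshold, summing (rather than averaging) heat, 4-connectivity and the helper names heat_grid, label_regions and points_in_same_region are all hypothetical choices. With NumPy/SciPy the same steps map onto numpy.histogram2d(..., weights=...) and scipy.ndimage.label.

```python
from collections import deque

def heat_grid(points, bins, extent=(0.0, 1.0)):
    """Sum the heat of (x, y, heat) points into a bins x bins grid (row = y, col = x).

    Assumption: x and y lie inside `extent`; out-of-range values are clamped to the edge cells.
    """
    lo, hi = extent
    size = (hi - lo) / bins
    grid = [[0.0] * bins for _ in range(bins)]
    cells = []
    for x, y, heat in points:
        col = min(max(int((x - lo) / size), 0), bins - 1)
        row = min(max(int((y - lo) / size), 0), bins - 1)
        grid[row][col] += heat
        cells.append((row, col))
    return grid, cells

def label_regions(grid, threshold):
    """Label each 4-connected group of cells with heat >= threshold (0 means 'cold')."""
    n = len(grid)
    labels = [[0] * n for _ in range(n)]
    next_label = 0
    for r in range(n):
        for c in range(n):
            if grid[r][c] >= threshold and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])  # breadth-first flood fill of this region
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                        if (0 <= nr < n and 0 <= nc < n and labels[nr][nc] == 0
                                and grid[nr][nc] >= threshold):
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
    return labels

def points_in_same_region(points, index, bins=4, threshold=3.0):
    """Indices of all points whose cell shares a region with points[index] ([] if that cell is cold)."""
    grid, cells = heat_grid(points, bins)
    labels = label_regions(grid, threshold)
    r, c = cells[index]
    region = labels[r][c]
    if region == 0:
        return []
    return [i for i, (pr, pc) in enumerate(cells) if labels[pr][pc] == region]

# The five rows shown by df.head() in the question, as (x, y, heat) tuples.
sample = [
    (0.660055, 0.395942, 2.368304),
    (0.126268, 0.187978, 6.760261),
    (0.174857, 0.637188, 1.025078),
    (0.460085, 0.759171, 2.635334),
    (0.689242, 0.173868, 4.845778),
]
print(points_in_same_region(sample, 0, bins=4, threshold=2.0))  # [0, 4]
```

With threshold=2.0 on the five sample rows, points 0 and 4 fall into adjacent hot cells and share a region, point 1 sits alone in the hottest cell, and point 2 lands in a cell below the threshold, so it belongs to no region.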

Cutting dendrogram at highest level of purity

Question: I am trying to create a program that clusters documents using hierarchical agglomerative clustering, and the output of the program depends on cutting the dendrogram at the level that gives maximum purity. The following is the algorithm I am working on right now:

    Create dendrogram for the documents in the dataset
    purity = 0
    final_clusters
    for all the levels, lvl, in the dendrogram
        clusters = cut dendrogram at lvl
        new_purity = calculate_purity_of(clusters)
        if new_purity > purity
            purity = new
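A sketch of the loop in the question under two assumptions that are not in the original: single linkage on a numeric distance (each cut computed by thresholding pairwise distances with union-find), and the usual definition of purity (the sum of each cluster's majority-class count divided by the number of documents). One point worth keeping in mind: purity reaches 1.0 when every document is its own cluster, so the finest cut always attains the maximum. Scanning levels from the top of the dendrogram downwards and keeping only strict improvements, as the question's "new_purity > purity" test does, returns the coarsest cut that reaches the best purity. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def purity(assignments, classes):
    """Sum of each cluster's most common true-class count, divided by the number of items."""
    members = {}
    for cluster_id, cls in zip(assignments, classes):
        members.setdefault(cluster_id, []).append(cls)
    majority = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority / len(classes)

def single_linkage_cut(items, threshold, dist):
    """Flat clusters of a single-linkage dendrogram cut at `threshold` (union-find)."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if dist(items[i], items[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(items))]

def best_purity_cut(items, classes, dist):
    """Scan merge levels from the top (one cluster) downwards, keeping strict improvements."""
    levels = sorted({dist(a, b) for a, b in combinations(items, 2)}, reverse=True)
    best = (0.0, None, None)  # (purity, level, assignments)
    for level in levels:
        clusters = single_linkage_cut(items, level, dist)
        new_purity = purity(clusters, classes)
        if new_purity > best[0]:
            best = (new_purity, level, clusters)
    return best

docs = [0.0, 1.0, 10.0, 11.0]   # toy 1-D "documents"
classes = ["a", "a", "b", "b"]  # their true classes
best_p, best_level, best_clusters = best_purity_cut(docs, classes, lambda a, b: abs(a - b))
print(best_p, best_level, best_clusters)
```

On this toy data every cut above level 1.0 merges everything (purity 0.5), and the cut at 1.0 yields two pure clusters, so the scan stops there instead of descending to singletons.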

How to make R output text details about a dendrogram object?

Question: Please see my previous question for details relating to the test data and commands used to create a dendrogram: "Using R to cluster based on euclidean distance and a complete linkage metric, too many vectors?" Here is a quick summary of my commands to make the dendrogram:

    un_exprs <- as.matrix(read.table("sample.txt", header=TRUE, sep = "\t", row.names = 1, as.is=TRUE))
    exprs <- t(un_exprs)
    eucl_dist=dist(exprs,method = 'euclidean')
    hie_clust=hclust(eucl_dist, method = 'complete')
    dend <- as

Change multi size icon cluster to single icon

Question: What is this expression used for: this.sizes = [53, 56, 66, 78, 90];? I found it in markercluster.js. If I want to limit the map to showing only 100 markers every time it loads or the zoom level changes, does that mean I need to change it to this.sizes = [100]? Also, how do I change the cluster icon outside ClusterMarker.js? By default, the cluster icon changes according to the cluster size. How can I make the cluster icon constant, without showing the total number of markers in it? Sorry

Single linkage clustering of edit distance matrix with distance threshold stopping criterion

Question: I'm trying to assign flat, single-linkage clusters to sequence IDs separated by an edit distance < n, given a square distance matrix. I believe scipy.cluster.hierarchy.fclusterdata() with criterion='distance' may be a way to do this, but it isn't quite returning the clusters I'd expect for this toy example. Specifically, in the 4x4 distance matrix example below, I would expect clusters_50 (which uses t=50) to create 2 clusters, whereas it actually finds 3. I think the issue is that
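One thing worth checking: fclusterdata() expects an N-by-M matrix of observations and computes the distances itself, so a square distance matrix passed to it is treated as N feature vectors rather than as precomputed distances. For a precomputed matrix, the usual SciPy route is linkage(squareform(D), method='single') followed by fcluster(Z, t, criterion='distance'); that criterion keeps cophenetic distances no greater than t, so for integer edit distances "< n" corresponds to t = n - 1. Flat single-linkage clusters at a threshold are simply the connected components of the graph that links every pair closer than the threshold, which this pure-Python sketch computes directly. Using Levenshtein distance as the edit distance and the helper names are assumptions.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match for free)
        prev = curr
    return prev[-1]

def single_linkage_flat(seqs, n):
    """Link every pair with edit distance < n; the connected components are the clusters."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if edit_distance(seqs[i], seqs[j]) < n:
            parent[find(i)] = find(j)
    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(seqs))]

print(single_linkage_flat(["ACGT", "ACGA", "TTTT", "TTTA"], 2))  # [0, 0, 1, 1]
print(single_linkage_flat(["AAAA", "AAAB", "AABB"], 2))          # [0, 0, 0]
```

The second call shows single-linkage chaining: AAAA and AABB are two edits apart, but each is one edit from AAAB, so all three end up in one cluster.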

How can I choose eps and minPts (two parameters for DBSCAN algorithm) for efficient results?

Question: What routine or algorithm should I use to choose the eps and minPts parameters of the DBSCAN algorithm for efficient results?

Answer 1: The DBSCAN paper suggests choosing minPts based on the dimensionality, and eps based on the elbow in the k-distance graph. In the more recent publication (Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3), 19), the authors suggest
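The k-distance heuristic mentioned in the answer can be sketched as follows: for every point, take the distance to its k-th nearest neighbour, sort these values, and read eps off the elbow where they start to climb steeply. Conventions differ on whether a point counts as its own neighbour (k = minPts or minPts - 1), so k is left as a parameter here. Picking the elbow automatically as the value farthest from the straight line between the first and last sorted values is a hypothetical convenience, not something the DBSCAN paper prescribes; inspecting the plot by eye works just as well. The brute-force neighbour search is O(n^2) and meant only for illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest *other* point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def elbow_value(sorted_values):
    """Value farthest from the chord joining the first and last points of the sorted curve."""
    n = len(sorted_values)
    if n < 3:
        return sorted_values[-1]  # too few values to have an elbow
    x0, y0 = 0, sorted_values[0]
    x1, y1 = n - 1, sorted_values[-1]
    norm = math.hypot(x1 - x0, y1 - y0)

    def gap(i):
        # Perpendicular distance from (i, sorted_values[i]) to the chord.
        return abs((y1 - y0) * i - (x1 - x0) * sorted_values[i] + x1 * y0 - y1 * x0) / norm

    return sorted_values[max(range(n), key=gap)]

# Toy data: a tight unit square plus one far-away outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kd = k_distances(pts, 1)
print(kd, elbow_value(kd))
```

Here the four square corners all have a nearest-neighbour distance of 1.0 and the outlier's jumps to about 12.7, so the suggested eps is 1.0.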

Label R dendrogram branches with correct group number

Question: I am trying to draw a dendrogram so that the labels on the branches match the group number from my cluster analysis. Currently the branches are simply labelled from left to right in the order that they appear, not with the actual group number. Here is my current R code and the resulting dendrogram:

    dst <- dist(Model_Results,method="binary")
    hca <- hclust(dst)
    clust <- cutree(hca,k=40)
    dend <-as.dendrogram(hca)
    library(dendextend)
    dend1 <- color_branches(dend, k = 40, groupLabels = TRUE)
    plot(dend1)

Clustering transactional data using PAM in R?

Question: I need to group sets of transactions into different groups. My data is in a text file in this format:

    T1 17 20 22 35 37 60 62
    T2 39 51 53 54 57 65 73
    T3 17 20 21 22 34 37 62
    T4 20 22 54 57 65 73 45
    T5 20 54 57 65 73 75 80
    T6 2 20 54 57 59 63 71
    T7 2 20 22 57 59 71 66
    T8 17 20 28 29 30 34 35
    T9 16 20 28 32 54 57 65
    T10 16 20 22 28 57 59 71

and so on, over 5000 lines. Each line represents one transaction. What I did so far:

    txIn<-read.transactions("data2.txt",format="basket",sep=" ")
    d<
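The question works in R, but the underlying idea is language-neutral: measure how different two baskets are with the Jaccard distance (1 minus shared items over all items), then group baskets around representative "medoid" baskets. The Python sketch below is a simplified alternating k-medoids (assign to nearest medoid, then move each medoid to the member with the smallest total distance), not the full PAM BUILD/SWAP search; in R, pam() from the cluster package accepts a dist object directly. Starting from the first k baskets and the helper names are hypothetical choices, and the distance matrix is quadratic in memory, which matters at 5000 transactions.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two item sets (0.0 for two empty sets)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def k_medoids(baskets, k, max_iter=100):
    """Simplified alternating k-medoids on a precomputed Jaccard distance matrix."""
    n = len(baskets)
    d = [[jaccard_distance(a, b) for b in baskets] for a in baskets]

    def assign(medoids):
        # Each medoid stays in its own cluster; every other basket joins its
        # nearest medoid (ties go to the earlier medoid), so no cluster is empty.
        return [medoids.index(i) if i in medoids
                else min(range(k), key=lambda c: d[i][medoids[c]])
                for i in range(n)]

    medoids = list(range(k))  # hypothetical initialisation: the first k baskets
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(min(members, key=lambda m: sum(d[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids

# Four of the transactions shown in the question, parsed as "ID item item ...".
lines = [
    "T1 17 20 22 35 37 60 62",
    "T3 17 20 21 22 34 37 62",
    "T4 20 22 54 57 65 73 45",
    "T5 20 54 57 65 73 75 80",
]
ids = [line.split()[0] for line in lines]
baskets = [set(line.split()[1:]) for line in lines]
labels, medoids = k_medoids(baskets, k=2)
print(dict(zip(ids, labels)))
```

With k=2, T1 and T3 (five shared items) end up in one cluster and T4 and T5 (also five shared items) in the other.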

“Cluster analysis” with MySQL

Question: This is a tough one. There is probably a name for this and I don't know it, so I'll describe the problem exactly. I have a dataset including a number of user-submitted values. I need to be able to determine, based on some sort of average or, better, a "closeness of data", which value is the correct one. For example, if I received the following three submissions from three users, 4, 10, 3, I would know that 3 or 4 would be the "correct" value in this case. If I were to average it out, I'd get
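What the question describes is close to estimating where the submissions are most concentrated while ignoring outliers. A minimal sketch, assuming numeric submissions: the median already ignores a single outlier, and a simple density rule (take the largest group of submissions lying within a tolerance of one submitted value and return that group's mean) expresses the "closeness of data" idea directly. The tolerance, the tie rule and the name densest_value are hypothetical, and this runs in Python rather than inside MySQL.

```python
from statistics import mean, median

def densest_value(values, tolerance):
    """Mean of the largest group of values within `tolerance` of some submitted value.

    Ties between equally large groups go to the earliest submitted value.
    """
    best_group = []
    for v in values:
        group = [w for w in values if abs(w - v) <= tolerance]
        if len(group) > len(best_group):
            best_group = group
    return mean(best_group)

submissions = [4, 10, 3]
print(mean(submissions))                        # plain average, pulled toward the outlier 10
print(median(submissions))                      # 4
print(densest_value(submissions, tolerance=1))  # 3.5
```

The tolerance should reflect how far apart two "agreeing" submissions may be for the data at hand; with tolerance=1, 3 and 4 form the densest group and 10 is ignored.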

Python alternate way to find dendrogram

Question: I have data of dimension 8000x100. I need to cluster these 8000 items, and I am more interested in the ordering of these items. I could get the desired result from the code below for small data, but for higher dimensions I keep getting the runtime error "RuntimeError: maximum recursion depth exceeded while getting the str of an object". Is there an alternate way to get the reordered column from "Z"?

    from hcluster import pdist, linkage, dendrogram
    import numpy
    from numpy.random import rand
    x = rand
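The error message points at a recursion limit being hit while walking a deep tree. If only the leaf ordering is needed, it can be read from the linkage matrix Z without any recursion. This sketch assumes Z follows the usual SciPy-style layout (row i merges clusters Z[i][0] and Z[i][1] into a new cluster with id n + i; ids below n are the original items) and visits the first-listed child before the second, which, as far as I know, matches the default left-to-right order of dendrogram() when no count or distance sorting is requested. In current SciPy, scipy.cluster.hierarchy.leaves_list(Z) returns this ordering directly; raising the limit with sys.setrecursionlimit is another option. The name leaf_order is hypothetical.

```python
def leaf_order(Z, n):
    """Left-to-right leaf order of a linkage matrix, using an explicit stack instead of recursion."""
    order = []
    stack = [2 * n - 2]  # the root is the cluster created by the last merge
    while stack:
        node = stack.pop()
        if node < n:
            order.append(node)  # an original observation (leaf)
        else:
            row = Z[node - n]
            left, right = int(row[0]), int(row[1])  # int() also handles NumPy float entries
            stack.append(right)  # pushed first, so the left subtree is visited first
            stack.append(left)
    return order

# Three leaves: merge 0 and 1 into cluster 3, then merge leaf 2 with cluster 3.
Z_small = [[0, 1, 0.5, 2], [2, 3, 1.0, 3]]
print(leaf_order(Z_small, 3))  # [2, 0, 1]

# A maximally deep "chain" tree over 8000 leaves, the worst case for recursive traversal.
n = 8000
Z_chain = [[0, 1, 1.0, 2]] + [[n + i - 1, i + 1, float(i + 1), i + 2] for i in range(1, n - 1)]
print(leaf_order(Z_chain, n)[:5])  # [0, 1, 2, 3, 4]
```

Because the stack lives on the heap, tree depth is limited only by memory, so the chain of 8000 merges is handled without touching the recursion limit.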