cluster-analysis

How to cluster points and plot

Submitted by 左心房为你撑大大i on 2019-12-18 09:09:23
Question: I am trying to use clustering in R. I am a rookie and haven't worked much with R. I have geo-location points as latitude and longitude values. What I am looking to do is to find hotspots using this data: clusters of 4 or more points that are within 600 feet of each other. I want to get the centroids of such clusters and plot them. The data looks like this:

LATITUDE LONGITUD
32.70132 -85.52518
34.74251 -86.88351
32.55205 -87.34777
32.64144 -85.35430
34.92803 -87.81506
32
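
The question targets R, but the same hotspot idea can be sketched in Python with scikit-learn's DBSCAN, which supports a haversine metric. This is a minimal sketch, assuming scikit-learn is available and interpreting "600 feet apart" as a 600-foot neighbourhood radius with a minimum of 4 points per cluster; the sample coordinates below are too few and too far apart to actually form a cluster.

import numpy as np
from sklearn.cluster import DBSCAN

# Sample latitude/longitude points in degrees (from the question's data).
coords = np.array([
    [32.70132, -85.52518],
    [34.74251, -86.88351],
    [32.55205, -87.34777],
    [32.64144, -85.35430],
    [34.92803, -87.81506],
])

earth_radius_m = 6371000.0
eps_m = 600 * 0.3048                      # 600 feet in metres
eps_rad = eps_m / earth_radius_m          # haversine metric works on radians

db = DBSCAN(eps=eps_rad, min_samples=4, metric="haversine",
            algorithm="ball_tree").fit(np.radians(coords))

# Centroid (simple mean of lat/long) of each cluster, ignoring noise (-1).
for label in set(db.labels_) - {-1}:
    members = coords[db.labels_ == label]
    print(label, members.mean(axis=0))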

Remove for loop from clustering algorithm in MATLAB

Submitted by 左心房为你撑大大i on 2019-12-18 09:03:03
Question: I am trying to improve the performance of the OPTICS clustering algorithm. The open-source implementation I found uses a for loop over each sample and can run for hours... I believe some use of the repmat() function may help improve its performance when the system has enough RAM. You are more than welcome to suggest other ways of improving the implementation. Here is the code: x is the data, an [m x n] array where m is the sample size and n is the feature dimensionality,
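
The original code is MATLAB, but the core trick asked about (replacing the per-sample loop with an expanded, vectorised distance computation) can be sketched in Python/NumPy, where broadcasting plays the role of repmat(). A minimal sketch, assuming Euclidean distances are what the loop computes:

import numpy as np

def pairwise_sq_dists(x):
    """All-pairs squared Euclidean distances for an (m, n) data array,
    computed without an explicit per-sample loop."""
    sq = np.sum(x ** 2, axis=1)                      # (m,) squared norms
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T   # broadcasting ~ repmat expansion
    return np.maximum(d2, 0.0)                       # clip tiny negative round-off

x = np.random.rand(1000, 16)   # hypothetical data: m=1000 samples, n=16 features
D = np.sqrt(pairwise_sq_dists(x))

The memory cost is O(m^2), which is the RAM trade-off the question mentions.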

How do I create a similarity matrix in MATLAB?

Submitted by 让人想犯罪 __ on 2019-12-17 20:55:43
Question: I am working towards comparing multiple images. I have the image data as column vectors of a matrix called "images". I want to assess the similarity of images by first computing their Euclidean distance. I then want to create a matrix over which I can execute multiple random walks. Right now, my code is as follows:

% clear
% clc
% close all
%
% load tea.mat;
images = Input.X;
M = zeros(size(images, 2), size(images, 2));
for i = 1:size(images, 2)
    for j = 1:size(images, 2)
        normImageTemp =
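
The question is about MATLAB, but the same construction (pairwise Euclidean distances turned into a similarity matrix and then a transition matrix for random walks) can be sketched in Python. This is a minimal sketch with hypothetical data, assuming a Gaussian kernel is an acceptable way to convert distances into similarities:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for Input.X: each column is one image's pixel vector.
images = np.random.rand(64, 10)

D = squareform(pdist(images.T, metric="euclidean"))  # (10, 10) distance matrix
sigma = np.median(D[D > 0])                          # assumed kernel bandwidth
S = np.exp(-(D ** 2) / (2 * sigma ** 2))             # similarity matrix

P = S / S.sum(axis=1, keepdims=True)                 # row-stochastic random-walk transitions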

How to know which cluster the new data belongs to after finishing cluster analysis

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-17 20:39:02
Question: After finishing a cluster analysis, when I input some new data, how do I know which cluster the data belongs to?

data(freeny)
library(RSNNS)
options(digits=2)
year <- as.integer(rownames(freeny))
freeny <- cbind(freeny, year)
freeny = freeny[sample(1:nrow(freeny), length(1:nrow(freeny))), 1:ncol(freeny)]
freenyValues = freeny[, 1:5]
freenyTargets = decodeClassLabels(freeny[, 6])
freeny = splitForTrainingAndTest(freenyValues, freenyTargets, ratio=0.15)
km <- kmeans(freeny$inputsTrain, 10, iter.max = 100)
kclust
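
The snippet is R, but the underlying answer is language-independent for k-means: assign a new point to the cluster whose centroid is nearest. A minimal sketch in Python with scikit-learn and hypothetical data, where predict() does exactly this nearest-centroid lookup:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 5)                     # hypothetical training data
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

new_point = np.random.rand(1, 5)
label = km.predict(new_point)[0]               # nearest centroid wins

# Equivalent by hand: distance from the new point to every centroid.
label_manual = np.argmin(np.linalg.norm(km.cluster_centers_ - new_point, axis=1))
assert label == label_manual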

Calculating the percentage of variance measure for k-means?

Submitted by 我的梦境 on 2019-12-17 17:27:07
Question: On the Wikipedia page, an elbow method is described for determining the number of clusters in k-means. The built-in method of scipy provides an implementation, but I am not sure I understand how the distortion, as they call it, is calculated. More precisely, if you graph the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in
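
A minimal sketch of one common definition of this measure (between-cluster sum of squares over total sum of squares, i.e. 1 minus the within-cluster share), using scipy's k-means on hypothetical data. Note that the distortion scipy reports is the mean distance of the observations to their nearest centroid, which is related but not identical to this variance ratio.

import numpy as np
from scipy.cluster.vq import kmeans, vq

X = np.random.rand(300, 2)                       # hypothetical data
total_ss = np.sum((X - X.mean(axis=0)) ** 2)

for k in range(1, 11):
    centroids, distortion = kmeans(X, k)
    labels, _ = vq(X, centroids)
    within_ss = sum(np.sum((X[labels == i] - centroids[i]) ** 2)
                    for i in range(len(centroids)))
    explained = 1.0 - within_ss / total_ss       # fraction of variance explained
    print(k, round(explained, 3), round(distortion, 3))

Plotting explained against k and looking for the bend gives the elbow.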

Approaches for spatial geodesic latitude longitude clustering in R with geodesic or great circle distances

Submitted by 雨燕双飞 on 2019-12-17 15:36:45
Question: I would like to apply some basic clustering techniques to some latitude and longitude coordinates. Something along the lines of clustering (or some unsupervised learning) the coordinates into groups determined either by their great-circle distance or their geodesic distance. NOTE: this could be a very poor approach, so please advise. Ideally, I would like to tackle this in R.
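
The question asks for R, but a minimal sketch of one approach can be given in Python: build a great-circle (haversine) distance matrix and feed it to hierarchical clustering. The coordinates and the 150 km cut-off below are illustrative assumptions, and haversine is an approximation to the true geodesic distance.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def haversine_km(u, v, r=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

coords = np.array([                 # hypothetical (lat, lon) pairs
    [32.70132, -85.52518],
    [34.74251, -86.88351],
    [32.55205, -87.34777],
    [32.64144, -85.35430],
])

d = pdist(coords, metric=haversine_km)                 # condensed great-circle distances
Z = linkage(d, method="complete")
labels = fcluster(Z, t=150.0, criterion="distance")    # assumed 150 km cut-off
print(labels)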

Difference between classification and clustering in data mining? [closed]

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-17 15:04:39
Question: Can someone explain what the difference is between classification and clustering in data mining? If you can, please give examples of both to convey the main idea.

Answer 1: In general, in classification you have a set of predefined classes and want to know which class a new object
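
A short illustrative sketch of the distinction in Python with scikit-learn (not part of the original answer): classification learns from known labels, clustering discovers groups without any labels.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: class labels y are known in advance; the model learns to
# assign new objects to one of those predefined classes.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("predicted class:", clf.predict(X[:1]))

# Clustering: no labels are given; the algorithm groups the data purely by
# similarity, and the resulting cluster ids carry no predefined meaning.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster id of first sample:", km.labels_[0])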

Scikit Learn GridSearchCV without cross validation (unsupervised learning)

Submitted by 送分小仙女□ on 2019-12-17 10:34:35
Question: Is it possible to use GridSearchCV without cross-validation? I am trying to optimize the number of clusters in KMeans clustering via grid search, and thus I don't need or want cross-validation. The documentation also confuses me because under the fit() method it has an option for unsupervised learning (it says to use None for unsupervised learning). But if you want to do unsupervised learning, you need to do it without cross-validation, and there appears to be no option to get rid of cross
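
GridSearchCV is built around cross-validation, so one common workaround (sketched below in Python; not taken from an accepted answer) is to skip it and loop over a ParameterGrid yourself, scoring each KMeans fit on the full data with an unsupervised metric such as the silhouette score:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid

X = np.random.rand(300, 4)                      # hypothetical unlabeled data

best_score, best_params = -1.0, None
for params in ParameterGrid({"n_clusters": list(range(2, 11))}):
    km = KMeans(n_init=10, random_state=0, **params).fit(X)
    score = silhouette_score(X, km.labels_)     # no held-out split needed
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)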

Clustering values by their proximity in python (machine learning?) [duplicate]

Submitted by 霸气de小男生 on 2019-12-17 10:29:28
Question: This question already has answers here: Cluster one-dimensional data optimally? [closed] (1 answer); 1D Number Array Clustering (2 answers). I have an algorithm that is running on a set of objects. This algorithm produces a score value that dictates the differences between the elements in the set. The sorted output is something like this: [1,1,5,6,1,5,10,22,23,23,50,51,51,52,100,112,130,500,512,600,12000,12230] If you lay these values down on a spreadsheet you see that
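
A minimal sketch of one simple proximity-based approach in Python: sort the values and start a new cluster wherever the gap to the previous value exceeds a chosen threshold. The threshold here is an assumed, illustrative value, not taken from one of the linked answers.

import numpy as np

values = np.sort(np.array([1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
                           100, 112, 130, 500, 512, 600, 12000, 12230]))

gap_threshold = 20                       # assumed: tune to the scale of your scores
gaps = np.diff(values)
split_points = np.where(gaps > gap_threshold)[0] + 1
clusters = np.split(values, split_points)

for c in clusters:
    print(list(c))

Threshold-free alternatives such as 1-D k-means or kernel density estimation can pick the break points automatically.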

Text clustering with Levenshtein distances

Submitted by 我们两清 on 2019-12-17 10:17:22
Question: I have a set (2k - 4k) of small strings (3-6 characters) and I want to cluster them. Since I am working with strings, previous answers on How does clustering (especially String clustering) work? informed me that Levenshtein distance is a good distance function for strings. Also, since I do not know the number of clusters in advance, hierarchical clustering is the way to go rather than k-means. Although I understand the problem in its abstract form, I do not know the easiest way to actually do
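
A minimal sketch of the whole pipeline in Python: compute pairwise Levenshtein distances (a plain dynamic-programming implementation, so no extra package is assumed) and feed the distance matrix to SciPy's hierarchical clustering. The word list and the distance cut-off are illustrative assumptions.

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

words = ["abc", "abcd", "xyz", "xyzq", "mno", "mnp"]       # hypothetical strings
n = len(words)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = levenshtein(words[i], words[j])

Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=2, criterion="distance")            # assumed cut at 2 edits
print(dict(zip(words, labels)))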