cluster-analysis

Access scores for observations on linear discriminants in LDA using MASS::lda()

Submitted by 六月ゝ 毕业季﹏ on 2019-12-21 21:43:06
Question: Running library(MASS); example(lda); plot(z) — how can I access all the points in z? I want to know the values of every point along LD1 and LD2 depending on their Sp ("c", "s", "v"). Answer 1: What you are looking for is computed as part of the predict() method for objects of class "lda" (see ?predict.lda). It is returned as component x of the object produced by predict(z):

## follow the example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50, 3)))
set.seed(1) ##
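
In the R example, predict(z)$x holds those per-observation LD scores. As a language-neutral illustration of what the scores are, here is a minimal two-class Fisher LDA in Python (synthetic data, not the iris set; all names and values are ours, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(1)
# two well-separated synthetic classes standing in for two species
A = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
B = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([A, B])
y = np.array([0] * 50 + [1] * 50)

m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
# within-class scatter matrix
Sw = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in ((0, m0), (1, m1)))
w = np.linalg.solve(Sw, m1 - m0)   # discriminant direction (LD1, up to scale)
scores = X @ w                     # one LD1 score per observation
```

Plotting `scores` colored by `y` is the 1-D analogue of what plot(z) shows along LD1.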

How to use a distance formula other than Euclidean distance in k-means

Submitted by 主宰稳场 on 2019-12-21 20:42:28
Question: I am working with latitude/longitude data and have to build clusters based on the distance between two points. The distance between two points is =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371. I want to use k-means in R. Is there any way I can override the distance calculation in that process? Answer 1: K-means is not distance-based; it is based on variance minimization. The sum-of-variance formula equals the sum of squared Euclidean distances, but the converse, for other
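
One common workaround (a sketch with our own naming, not the answer's code): since k-means minimizes squared Euclidean distance, project lat/lon onto 3-D Cartesian coordinates and run k-means on those points; the chord distance k-means then uses is a monotone function of the great-circle distance the question's formula computes, so nearby points stay nearby.

```python
import math

R_EARTH_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Same spherical-law-of-cosines formula as in the question, in km."""
    la1, lo1, la2, lo2 = map(math.radians, (lat1, lon1, lat2, lon2))
    c = (math.sin(la1) * math.sin(la2)
         + math.cos(la1) * math.cos(la2) * math.cos(lo2 - lo1))
    return math.acos(max(-1.0, min(1.0, c))) * R_EARTH_KM  # clamp for safety

def to_cartesian(lat, lon):
    """Map lat/lon onto a 3-D sphere so plain k-means can cluster the points."""
    la, lo = math.radians(lat), math.radians(lon)
    return (R_EARTH_KM * math.cos(la) * math.cos(lo),
            R_EARTH_KM * math.cos(la) * math.sin(lo),
            R_EARTH_KM * math.sin(la))
```

If the true metric must be respected exactly, k-medoids (PAM) accepts an arbitrary precomputed distance matrix instead.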

Systematic threshold for cosine similarity with TF-IDF weights

Submitted by ぃ、小莉子 on 2019-12-21 17:18:17
Question: I am running an analysis of several thousand (e.g., 10,000) text documents. I have computed TF-IDF weights and have a matrix of pairwise cosine similarities. I want to treat the documents as a graph, analyze various properties (e.g., the path length separating groups of documents), and visualize the connections as a network. The problem is that there are too many similarities; most are too small to be meaningful. I see many people dealing with this problem by dropping all similarities
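
That usual first step — dropping every similarity below a cutoff — amounts to thresholding the matrix into an edge list. A toy sketch (function name and values are ours); sweeping the threshold and watching the edge count or the number of connected components is one systematic way to choose it rather than picking an arbitrary value:

```python
# toy 3-document cosine-similarity matrix (values invented for illustration)
sim = [
    [1.00, 0.80, 0.10],
    [0.80, 1.00, 0.05],
    [0.10, 0.05, 1.00],
]

def edges_at(sim, threshold):
    """Keep an edge (i, j) only when the pairwise similarity clears the threshold."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i][j] >= threshold]
```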

Most mutually distant k elements (clustering?)

Submitted by [亡魂溺海] on 2019-12-21 13:48:12
Question: I have a simple machine-learning question: I have n (~110) elements and a matrix of all pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to choose 10 different elements so as to maximize the minimum distance over all pairings within the 10. My distance metric is symmetric and respects the triangle inequality. What kind of algorithm can I use? My first instinct is the following: cluster the n elements into 20 clusters, replace each
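
This is the max-min dispersion problem. An alternative to the clustering idea in the question is a greedy farthest-point sweep: seed with the globally farthest pair, then repeatedly add the element whose minimum distance to the chosen set is largest. A sketch (our naming; a heuristic, not an exact solver — the exact problem is NP-hard):

```python
def most_dispersed(dist, k):
    """Greedy max-min selection of k row/column indices from a symmetric
    distance matrix given as a list of lists."""
    n = len(dist)
    # seed with the globally farthest pair
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: dist[p[0]][p[1]])
    chosen = [i, j]
    while len(chosen) < k:
        # add the element farthest (in min-distance terms) from the chosen set
        best = max((p for p in range(n) if p not in chosen),
                   key=lambda p: min(dist[p][c] for c in chosen))
        chosen.append(best)
    return chosen

# points on a line: 0, 1, 2, 10, 20
pts = [0, 1, 2, 10, 20]
dist = [[abs(a - b) for b in pts] for a in pts]
```

With n ≈ 110 and k = 10 this runs in well under a second; for a certified optimum one would need branch-and-bound or an ILP.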

scipy.cluster.hierarchy.fclusterdata + distance measure

Submitted by て烟熏妆下的殇ゞ on 2019-12-21 13:06:23
Question: 1) I am using SciPy's hcluster module, so the variable I have control over is the threshold. How do I know my performance per threshold? In k-means, for instance, this performance would be the sum of distances from all points to their centroids. Of course, this has to be adjusted, since more clusters generally means less distance. Is there an analogous observation I can make with hcluster? 2) I realize there are tons of metrics available for fclusterdata. I am clustering text documents based on tf
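
For (1), one pragmatic answer: score each threshold with the same criterion the question cites for k-means — total distance from points to their cluster centroids — computed from whatever labels fcluster/fclusterdata returns at that threshold, then look for an elbow as the threshold varies (keeping the more-clusters-less-distance bias in mind). A pure-Python sketch (helper name is ours):

```python
import math
from collections import defaultdict

def within_cluster_cost(points, labels):
    """Sum of distances from each 2-D point to its cluster centroid,
    given any flat labeling (e.g. the output of scipy's fcluster)."""
    groups = defaultdict(list)
    for p, l in zip(points, labels):
        groups[l].append(p)
    cost = 0.0
    for pts in groups.values():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        cost += sum(math.hypot(x - cx, y - cy) for x, y in pts)
    return cost
```

Silhouette scores are another threshold-comparison criterion that already normalizes for the number of clusters.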

spectral clustering

Submitted by 回眸只為那壹抹淺笑 on 2019-12-21 11:23:13
Question: First off, I must say that I'm new to Matlab (and to this site...), so please excuse my ignorance. I'm trying to write a function in Matlab that will use spectral clustering to split a set of points into two clusters. My code is as follows:

function Groups = TrySpectralClustering(data)
    dist_mat = squareform(pdist(data));
    W = zeros(length(data), length(data));
    for i = 1:length(data)
        for j = (i+1):length(data)
            W(i,j) = 10^(-dist_mat(i,j));
            W(j,i) = W(i,j);
        end
    end
    D = zeros(length(data), length(data));
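
The truncated Matlab above builds the affinity W(i,j) = 10^(-dist(i,j)) and starts on the degree matrix D. The steps that usually follow in two-way spectral clustering (unnormalized Laplacian, Fiedler vector, sign split) can be sketched in Python with NumPy on toy data (our data and variable names):

```python
import numpy as np

# six 1-D points in two obvious groups (toy data)
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
d = np.abs(pts[:, None] - pts[None, :])

W = 10.0 ** (-d)              # same affinity as the Matlab code
np.fill_diagonal(W, 0.0)      # the Matlab loops also leave the diagonal at 0
D = np.diag(W.sum(axis=1))    # degree matrix
L = D - W                     # unnormalized graph Laplacian

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]          # eigenvector of the second-smallest eigenvalue
Groups = fiedler > 0          # sign split yields the two clusters
```

For more than two clusters one keeps k eigenvectors and runs k-means on their rows instead of splitting by sign.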

k-means clustering implementation in JavaScript?

Submitted by 喜欢而已 on 2019-12-21 09:17:28
Question: I need a JavaScript implementation of the k-means clustering algorithm. I only have one-dimensional data and rarely more than 100 items, so performance is not an issue. PS: I could only find one, but it seems extremely unstable, producing completely different clusters on virtually every call. Answer 1: k-means in JavaScript: http://code.google.com/p/hdict/source/browse/gae/files/kmeans.js http://www.mymessedupmind.co.uk/index.php/javascript-k-mean-algorithm Applet: http://www.math.le.ac
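
Language aside, the instability the question describes usually comes from random centroid initialization. For 1-D data a deterministic quantile seeding removes the run-to-run jitter entirely; a sketch (in Python for illustration — our code, not one of the linked implementations, and it ports to JavaScript line for line):

```python
def kmeans_1d(xs, k, iters=50):
    """1-D k-means with deterministic quantile seeding."""
    xs = sorted(xs)
    if k == 1:
        cents = [xs[len(xs) // 2]]
    else:
        # spread initial centroids evenly over the sorted data
        cents = [xs[(len(xs) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - cents[c]))].append(x)
        new = [sum(g) / len(g) if g else cents[c] for c, g in enumerate(groups)]
        if new == cents:   # converged
            break
        cents = new
    return cents
```

For 1-D data with ~100 items, an exact dynamic-programming segmentation (Ckmeans-style) is also feasible and fully deterministic.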

In scikit-learn, can DBSCAN use sparse matrix?

Submitted by 谁都会走 on 2019-12-21 09:07:44
Question: I got a MemoryError when running scikit-learn's DBSCAN algorithm. My data is about 20,000 × 10,000; it's a binary matrix. (Maybe DBSCAN is not suitable for such a matrix. I'm a beginner in machine learning; I just want a clustering method that doesn't need an initial cluster count.) Anyway, I found scikit-learn's feature extraction and SciPy's sparse matrices: http://scikit-learn.org/dev/modules/feature_extraction.html http://docs.scipy.org/doc/scipy/reference/sparse.html But I still have no idea
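
To answer the title question directly: yes — scikit-learn's DBSCAN accepts a SciPy CSR matrix as input, which avoids materializing the 20,000 × 10,000 dense array. A minimal sketch of the conversion (toy dimensions; the DBSCAN call is shown but commented out to keep the sketch dependency-light):

```python
import numpy as np
from scipy import sparse

# tiny stand-in for the question's 20000 x 10000 binary matrix
dense = np.zeros((6, 8), dtype=np.int8)
dense[0, 1] = dense[1, 1] = dense[5, 7] = 1

X = sparse.csr_matrix(dense)   # CSR stores only the non-zero entries

# scikit-learn's DBSCAN accepts the sparse matrix directly, e.g.:
#   from sklearn.cluster import DBSCAN
#   labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
```

Note that the memory error can also come from the pairwise-distance computation itself; with binary data, a precomputed sparse distance matrix (metric='precomputed') over only the near pairs is another way out.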