cluster-analysis

Access scores for observations on linear discriminants in LDA using MASS::lda()

Submitted by 六月ゝ 毕业季﹏ on 2019-12-21 21:43:06
Question: Running library(MASS); example(lda); plot(z) — how can I access all the points in z? I want to know the values of every point along LD1 and LD2 depending on their Sp ("c", "s", "v"). Answer 1: What you are looking for is computed as part of the predict() method for objects of class "lda" (see ?predict.lda). It is returned as component x of the object produced by predict(z):

## follow the example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50, 3)))
set.seed(1) ##
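
In the R example, predict(z)$x holds those per-observation LD scores. As a language-neutral illustration of what the scores are, here is a minimal two-class Fisher LDA in Python (synthetic data, not the iris set; all names and values are ours, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(1)
# two well-separated synthetic classes standing in for two species
A = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
B = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([A, B])
y = np.array([0] * 50 + [1] * 50)

m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
# within-class scatter matrix
Sw = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in ((0, m0), (1, m1)))
w = np.linalg.solve(Sw, m1 - m0)   # discriminant direction (LD1, up to scale)
scores = X @ w                     # one LD1 score per observation
```

Plotting `scores` colored by `y` is the 1-D analogue of what plot(z) shows along LD1.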

How to use a distance formula other than Euclidean distance in k-means

Submitted by 主宰稳场 on 2019-12-21 20:42:28
Question: I am working with latitude/longitude data and have to build clusters based on the distance between two points. The distance between two points is =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371. I want to use k-means in R. Is there any way I can override the distance calculation in that process? Answer 1: K-means is not distance-based; it is based on variance minimization. The sum-of-variance formula equals the sum of squared Euclidean distances, but the converse, for other
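
One common workaround (a sketch with our own naming, not the answer's code): since k-means minimizes squared Euclidean distance, project lat/lon onto 3-D Cartesian coordinates and run k-means on those points; the chord distance k-means then uses is a monotone function of the great-circle distance the question's formula computes, so nearby points stay nearby.

```python
import math

R_EARTH_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Same spherical-law-of-cosines formula as in the question, in km."""
    la1, lo1, la2, lo2 = map(math.radians, (lat1, lon1, lat2, lon2))
    c = (math.sin(la1) * math.sin(la2)
         + math.cos(la1) * math.cos(la2) * math.cos(lo2 - lo1))
    return math.acos(max(-1.0, min(1.0, c))) * R_EARTH_KM  # clamp for safety

def to_cartesian(lat, lon):
    """Map lat/lon onto a 3-D sphere so plain k-means can cluster the points."""
    la, lo = math.radians(lat), math.radians(lon)
    return (R_EARTH_KM * math.cos(la) * math.cos(lo),
            R_EARTH_KM * math.cos(la) * math.sin(lo),
            R_EARTH_KM * math.sin(la))
```

If the true metric must be respected exactly, k-medoids (PAM) accepts an arbitrary precomputed distance matrix instead.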

Systematic threshold for cosine similarity with TF-IDF weights

Submitted by ぃ、小莉子 on 2019-12-21 17:18:17
Question: I am running an analysis of several thousand (e.g., 10,000) text documents. I have computed TF-IDF weights and have a matrix of pairwise cosine similarities. I want to treat the documents as a graph, analyze various properties (e.g., the path length separating groups of documents), and visualize the connections as a network. The problem is that there are too many similarities; most are too small to be meaningful. I see many people dealing with this problem by dropping all similarities
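
That usual first step — dropping every similarity below a cutoff — amounts to thresholding the matrix into an edge list. A toy sketch (function name and values are ours); sweeping the threshold and watching the edge count or the number of connected components is one systematic way to choose it rather than picking an arbitrary value:

```python
# toy 3-document cosine-similarity matrix (values invented for illustration)
sim = [
    [1.00, 0.80, 0.10],
    [0.80, 1.00, 0.05],
    [0.10, 0.05, 1.00],
]

def edges_at(sim, threshold):
    """Keep an edge (i, j) only when the pairwise similarity clears the threshold."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i][j] >= threshold]
```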

Most mutually distant k elements (clustering?)

Submitted by [亡魂溺海] on 2019-12-21 13:48:12
Question: I have a simple machine-learning question: I have n (~110) elements and a matrix of all pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to choose 10 different elements so as to maximize the minimum distance over all pairings within the 10. My distance metric is symmetric and respects the triangle inequality. What kind of algorithm can I use? My first instinct is the following: cluster the n elements into 20 clusters, replace each
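
This is the max-min dispersion problem. An alternative to the clustering idea in the question is a greedy farthest-point sweep: seed with the globally farthest pair, then repeatedly add the element whose minimum distance to the chosen set is largest. A sketch (our naming; a heuristic, not an exact solver — the exact problem is NP-hard):

```python
def most_dispersed(dist, k):
    """Greedy max-min selection of k row/column indices from a symmetric
    distance matrix given as a list of lists."""
    n = len(dist)
    # seed with the globally farthest pair
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: dist[p[0]][p[1]])
    chosen = [i, j]
    while len(chosen) < k:
        # add the element farthest (in min-distance terms) from the chosen set
        best = max((p for p in range(n) if p not in chosen),
                   key=lambda p: min(dist[p][c] for c in chosen))
        chosen.append(best)
    return chosen

# points on a line: 0, 1, 2, 10, 20
pts = [0, 1, 2, 10, 20]
dist = [[abs(a - b) for b in pts] for a in pts]
```

With n ≈ 110 and k = 10 this runs in well under a second; for a certified optimum one would need branch-and-bound or an ILP.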

scipy.cluster.hierarchy.fclusterdata + distance measure

Submitted by て烟熏妆下的殇ゞ on 2019-12-21 13:06:23
Question: 1) I am using SciPy's hcluster module, so the variable I have control over is the threshold. How do I know my performance per threshold? In k-means, for instance, this performance would be the sum of distances from all points to their centroids. Of course, this has to be adjusted, since more clusters generally means less distance. Is there an analogous observation I can make with hcluster? 2) I realize there are tons of metrics available for fclusterdata. I am clustering text documents based on tf
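
For (1), one pragmatic answer: score each threshold with the same criterion the question cites for k-means — total distance from points to their cluster centroids — computed from whatever labels fcluster/fclusterdata returns at that threshold, then look for an elbow as the threshold varies (keeping the more-clusters-less-distance bias in mind). A pure-Python sketch (helper name is ours):

```python
import math
from collections import defaultdict

def within_cluster_cost(points, labels):
    """Sum of distances from each 2-D point to its cluster centroid,
    given any flat labeling (e.g. the output of scipy's fcluster)."""
    groups = defaultdict(list)
    for p, l in zip(points, labels):
        groups[l].append(p)
    cost = 0.0
    for pts in groups.values():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        cost += sum(math.hypot(x - cx, y - cy) for x, y in pts)
    return cost
```

Silhouette scores are another threshold-comparison criterion that already normalizes for the number of clusters.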

spectral clustering

Submitted by 回眸只為那壹抹淺笑 on 2019-12-21 11:23:13
Question: First off, I must say that I'm new to Matlab (and to this site...), so please excuse my ignorance. I'm trying to write a function in Matlab that will use spectral clustering to split a set of points into two clusters. My code is as follows:

function Groups = TrySpectralClustering(data)
    dist_mat = squareform(pdist(data));
    W = zeros(length(data), length(data));
    for i = 1:length(data)
        for j = (i+1):length(data)
            W(i,j) = 10^(-dist_mat(i,j));
            W(j,i) = W(i,j);
        end
    end
    D = zeros(length(data), length(data));
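
The truncated Matlab above builds the affinity W(i,j) = 10^(-dist(i,j)) and starts on the degree matrix D. The steps that usually follow in two-way spectral clustering (unnormalized Laplacian, Fiedler vector, sign split) can be sketched in Python with NumPy on toy data (our data and variable names):

```python
import numpy as np

# six 1-D points in two obvious groups (toy data)
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
d = np.abs(pts[:, None] - pts[None, :])

W = 10.0 ** (-d)              # same affinity as the Matlab code
np.fill_diagonal(W, 0.0)      # the Matlab loops also leave the diagonal at 0
D = np.diag(W.sum(axis=1))    # degree matrix
L = D - W                     # unnormalized graph Laplacian

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]          # eigenvector of the second-smallest eigenvalue
Groups = fiedler > 0          # sign split yields the two clusters
```

For more than two clusters one keeps k eigenvectors and runs k-means on their rows instead of splitting by sign.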

k-means clustering implementation in JavaScript?

Submitted by 喜欢而已 on 2019-12-21 09:17:28
Question: I need a JavaScript implementation of the k-means clustering algorithm. I only have one-dimensional data and rarely more than 100 items, so performance is not an issue. PS: I could only find one, but it seems extremely unstable, producing completely different clusters on virtually every call. Answer 1: k-means in JavaScript: http://code.google.com/p/hdict/source/browse/gae/files/kmeans.js http://www.mymessedupmind.co.uk/index.php/javascript-k-mean-algorithm Applet: http://www.math.le.ac
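
Language aside, the instability the question describes usually comes from random centroid initialization. For 1-D data a deterministic quantile seeding removes the run-to-run jitter entirely; a sketch (in Python for illustration — our code, not one of the linked implementations, and it ports to JavaScript line for line):

```python
def kmeans_1d(xs, k, iters=50):
    """1-D k-means with deterministic quantile seeding."""
    xs = sorted(xs)
    if k == 1:
        cents = [xs[len(xs) // 2]]
    else:
        # spread initial centroids evenly over the sorted data
        cents = [xs[(len(xs) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - cents[c]))].append(x)
        new = [sum(g) / len(g) if g else cents[c] for c, g in enumerate(groups)]
        if new == cents:   # converged
            break
        cents = new
    return cents
```

For 1-D data with ~100 items, an exact dynamic-programming segmentation (Ckmeans-style) is also feasible and fully deterministic.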

In scikit-learn, can DBSCAN use sparse matrix?

Submitted by 谁都会走 on 2019-12-21 09:07:44
Question: I got a MemoryError when running scikit-learn's DBSCAN algorithm. My data is about 20,000 × 10,000; it's a binary matrix. (Maybe DBSCAN is not suitable for such a matrix. I'm a beginner in machine learning; I just want a clustering method that doesn't need an initial cluster count.) Anyway, I found scikit-learn's feature extraction and SciPy's sparse matrices: http://scikit-learn.org/dev/modules/feature_extraction.html http://docs.scipy.org/doc/scipy/reference/sparse.html But I still have no idea
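
To answer the title question directly: yes — scikit-learn's DBSCAN accepts a SciPy CSR matrix as input, which avoids materializing the 20,000 × 10,000 dense array. A minimal sketch of the conversion (toy dimensions; the DBSCAN call is shown but commented out to keep the sketch dependency-light):

```python
import numpy as np
from scipy import sparse

# tiny stand-in for the question's 20000 x 10000 binary matrix
dense = np.zeros((6, 8), dtype=np.int8)
dense[0, 1] = dense[1, 1] = dense[5, 7] = 1

X = sparse.csr_matrix(dense)   # CSR stores only the non-zero entries

# scikit-learn's DBSCAN accepts the sparse matrix directly, e.g.:
#   from sklearn.cluster import DBSCAN
#   labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
```

Note that the memory error can also come from the pairwise-distance computation itself; with binary data, a precomputed sparse distance matrix (metric='precomputed') over only the near pairs is another way out.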