cluster-analysis

Join neighbouring cluster centroids in MATLAB

二次信任 submitted on 2019-12-10 12:15:51
Question: I have used k-means to cluster data into 8 different clusters with [X,C] = kmeans(XX, 8), which gives me 8 centroids whose locations are stored in C (example shown below, with X, Y, Z as columns). I want to connect the 8 centroids together, but only the centroids of clusters that are close to each other (i.e. have borders between each other) should be connected, while centroids of clusters that are not close to each other should not be. Could anyone please advise? C= -0
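One way to decide which centroids count as neighbours is to build a Delaunay triangulation of the centroid locations and connect only centroids that share a triangulation edge; Delaunay neighbours are exactly the points whose Voronoi cells share a border, which matches the "have borders between each other" criterion. A rough Python sketch of that idea (the question is in MATLAB; the centroid array here is a random placeholder):

import numpy as np
from scipy.spatial import Delaunay

# C: 8 x 3 array of centroid coordinates (X, Y, Z), e.g. the output of k-means
C = np.random.rand(8, 3)   # placeholder for the real centroids

tri = Delaunay(C)

# Every pair of centroids that shares an edge of the triangulation gets connected
edges = set()
for simplex in tri.simplices:
    for i in range(len(simplex)):
        for j in range(i + 1, len(simplex)):
            edges.add(tuple(sorted((simplex[i], simplex[j]))))

print(sorted(edges))   # pairs of centroid indices to join with lines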

How do I get perplexity and log likelihood in Spark LDA? [closed]

試著忘記壹切 submitted on 2019-12-10 12:01:56
Question: I'm trying to get the perplexity and log likelihood of a Spark LDA model (with Spark 2.1). The code below does not work (the methods logLikelihood and logPerplexity are not found), although I can save the model.

from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors
# construct corpus
# run LDA
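For what it's worth, logLikelihood and logPerplexity are methods of the DataFrame-based API (pyspark.ml.clustering.LDA), not of the old pyspark.mllib RDD-based one. A minimal sketch along those lines, with a tiny made-up corpus standing in for the real one:

from pyspark.sql import SparkSession
from pyspark.ml.clustering import LDA
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()
# Placeholder corpus: (id, term-count vector) rows
df = spark.createDataFrame([
    (0, Vectors.dense([1.0, 2.0, 0.0, 0.0])),
    (1, Vectors.dense([0.0, 0.0, 3.0, 1.0])),
], ["id", "features"])

lda = LDA(k=2, maxIter=10)
model = lda.fit(df)
print(model.logLikelihood(df))   # log likelihood of the corpus under the model
print(model.logPerplexity(df))   # upper bound on the perplexity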

Optimal grouping/clustering of items in groups with minimum size

孤者浪人 submitted on 2019-12-10 11:52:15
Question: I am looking for an algorithm that solves the following problem. Given: a set of items and their similarity matrix. Goal: group these items into "clusters" of minimum size m. Conditions: there are no cluster-like structures in the dataset, as shown in Figure 1; nevertheless, the items in a group should be similar to each other, so the global similarity should be high. The motivation is not to identify good clusters but to split a dataset into groups of high similarity and of minimum size.
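One simple (and certainly not optimal) way to approach this is to cluster first and then repair any group that violates the minimum size by folding it into its most similar neighbour. A rough Python sketch under that assumption, with a random placeholder similarity matrix S:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

m = 5                        # minimum group size
S = np.random.rand(40, 40)   # placeholder similarity matrix
S = (S + S.T) / 2
np.fill_diagonal(S, 1.0)

D = 1.0 - S                  # similarity -> distance, diagonal becomes 0
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=len(S) // m, criterion="maxclust")

# Repair pass: fold each undersized group into its most similar neighbour group
for g in np.unique(labels):
    members = np.where(labels == g)[0]
    if 0 < len(members) < m:
        others = np.where(labels != g)[0]
        best = others[np.argmax(S[np.ix_(members, others)].mean(axis=0))]
        labels[members] = labels[best]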

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

北城余情 submitted on 2019-12-10 11:27:22
Question: I want to select columns from a data frame so that the resulting continuous column sequences are as long as possible, while the number of rows with NAs is as small as possible, because those rows have to be dropped afterwards. (The reason I want to do this is that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states and cluster::agnes() with NA states in the
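The question is about R, but the underlying trade-off (a longer contiguous run of columns vs. fewer rows lost to NAs) can be sketched in Python by scoring every contiguous column window; the scoring function below is an arbitrary illustrative choice, not something from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice([1.0, np.nan], size=(100, 12), p=[0.9, 0.1]))

best = None
for start in range(df.shape[1]):
    for end in range(start + 1, df.shape[1] + 1):
        window = df.iloc[:, start:end]
        rows_lost = int(window.isna().any(axis=1).sum())
        width = end - start
        # prefer wide windows, penalise the fraction of rows that would be dropped
        score = width * (1.0 - rows_lost / len(df))
        if best is None or score > best[0]:
            best = (score, start, end, rows_lost)

print(best)   # (score, first column, last column + 1, rows containing NA)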

How would I cluster an unordered list of locations? [duplicate]

怎甘沉沦 submitted on 2019-12-10 11:15:18
Question: Possible duplicate: Clustering Algorithm for Mapping Application. I have an unordered list of locations (containing their co-ordinates). I know how to use the Haversine formula to calculate the distance between two points, but the solutions for clustering I've looked at say I'd need to order the list first. What is the correct ordering for locations? I want to cluster (i.e. put all locations into a single clusteredLocation object) all
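No ordering is needed for most clustering algorithms; density-based clustering, for instance, can work directly on great-circle distances. A small Python sketch of that idea using scikit-learn's DBSCAN with the haversine metric (the coordinates and the 1 km threshold are only illustrative choices):

import numpy as np
from sklearn.cluster import DBSCAN

# latitude/longitude pairs in degrees (placeholder data)
coords = np.array([[51.51, -0.12], [51.52, -0.11], [40.71, -74.00]])

earth_radius_km = 6371.0
eps_km = 1.0   # locations within 1 km of each other end up in the same cluster

db = DBSCAN(eps=eps_km / earth_radius_km,   # eps must be in radians for haversine
            min_samples=1,
            metric="haversine",
            algorithm="ball_tree").fit(np.radians(coords))
print(db.labels_)   # cluster label per location, in the original (unordered) order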

How can I use the index-structures in ELKI?

痴心易碎 submitted on 2019-12-10 10:28:15
Question: These are quotes from http://elki.dbs.ifi.lmu.de/ : "Essentially, we bind the abstract distance query to a database, and then get a nearest neighbor search for this distance. At this point, ELKI will automatically choose the most appropriate kNN query class. If there exists an appropriate index for our distance function (not every index can accelerate every distance!), it will automatically be used here." "The getKNNForDBID method may boil down to a slow linear scan, but when the database has
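The behaviour described in the quotes (an index-accelerated kNN query when an index exists, a linear scan otherwise) is not specific to ELKI. Purely as an analogy, and not ELKI's Java API, the same contrast can be shown in Python with a KD-tree versus a brute-force scan:

import numpy as np
from scipy.spatial import cKDTree

data = np.random.rand(10000, 3)
query = np.random.rand(3)
k = 5

# Linear scan: compute every distance, then keep the k smallest
dists = np.linalg.norm(data - query, axis=1)
knn_scan = np.argsort(dists)[:k]

# Index-accelerated: build a KD-tree once, then answer queries without scanning everything
tree = cKDTree(data)
_, knn_tree = tree.query(query, k=k)

assert set(knn_scan) == set(knn_tree)   # same neighbours, different cost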

Clustering with scipy - clusters via distance matrix, how to get back the original objects

╄→尐↘猪︶ㄣ submitted on 2019-12-10 04:24:42
Question: I can't seem to find any simple enough tutorials or descriptions of clustering in scipy, so I'll try to explain my problem: I am trying to cluster documents (hierarchical agglomerative clustering) and have created a vector for each document and produced a symmetric distance matrix. The vector_list contains (really long) vectors representing each document. The order of this list of vectors is the same as my list of input documents, so that I'll (hopefully) be able to match the results of the
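Because the rows of the distance matrix are in the same order as the input documents, the flat cluster labels scipy returns keep that order too, so mapping back is just pairing labels[i] with documents[i]. A minimal sketch with placeholder data (the threshold value is only an example):

import numpy as np
from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

documents = ["doc_a", "doc_b", "doc_c", "doc_d"]   # same order as vector_list
dist_matrix = np.random.rand(4, 4)                  # placeholder for the real matrix
dist_matrix = (dist_matrix + dist_matrix.T) / 2
np.fill_diagonal(dist_matrix, 0.0)

Z = linkage(squareform(dist_matrix, checks=False), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")   # example cut threshold

clusters = defaultdict(list)
for doc, label in zip(documents, labels):
    clusters[label].append(doc)                     # labels[i] pairs with documents[i]
print(dict(clusters))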

K-means: finding the elbow when the elbow plot is a smooth curve

末鹿安然 submitted on 2019-12-10 04:13:23
Question: I am trying to plot the elbow of k-means using the code below:

load CSDmat % my data
for k = 2:20
    opts = statset('MaxIter', 500, 'Display', 'off');
    [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation'); % kmeans matlab
    [yy,ii] = min(D1');    %% assign points to nearest center
    distort = 0;
    distort_across = 0;
    clear clusts;
    for nn=1:k
        I = find(ii==nn);  %% indices of points in cluster nn
        J = find(ii~=nn);  %% indices of points not in cluster nn
        clusts{nn} = I;    %%
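When the curve is smooth, one common heuristic is to pick the k whose point lies farthest from the straight line joining the first and last points of the distortion curve. A Python sketch of that heuristic; the distortions array below is a synthetic placeholder standing in for the per-k totals computed by the MATLAB loop above, and with real data it usually helps to normalise both axes first:

import numpy as np

ks = np.arange(2, 21)
# placeholder for the per-k total distortions from the loop above
distortions = 1.0 / ks + 0.01 * np.random.rand(len(ks))

p1 = np.array([ks[0], distortions[0]])
p2 = np.array([ks[-1], distortions[-1]])

# perpendicular distance of each (k, distortion) point from the chord p1 -> p2
num = np.abs((p2[0] - p1[0]) * (p1[1] - distortions)
             - (p1[0] - ks) * (p2[1] - p1[1]))
dist = num / np.linalg.norm(p2 - p1)

elbow_k = ks[np.argmax(dist)]
print("elbow at k =", elbow_k)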

OpenCV K-Means (kmeans2)

夙愿已清 submitted on 2019-12-10 03:29:01
Question: I'm using OpenCV's k-means implementation to cluster a large set of 8-dimensional vectors. They cluster fine, but I can't find any way to see the prototypes created by the clustering process. Is this even possible? OpenCV only seems to give access to the cluster indexes (or labels). If not, I guess it'll be time to make my own implementation! Answer 1: I can't say I've used OpenCV's implementation of k-means, but if you have access to the labels given to each instance, you can simply get the centroids
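Following the answer's suggestion, the prototypes can be recovered by averaging the samples assigned to each label; in the Python bindings, cv2.kmeans also returns the centers directly. A short sketch of both (the 8-dimensional data here is random, just to match the question):

import numpy as np
import cv2

data = np.random.rand(1000, 8).astype(np.float32)
K = 5
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-4)

# returns total within-cluster error, a label per sample, and the K centers
compactness, labels, centers = cv2.kmeans(
    data, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# equivalent prototypes recomputed from the labels, as the answer describes
labels = labels.ravel()
prototypes = np.array([data[labels == k].mean(axis=0) for k in range(K)])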

Individual random effects model with standard errors clustered on a different variable in R (R-project)

大兔子大兔子 submitted on 2019-12-09 19:06:26
Question: I'm currently working on some data from an experiment. I have data on some individuals who were randomly assigned to 2 different treatments. For each treatment, we ran three sessions. In each session, participants were asked to make a sequence of decisions. What I would like to do is: (1) estimate the effect of the treatment with a model that includes random effects on individuals and, afterwards, (2) cluster the standard errors by session. In R, I can easily estimate the random
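Not an answer for R, but as a point of comparison, step (1) alone (a random intercept per individual plus a fixed treatment effect) can be sketched in Python with statsmodels; the data below is entirely hypothetical, and the session-level clustering of standard errors from step (2) is not shown:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_ind, n_obs = 30, 10                       # 30 individuals, 10 decisions each
individual = np.repeat(np.arange(n_ind), n_obs)
treatment = individual % 2                  # hypothetical treatment assignment
session = individual % 6                    # hypothetical session membership
decision = (0.5 * treatment
            + rng.normal(0, 0.5, n_ind)[individual]   # individual random effect
            + rng.normal(0, 1.0, n_ind * n_obs))      # noise

df = pd.DataFrame({"decision": decision, "treatment": treatment,
                   "individual": individual, "session": session})

# Step (1): random intercept per individual, fixed effect of treatment
result = smf.mixedlm("decision ~ treatment", df, groups=df["individual"]).fit()
print(result.summary())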