cluster-analysis

How to Bound the Outer Area of Voronoi Polygons and Intersect with Map Data

。_饼干妹妹 submitted on 2019-12-05 12:16:08

Question: Background: I'm trying to visualize the results of a k-means clustering procedure on the following data using Voronoi polygons on a US map. Here is the code I've been running so far: input <- read.csv("LatLong.csv", header = T, sep = ",") # K Means Clustering set.seed(123) km <- kmeans(input, 17) cent <- data.frame(km$centers) # Visualization states <- map_data("state") StateMap <- ggplot() + geom_polygon(data = states, aes(x = long, y = lat, group = group), col = "white") # Voronoi V <- deldir
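The R snippet above is cut off at the `deldir` call, but the underlying idea (cluster, then build bounded Voronoi cells around the centroids) can be sketched in Python with scipy. The original LatLong.csv isn't available, so random (longitude, latitude) pairs in a US-like bounding box stand in for it; the four far-away dummy sites are a standard trick to make every real centroid's Voronoi region finite:

```python
import numpy as np
from scipy.cluster.vq import kmeans
from scipy.spatial import Voronoi

rng = np.random.default_rng(123)
# Hypothetical stand-in for LatLong.csv: (long, lat) pairs in a US-like box.
points = rng.uniform([-125, 25], [-67, 49], size=(200, 2))

# scipy's kmeans may return fewer than 17 centroids if a cluster empties out.
centers, _ = kmeans(points, 17)

# Bound the outer regions: add four distant dummy sites so every original
# site lies strictly inside the convex hull and gets a finite polygon.
dummies = np.array([[-1000, -1000], [-1000, 1000], [1000, -1000], [1000, 1000]])
vor = Voronoi(np.vstack([centers, dummies]))

# A region containing vertex index -1 is unbounded; none of ours should be.
finite = [-1 not in vor.regions[vor.point_region[i]] for i in range(len(centers))]
print(all(finite))
```

The finite polygons (`vor.regions` / `vor.vertices`) can then be clipped against the state boundaries from `map_data("state")` with any polygon-intersection library.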

Correlating word proximity

跟風遠走 submitted on 2019-12-05 11:27:28

Suppose I have a text transcript of a dialogue lasting roughly one hour. I want to know which words occur in close proximity to one another. What statistical technique would I use to determine which words cluster together, and how close they are to one another? I suspect some sort of cluster analysis or PCA. jayunit100: To determine word proximity, you will have to build a graph: each word is a vertex (or "node"), and adjacent (left and right) words are edges. So "I like dogs" would have 2 edges and 3 vertices. The next step is to decide, based on this model
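The graph construction described in the answer is a few lines of Python. This is a minimal sketch of just the vertex/edge-building step (the function name is made up for illustration):

```python
from collections import Counter

def word_graph(text):
    # Each word is a vertex; consecutive words share an edge, so the
    # edge counts measure how often words appear in close proximity.
    words = text.lower().split()
    vertices = set(words)
    edges = Counter(zip(words, words[1:]))
    return vertices, edges

v, e = word_graph("I like dogs")
print(len(v), sum(e.values()))  # 3 vertices, 2 edges, as in the answer above
```

For a looser notion of proximity, the `zip(words, words[1:])` pairing can be widened to a sliding window of, say, 5 words, counting every pair inside the window.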

In R, is there an algorithm to create approximately equal sized clusters

心不动则不痛 submitted on 2019-12-05 11:01:38

There seems to be a lot of information about creating either hierarchical or k-means clusters, but I would like to know whether there is a solution in R that creates K clusters of approximately equal sizes. There is some material on doing this in other languages, but my searches have turned up nothing that suggests how to achieve this in R. An example: set.seed(123) df <- matrix(rnorm(100*5), nrow=100) km <- kmeans(df, 10) print(sapply(1:10, function(n) sum(km$cluster==n))) which results in [1] 14 12 4 13 16 6 8 7 13 7 I would ideally
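One common heuristic for equal-sized clusters is a capacity-constrained assignment: cap each cluster at ceil(n/k) points and let each point take its nearest non-full cluster. The sketch below is in Python rather than R (centers are just sampled points here; in practice they would come from a k-means run), and it is a heuristic, not an exact balanced-clustering method:

```python
import numpy as np

rng = np.random.default_rng(123)
X = rng.normal(size=(100, 5))   # same shape as the rnorm(100*5) example above
k = 10
centers = X[rng.choice(len(X), k, replace=False)]

n = len(X)
cap = -(-n // k)                 # ceil(n/k): maximum points per cluster
d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

labels = np.full(n, -1)
counts = np.zeros(k, dtype=int)
# Visit points with the closest nearest-centre first, then give each
# point its best cluster that still has room.
for i in np.argsort(d.min(axis=1)):
    for c in np.argsort(d[i]):
        if counts[c] < cap:
            labels[i] = c
            counts[c] += 1
            break

print(sorted(counts))  # total capacity equals n, so every cluster gets n/k = 10
```

The same loop translates directly to R; there are also packages implementing exact balanced clustering, but the greedy cap is often good enough.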

Best programming language to implement DBSCAN algorithm querying a MongoDB database?

ぃ、小莉子 submitted on 2019-12-05 10:59:36

Question: I have to implement the DBSCAN algorithm, starting from this pseudocode: DBSCAN(D, eps, MinPts) C = 0 for each unvisited point P in dataset D mark P as visited NeighborPts = regionQuery(P, eps) if sizeof(NeighborPts) < MinPts mark P as NOISE else C = next cluster expandCluster(P, NeighborPts, C, eps, MinPts) expandCluster(P, NeighborPts, C, eps, MinPts) add P to cluster C for each point P' in NeighborPts if P' is not visited mark P' as visited NeighborPts' = regionQuery(P', eps) if
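The pseudocode above translates directly to Python. In this sketch the data lives in memory and `region_query` is a brute-force distance scan; against MongoDB it would instead be a geospatial query (e.g. `$nearSphere` with a max distance), but the control flow stays the same:

```python
import numpy as np

def region_query(D, i, eps):
    # All points within eps of point i (including i itself).
    return np.where(np.linalg.norm(D - D[i], axis=1) <= eps)[0].tolist()

def dbscan(D, eps, min_pts):
    labels = [None] * len(D)       # None = unvisited, -1 = noise, >=0 = cluster id
    c = -1
    for p in range(len(D)):
        if labels[p] is not None:
            continue
        neighbors = region_query(D, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = -1         # mark P as NOISE
            continue
        c += 1                     # C = next cluster
        labels[p] = c
        seeds = list(neighbors)    # expandCluster, iteratively
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = c      # noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = c
            q_neighbors = region_query(D, q, eps)
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point: grow the cluster
    return labels

pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11], [50, 50]])
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The recursion in the pseudocode is replaced by the `seeds` worklist, which avoids stack-depth problems on large clusters.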

Implementation of k-means clustering algorithm

独自空忆成欢 submitted on 2019-12-05 10:11:44

Question: In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, yet I can't understand why my program gets into an infinite loop. Can anyone please point out where I'm making a mistake? For simplicity, I have hard-coded the input in the program itself. Here is my code: import java.io.*; import java.lang.*; class Kmean { public static void main(String args[]) { int N=9; int arr[]={2,4,10,12,3,20,30,11,25};
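The Java code is truncated, but an infinite loop in hand-rolled k-means is almost always a broken termination test: the loop must stop when the recomputed means equal the previous means, checked *after* the update. Here is the same k=2 run on the same data as a Python sketch with the standard stopping rule:

```python
arr = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = arr[0], arr[1]           # initialise the two means from the data

while True:
    # Assign each value to the nearer mean (ties go to cluster 1).
    c1 = [x for x in arr if abs(x - m1) <= abs(x - m2)]
    c2 = [x for x in arr if abs(x - m1) > abs(x - m2)]
    new1, new2 = sum(c1) / len(c1), sum(c2) / len(c2)
    if (new1, new2) == (m1, m2):  # stop when the means no longer move
        break
    m1, m2 = new1, new2

print(sorted(c1), sorted(c2))
```

If the convergence check compares against the means from before the update, or uses `==` on freshly allocated arrays in Java (reference equality), the condition is never true and the loop never exits.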

Generate random points distributed like cities?

こ雲淡風輕ζ submitted on 2019-12-05 10:09:10

How can one generate, say, 1000 random points with a distribution like that of towns and cities in e.g. Ohio? I'm afraid I can't define "distributed like cities" precisely; uniformly distributed centres plus small Gaussian clouds are easy but ad hoc. Added: there must be a family of 2D distributions with a clustering parameter that can be varied to match a given set of points? Maybe you can take a look at Walter Christaller's Theory of Central Places. I guess there must be some generator somewhere, or you can cook up your own. Start with a model of the water features in your target area (or make
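The "family of 2D distributions with a clustering parameter" asked for above is essentially a cluster point process, e.g. a Thomas process: uniform parent centres with Gaussian daughter clouds, where the cloud scale is the tunable clustering parameter. The sketch below adds a uniform background fraction so isolated small towns appear too; all the parameter names and defaults are illustrative choices, not fitted values:

```python
import numpy as np

def city_like_points(n, n_parents=10, spread=0.02, clustered_frac=0.8, seed=0):
    # Thomas-process-style generator on the unit square. `spread` is the
    # clustering parameter: small -> tight metro areas, large -> near-uniform.
    rng = np.random.default_rng(seed)
    parents = rng.uniform(size=(n_parents, 2))
    n_clustered = int(n * clustered_frac)
    which = rng.integers(n_parents, size=n_clustered)
    clustered = parents[which] + rng.normal(scale=spread, size=(n_clustered, 2))
    background = rng.uniform(size=(n - n_clustered, 2))
    return np.vstack([clustered, background])

pts = city_like_points(1000)
print(pts.shape)  # (1000, 2)
```

To match a real point set rather than eyeball it, one would fit the Thomas (or Matérn) process parameters to the observed points, e.g. by matching Ripley's K function.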

Clustering with scipy - clusters via distance matrix, how to get back the original objects

為{幸葍}努か submitted on 2019-12-05 06:31:29

I can't seem to find any simple enough tutorials or descriptions on clustering in scipy, so I'll try to explain my problem: I am trying to cluster documents (hierarchical agglomerative clustering) and have created a vector for each document and produced a symmetric distance matrix. vector_list contains (really long) vectors representing each document. The order of this list of vectors matches my list of input documents, so that I will (hopefully) be able to match the results of the clustering with the corresponding documents. distances = distance.cdist(vector_list, vector_list, 'euclidean')
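Because scipy's clustering functions preserve input order, row i of the result always corresponds to document i, so mapping labels back to documents is just a zip by index. A minimal sketch with made-up document names and tiny 2-D stand-in vectors (note that `linkage` wants the condensed `pdist` form, not the square `cdist` matrix from the question):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

docs = ["doc_a", "doc_b", "doc_c", "doc_d"]            # hypothetical names
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# linkage expects a condensed distance matrix; pdist produces exactly that.
Z = linkage(pdist(vectors, "euclidean"), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")        # flatten into 2 clusters

# labels[i] belongs to vectors[i], which belongs to docs[i].
clusters = {}
for doc, lab in zip(docs, labels):
    clusters.setdefault(int(lab), []).append(doc)
print(clusters)
```

If you already have the square matrix from `cdist`, `scipy.spatial.distance.squareform` converts it to the condensed form `linkage` expects.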

How to know about group information in cluster analysis (hierarchical)?

南笙酒味 submitted on 2019-12-05 05:47:50

Question: I have a problem with groups in (hierarchical) cluster analysis. As an example, consider the dendrogram of complete linkage on the Iris data set. After I run > table(cutree(hc, 3), iris$Species) the output is:

  setosa versicolor virginica
1     50          0         0
2      0         23        49
3      0         27         1

I have read on one statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. Then, how am I going to identify the other two species? How do
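The cluster numbers from `cutree` are arbitrary; the usual way to name clusters 2 and 3 is by the majority species in each row of the table. Since the labels come back in input-row order, you can cross-tabulate them against the species column yourself. A small Python sketch with hypothetical labels (the logic, not the Iris data, is the point):

```python
from collections import Counter, defaultdict

labels  = [1, 1, 1, 2, 2, 2, 3, 3]                 # hypothetical cutree output
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor",
           "virginica", "virginica"]               # same rows, same order

# Count species per cluster, exactly what table(cutree(hc, 3), iris$Species) shows.
per_cluster = defaultdict(Counter)
for lab, sp in zip(labels, species):
    per_cluster[lab][sp] += 1

# Name each cluster after its majority species.
naming = {lab: cnt.most_common(1)[0][0] for lab, cnt in per_cluster.items()}
print(naming)
```

Applied to the table above: cluster 2 is mostly virginica (49 of 72) and cluster 3 mostly versicolor (27 of 28), with the caveat that complete linkage mixes those two species considerably.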

K means finding elbow when the elbow plot is a smooth curve

て烟熏妆下的殇ゞ submitted on 2019-12-05 05:35:33

I am trying to plot the elbow of k-means using the code below: load CSDmat %mydata for k = 2:20 opts = statset('MaxIter', 500, 'Display', 'off'); [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation'); % kmeans matlab [yy,ii] = min(D1'); %% assign points to nearest center distort = 0; distort_across = 0; clear clusts; for nn=1:k I = find(ii==nn); %% indices of points in cluster nn J = find(ii~=nn); %% indices of points not in cluster nn clusts{nn} = I; %% save into clusts cell array if (length(I)>0) mu(nn,:) = mean(CSDmat(I,:)); %% update mean %% Compute
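When the distortion curve is smooth with no obvious kink, a common rule is to pick the point farthest below the straight line joining the curve's endpoints (the idea behind the "Kneedle" method). The MATLAB loop above produces a cost per k; the selection step is a couple of numpy lines. The costs below are made-up illustrative values, not results from CSDmat:

```python
import numpy as np

# Hypothetical total distortions for k = 1..10 (e.g. summed sumd from kmeans);
# the curve is smooth, with the bend around k = 3.
ks    = np.arange(1, 11)
costs = np.array([1000., 420., 170., 140., 120., 105., 95., 88., 82., 78.])

# Elbow = point with maximum vertical distance below the chord that
# connects the first and last points of the curve.
line = costs[0] + (costs[-1] - costs[0]) * (ks - ks[0]) / (ks[-1] - ks[0])
elbow = int(ks[np.argmax(line - costs)])
print(elbow)  # 3
```

An alternative on smooth curves is the maximum second difference, `np.diff(costs, 2)`, or sidestepping the elbow entirely with the gap statistic or silhouette scores.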

OpenCV K-Means (kmeans2)

我怕爱的太早我们不能终老 submitted on 2019-12-05 04:58:17

I'm using OpenCV's k-means implementation to cluster a large set of 8-dimensional vectors. They cluster fine, but I can't find any way to see the prototypes created by the clustering process. Is this even possible? OpenCV only seems to give access to the cluster indexes (labels). If not, I guess it'll be time to write my own implementation! I can't say I've used OpenCV's implementation of k-means, but if you have access to the label given to each instance, you can simply get the centroids by computing the average vector of the instances belonging to each cluster. As of (at least) OpenCV 2.0,
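The workaround in the answer (average the members of each cluster) is one line of numpy. The data and labels below are random stand-ins for the 8-dimensional vectors and the label array that `kmeans2` returns; note also that OpenCV's Python `cv2.kmeans` does return the centers directly as its third result, so recomputation is only needed when all you kept were the labels:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 8)).astype(np.float32)  # stand-in for the 8-D vectors
labels = rng.integers(0, 5, size=100)                # stand-in cluster indexes

# Recover the prototypes: the centroid of cluster c is the mean of the
# vectors whose label is c.
k = 5
centroids = np.array([data[labels == c].mean(axis=0) for c in range(k)])
print(centroids.shape)  # (5, 8)
```

The same mean-by-label trick works for any clustering API that exposes labels but not centers.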