cluster-analysis

Clustering longitude and latitude gps data

你离开我真会死。 posted on 2019-12-06 09:04:16
I have more than 400 thousand car GPS locations, like: [25.41452217, 37.94879532], [25.33231735, 37.93455887], [25.44327736, 37.96868896], ... I need to cluster them spatially so that the distance between points is <= 3 meters. I tried to use DBSCAN, but it does not seem to work for geographic (longitude, latitude) coordinates. Also, I do not know the number of clusters.

You can use pairwise_distances to calculate the geographic distance from latitude/longitude and then pass the distance matrix into DBSCAN by specifying metric='precomputed'. To calculate the distance matrix: from sklearn.metrics.pairwise import
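As a rough illustration of the distance-matrix step described in the answer, here is a minimal standard-library sketch. It is not the answer's own code, which is cut off above. The names `haversine_m` and `distance_matrix` are made up for this example. It assumes each point is (longitude, latitude) in degrees and a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical-Earth assumption)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the valid asin() range.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(points):
    """Symmetric matrix of pairwise haversine distances in meters."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = haversine_m(points[i], points[j])
    return dist


# The three sample points from the question, read as (longitude, latitude).
points = [
    (25.41452217, 37.94879532),
    (25.33231735, 37.93455887),
    (25.44327736, 37.96868896),
]
dist = distance_matrix(points)
```

A matrix like `dist`, in meters, is the kind of input that scikit-learn's `DBSCAN(eps=3, metric='precomputed')` accepts. Note that a full matrix for 400 thousand points has about 1.6e11 entries, so this sketch only makes sense on a subset.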

hierarchical cluster labeling with plots

喜你入骨 posted on 2019-12-06 08:37:38
I have a distance matrix for ~20 elements, which I am using to do hierarchical clustering in R. Is there a way to label elements with a plot or a picture instead of just numbers, characters, etc.? So, instead of numbers, the leaf nodes would have small plots or pictures. Here is why I'm interested in this functionality. I have 2-D scatterplots like these (color indicates density): http://www.pnas.org/content/108/51/20455/F2.large.jpg (note that this is not my own data). I have to analyze hundreds of such 2-D scatterplots, and am trying out various distance metrics which I'm feeding on to

I have 2,000,000 points in 100 dimensionality space. How can I cluster them to K (e.g., 1000) clusters?

旧街凉风 posted on 2019-12-06 08:15:59
Question: The problem is as follows. I have M images and extract N features from each image, and each feature has dimensionality L. Thus, I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks. Answer 1: Do you want 1000 clusters of images, or of features, or of (image, feature) pairs? In any case, it sounds as though you'll have to reduce the data and use simpler methods. One
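One possible reading of "reduce the data and use simpler methods" is to run plain k-means on a random subsample and then assign every point to its nearest centroid. This is an assumption for illustration only, not what the truncated answer goes on to say. The function names below are hypothetical and the sketch uses only the standard library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def cluster_via_subsample(points, k, sample_size, seed=0):
    """Fit centroids on a random subsample, then label every point."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centroids = kmeans(sample, k, seed=seed)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

For the sizes in the question (2,000,000 points, K = 1000) a vectorized library would be needed in practice. The pure-Python loops here are only meant to show the structure.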

Find connected components in a graph in MATLAB

大城市里の小女人 posted on 2019-12-06 07:18:22
I have many 3D data points, and I wish to find 'connected components' in this graph. This is where clusters are formed that exhibit the following properties: Each cluster contains points all of which are at most distance d from another point in the cluster. All points in two distinct clusters are at least distance d from each other. This problem is described in the question and answer here. Is there a MATLAB implementation of such an algorithm built-in or available on the FEX? Simple searches have not thrown up anything useful.

Amro: Perhaps a density-based clustering algorithm can be applied in
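The two properties in the question amount to connected components of the graph that links any two points closer than a threshold. Below is a minimal union-find sketch of that idea. It is written in Python rather than MATLAB, uses a threshold parameter named `d` here for illustration, and is not the density-based approach the answer starts to describe. It compares all pairs, so it is O(n^2) and only suitable for modest point counts.

```python
import math


def connected_components(points, d):
    """Label points so that any two points within distance d share a label."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that lies within the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= d:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Renumber roots as consecutive labels 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
labels = connected_components(points, 1.5)
```

Here the first three points chain together through their neighbours even though the first and third are more than 1.5 apart, which matches the "at most distance from another point in the cluster" wording.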

Show rows on clustered kmeans data

烂漫一生 posted on 2019-12-06 07:14:59
Hi, I was wondering: when you plot clustered data in a figure window, is there a way to show which row a data point belongs to when you hover over it? From the picture above, I was hoping there would be a way to tell which row a point belongs to when I select or hover over it. Here is the code:

%% dimensionality reduction
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);
%# pick random columns
indY = randperm( size(fulldata,2) );
indY = indY(1:columns)

Algorithm for clustering people with similar interests

六眼飞鱼酱① posted on 2019-12-06 06:50:21
Question: I want to cluster people into groups based on their interests. For example, people who like machine learning and graphs may be placed in one group, and people who are interested in mathematics and economics may be placed in a different group. The algorithm should be able to decide which people have the most closely matching interests and create clusters accordingly. It should also be able to report the other people in the group in which a particular person is placed. Answer 1: This does

R - cluster analysis on binary weblog data

风流意气都作罢 posted on 2019-12-06 05:09:16
I have web data that looks similar to the sample below. It simply has the user and a binary value for whether that user clicked on a particular link within a website. I want to cluster this data. My main goal is to find similar users based on their online behaviour. What is a good clustering algorithm for this? I have tried k-means, which does not work well with binary data. I have also tried spherical k-means, skmeans(). I wanted to do a sum of squared error scree plot, but I could not figure out how to get SSE from skmeans.

User  link1  link2  link3  link4
abc1  0      1      1      1
abc2  1      0      1      0
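For the scree plot, the within-cluster SSE can in principle be computed from the data and the cluster assignments alone, whatever produced them. The sketch below shows that calculation in Python rather than R, with a hypothetical function name. It uses Euclidean SSE, whereas spherical k-means optimizes a cosine-based criterion, so treat the number as a rough diagnostic under that assumption.

```python
def within_cluster_sse(rows, labels):
    """Sum of squared Euclidean distances from each row to its cluster mean."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)

    sse = 0.0
    for members in groups.values():
        # Column-wise mean of the rows assigned to this cluster.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((x - c) ** 2 for row in members for x, c in zip(row, centroid))
    return sse


# The two sample users from the question (link1..link4).
rows = [[0, 1, 1, 1], [1, 0, 1, 0]]
sse_one_cluster = within_cluster_sse(rows, [0, 0])
sse_two_clusters = within_cluster_sse(rows, [0, 1])
```

Repeating this for each candidate number of clusters and plotting SSE against that number gives the scree plot.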

Visualize data and clustering [closed]

前提是你 posted on 2019-12-06 04:47:07
I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored them in a dictionary. It looks something like this: {(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289, (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989, (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289, (8320, 8319): 0.67999999999999989,
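Most clustering and plotting routines expect a square distance matrix rather than a dictionary keyed by pairs. A minimal sketch of that conversion follows. The function name is made up. Taking distance as 1 - similarity and treating missing pairs as maximally distant (1.0) are both assumptions, not something stated in the question.

```python
def similarity_to_distance_matrix(scores):
    """Turn {(doc_a, doc_b): similarity} into (sorted ids, symmetric distance matrix)."""
    ids = sorted({doc for pair in scores for doc in pair})
    index = {doc: i for i, doc in enumerate(ids)}
    n = len(ids)
    # Assumption: pairs with no recorded score get distance 1.0; the diagonal is 0.0.
    dist = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for (a, b), similarity in scores.items():
        # Clamp at 0.0 in case rounding pushes a similarity slightly above 1.
        d = max(0.0, 1.0 - similarity)
        dist[index[a]][index[b]] = d
        dist[index[b]][index[a]] = d
    return ids, dist


# A few of the pairs shown in the question.
scores = {
    (8328, 8327): 1.0,
    (8313, 8306): 0.12405229825691289,
    (8329, 8328): 1.0,
}
ids, dist = similarity_to_distance_matrix(scores)
```

The resulting `dist` can then be handed to a hierarchical clustering or multidimensional scaling routine that accepts precomputed distances.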

Plot multi-dimension cluster to 2D plot python

时光毁灭记忆、已成空白 posted on 2019-12-06 04:46:45
I was working on clustering a lot of data, which has two different clusters. The first type is a 6-dimensional cluster whereas the second type is a 12-dimensional cluster. For now I have decided to use kmeans (as it seems the most intuitive clustering algorithm for the start). The question is how can I map these clusters on a 2d plot so that I can infer whether kmeans is working or not. I would like to use matplotlib, but any other python package is fine. Cluster 1 is a cluster made up of these data types (int,float,float,int,float,int) Cluster 2 is a cluster made up of 12 float types. Trying