cluster-analysis

Parameter estimation in DBSCAN

99封情书 submitted on 2020-06-10 03:42:18
Question: I need to find naturally occurring classes of nouns based on their distribution with different prepositions (agentive, instrumental, time, place, etc.). I tried k-means clustering, but it did not work well: the classes I was looking for overlapped heavily (probably because the classes are non-globular and k-means uses random initialisation). I am now trying DBSCAN, but I have trouble understanding the epsilon value and minimum-points value in
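
One commonly cited heuristic for choosing epsilon is the sorted k-distance plot: compute each point's distance to its k-th nearest neighbour, sort those distances, and look for the value where the curve bends sharply. The sketch below is only an illustration under assumptions: the function name `k_distances`, the toy points, and the choice of k are hypothetical, and the brute-force search is quadratic, so it suits small samples only.

```python
import math

def k_distances(points, k):
    """Return every point's distance to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]

kd = k_distances(points, k=2)
# Points inside dense groups have small k-distances; the isolated point has a
# much larger one. A candidate epsilon lies just above the flat part of the curve.
print(kd)
```

Here k is usually tied to the intended minimum-points value, so the two parameters are chosen together rather than independently.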

plot data structure as a tree in R

萝らか妹 submitted on 2020-05-27 06:21:29
Question: I'm using the sizetree() function from the plotrix package to draw my data structure as a tree (see below), and it works just fine. However, I was wondering whether there is another way (or another package) that would give a more elegant tree plot of the same data with the same information displayed. (Note: in the plot below, the fonts are either too big or too small, as are the rectangles; the plot could perhaps also be inverted to look better.) It's subjective, but I appreciate

Is K-means appropriate for clustering data with many zero values?

落花浮王杯 submitted on 2020-05-15 18:38:49
Question: I need to cluster a matrix that contains mostly zero values. Is K-means appropriate for this kind of data, or do I need to consider a different algorithm? Answer 1: No. The reason is that the mean is not sensible on sparse data. The resulting mean vectors will have very different characteristics from your actual data; they will often end up being more similar to each other than to actual documents! There are some modifications that improve k-means for sparse data, such as spherical k-means. But
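
The answer's point about mean vectors can be illustrated with a small sketch. The data and helper names (`mean_vector`, `nonzero_count`) are made up for illustration; the idea is only that averaging sparse rows yields a centroid with far more nonzero entries than any single row.

```python
def mean_vector(rows):
    """Column-wise mean of equally long rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nonzero_count(vec):
    """Number of entries that are not zero."""
    return sum(1 for x in vec if x != 0)

# Hypothetical sparse rows: each has exactly one nonzero entry.
rows = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [0, 1, 0, 0, 0, 0],
]

centroid = mean_vector(rows)
# The centroid mixes every row's nonzero position, so it is much denser
# than the data it is supposed to summarise.
print(centroid, nonzero_count(centroid))
```

In this toy case every row has one nonzero entry while the centroid has four, which is the mismatch the answer describes.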

How can GridSearchCV be used for clustering (MeanShift or DBSCAN)?

时光毁灭记忆、已成空白 submitted on 2020-05-15 04:23:08
Question: I'm trying to cluster some text documents using scikit-learn. I'm trying out both DBSCAN and MeanShift and want to determine which hyperparameters (e.g. bandwidth for MeanShift and eps for DBSCAN) work best for the kind of data I'm using (news articles). I have some test data that consists of pre-labeled clusters. I have been trying to use scikit-learn's GridSearchCV but don't understand how (or whether) it can be applied in this case, since it needs the test data to be split, but I want to
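
One possible way around the train/test split is a plain parameter sweep: run the clusterer once per parameter setting on all the data and score each result against the pre-labeled clusters. The sketch below is not GridSearchCV and does not use scikit-learn; `gap_cluster` is a hypothetical 1-D stand-in for a real clusterer such as DBSCAN, and the pair-counting Rand index is just one of several scores that could be used.

```python
from itertools import combinations, product

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def gap_cluster(values, eps):
    """Toy stand-in clusterer: split sorted 1-D values wherever the gap exceeds eps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > eps:
            current += 1
        labels[nxt] = current
    return labels

def sweep(values, true_labels, grid):
    """Try every parameter combination and keep the best-scoring one."""
    best = None
    keys = sorted(grid)
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = rand_index(true_labels, gap_cluster(values, **params))
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Hypothetical labeled data and parameter grid.
values = [0.0, 0.2, 0.4, 5.0, 5.1, 9.0]
true_labels = [0, 0, 0, 1, 1, 2]
best_score, best_params = sweep(values, true_labels, {"eps": [0.1, 1.0, 10.0]})
print(best_score, best_params)
```

With a real clusterer, only `gap_cluster` would change; the loop and the scoring against known labels stay the same.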

graphics window not working properly in `kml` package

不羁的心 submitted on 2020-05-14 09:17:13
Question: I started working with the package kml to perform longitudinal cluster analysis. The package claims to have an interactive graphics window that lets you explore the clusterings found by kml. According to the docs, the window can be opened by calling the function choice. But that window does not open. Instead I get an error: Error in setGraphicsEventEnv(which, as.environment(list(...))) : this graphics device does not support event handling From the docs ?choice : At first, choice opens a

How can we show the trajectories belonging to clusters in `kml` package?

孤街醉人 submitted on 2020-05-13 22:55:10
Question: The kml package implements k-means for longitudinal data. The clustering works just fine. Now I'm wondering how I can show the 'structure' of the clusters, for example by coloring them. A very simple example from the docs (help file of the clusterLongData function): library(kml) traj <- matrix(c(1,2,3,1,4, 3,6,1,8,10, 1,2,1,3,2, 4,2,5,6,3, 4,3,4,4,4, 7,6,5,5,4),6) myCld <- clusterLongData( traj=traj, idAll=as.character(c(100,102,103,109,115,123)), time=c(1,2,4,8,15), varNames="P", maxNA=3

Fast (< n^2) clustering algorithm

自闭症网瘾萝莉.ら submitted on 2020-05-09 17:47:25
Question: I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. the clusters could be bounding spheres with a specified radius). That means there probably have to be many clusters of size 1. But I need the running time to be well below n^2; n log n or so should be fine. The reason I'm doing this clustering is to avoid computing a distance matrix of all n points (which takes n^2 time or many hours),
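
One approach that avoids the distance matrix entirely is to bucket points into a regular grid. If the cell side is the allowed diameter divided by the square root of the dimension, any two points in the same cell are guaranteed to be within that diameter, and the whole pass is linear in n. The sketch below assumes a maximum-diameter constraint (rather than a radius) and uses a hypothetical function name and toy data; it will produce more clusters than strictly necessary, because close points on opposite sides of a cell boundary are separated.

```python
import math
from collections import defaultdict

def grid_clusters(points, max_diameter):
    """Group point indices by grid cell so that each group's diameter stays below max_diameter."""
    dim = len(points[0])
    # A cube with this side has a diagonal of exactly max_diameter.
    side = max_diameter / math.sqrt(dim)
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / side) for c in p)
        cells[key].append(idx)
    return list(cells.values())

# Hypothetical 5-dimensional toy points.
points = [(0.1,) * 5, (0.2,) * 5, (5.0,) * 5, (5.1,) * 5, (9.9,) * 5]
clusters = grid_clusters(points, max_diameter=1.0)
print(clusters)
```

Each point is touched once and hashed into a dictionary, so no pairwise distances are computed at any stage.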