cluster-analysis

ELKI: Running DBSCAN on custom Objects in Java

瘦欲@ posted on 2019-12-23 10:27:04

Question: I'm trying to use ELKI from within Java to run DBSCAN. For testing I used a FileBasedDatabaseConnection. Now I would like to run DBSCAN with my custom objects as parameters. My objects have the following structure:

    public class MyObject {
        private Long id;
        private Float param1;
        private Float param2;
        // ... and more parameters, as well as getters and setters
    }

I'd like to run DBSCAN within ELKI using a List<MyObject> as the database, but only some of the parameters should be taken into account (e.g. …
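The question is about ELKI's Java API, but the underlying idea — cluster on a projection of each object, using only the attributes that matter — is language-neutral. A minimal sketch of that idea in Python with scikit-learn's DBSCAN (the `MyObject` fields and all values here are hypothetical stand-ins):

```python
from dataclasses import dataclass

import numpy as np
from sklearn.cluster import DBSCAN

@dataclass
class MyObject:
    id: int
    param1: float
    param2: float
    param3: float  # extra attribute, deliberately ignored for clustering

objects = [
    MyObject(1, 0.0, 0.0, 99.0),
    MyObject(2, 0.1, 0.1, 12.0),
    MyObject(3, 5.0, 5.0, 7.0),
    MyObject(4, 5.1, 5.1, 3.0),
]

# Project each object onto only the attributes that should drive clustering.
X = np.array([[o.param1, o.param2] for o in objects])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
labels_by_id = dict(zip([o.id for o in objects], labels))
print(labels_by_id)
```

In ELKI the analogous step is building the vector relation from the selected fields before handing it to the algorithm; the projection, not the clustering call, is where the attribute selection happens.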

sklearn.mixture.DPGMM: Unexpected results

拈花ヽ惹草 posted on 2019-12-23 09:40:53

Question: The results I get from DPGMM are not what I expect. E.g.:

    >>> import sklearn.mixture
    >>> sklearn.__version__
    '0.12-git'
    >>> data = [[1.1],[0.9],[1.0],[1.2],[1.0], [6.0],[6.1],[6.1]]
    >>> m = sklearn.mixture.DPGMM(n_components=5, n_iter=1000, alpha=1)
    >>> m.fit(data)
    DPGMM(alpha=1, covariance_type='diag', init_params='wmc', min_covar=None,
          n_components=5, n_iter=1000, params='wmc',
          random_state=<mtrand.RandomState object at 0x108a3f168>,
          thresh=0.01, verbose=False)
    >>> m.converged_
    True
    >>> m …
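`DPGMM` was deprecated and later removed from scikit-learn; its modern replacement is `BayesianGaussianMixture`, whose default Dirichlet-process prior plays the role of `alpha`. A present-day rendering of the same experiment might look like this (parameter choices are illustrative, not a claim about what the asker ran):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

data = np.array([[1.1], [0.9], [1.0], [1.2], [1.0], [6.0], [6.1], [6.1]])

# n_components is only an upper bound: the Dirichlet-process prior
# (weight_concentration_prior_type='dirichlet_process' is the default)
# shrinks the weights of unneeded components on its own.
m = BayesianGaussianMixture(n_components=5, max_iter=1000, random_state=0)
labels = m.fit_predict(data)
print(labels)
```

With well-separated data like this, the points near 1 and the points near 6 end up under different components even though five were allowed.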

silhouette coefficient in python with sklearn

不羁的心 posted on 2019-12-23 08:47:53

Question: I'm having trouble computing the silhouette coefficient in Python with sklearn. Here is my code:

    from sklearn import datasets
    from sklearn.metrics import *
    iris = datasets.load_iris()
    X = pd.DataFrame(iris.data, columns = col)
    y = pd.DataFrame(iris.target, columns = ['cluster'])
    s = silhouette_score(X, y, metric='euclidean', sample_size=int(50))

I get the error: IndexError: indices are out-of-bounds. I want to use the sample_size parameter because when working with very large datasets, …
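The likely culprit is passing the labels as a one-column DataFrame: `silhouette_score` expects a flat 1-D array of labels, and indexing a DataFrame with the subsample's integer positions fails. A working sketch under that assumption:

```python
from sklearn import datasets
from sklearn.metrics import silhouette_score

iris = datasets.load_iris()
X = iris.data
y = iris.target  # flat 1-D label array, not a one-column DataFrame

# random_state makes the 50-point subsample reproducible
s = silhouette_score(X, y, metric='euclidean', sample_size=50, random_state=0)
print(s)
```

Note that the silhouette coefficient is meant to score *predicted* cluster labels; feeding it the true classes, as here, measures how well-separated the classes are rather than evaluating a clustering.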

Affinity Propagation (sklearn) - strange behavior

限于喜欢 posted on 2019-12-23 08:04:33

Question: Trying to use affinity propagation for a simple clustering task:

    from sklearn.cluster import AffinityPropagation
    c = [[0], [0], [0], [0], [0], [0], [0], [0]]
    af = AffinityPropagation(affinity='euclidean').fit(c)
    print(af.labels_)

I get this strange result: [0 1 0 1 2 1 1 0]. I would expect to have all samples in the same cluster, as in this case:

    c = [[0], [0], [0]]
    af = AffinityPropagation(affinity='euclidean').fit(c)
    print(af.labels_)

which indeed puts all samples in the same …
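A plausible explanation: when every sample is identical, all pairwise similarities are equal, the message-passing updates have no preferred fixed point, and the algorithm may fail to converge, leaving `labels_` essentially arbitrary (recent scikit-learn versions warn about non-convergence in this situation). On data with any real separation the behavior is as expected; a small check with made-up values:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Two tight, well-separated groups (hypothetical values for illustration)
c = np.array([[0.0], [0.1], [10.0], [10.1]])

af = AffinityPropagation(affinity='euclidean', random_state=0).fit(c)
print(af.labels_)
```

Here the two near-zero points share one label and the two near-ten points share another.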

Nearest point between two clusters Matlab

雨燕双飞 posted on 2019-12-23 06:06:38

Question: I have a set of clusters consisting of 3D points, and I want to find the nearest two points between each pair of clusters. For example: I have 5 clusters, C1 to C5, each consisting of 3D points. For C1 and C2 there are two points, Pc1 (a point in C1) and Pc2 (a point in C2), that are the closest pair between the two clusters C1 and C2; likewise between C1 and C3…C5, between C2 and C3…C5, and so on. After that I'll have 20 points representing the nearest points between the different clusters. The second …
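The question asks about Matlab, but the core operation — the closest pair of points between two point sets — is the same everywhere; in Matlab it is typically a `pdist2` plus `min` call. A Python/scipy sketch of that search (all names and coordinates here are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

def closest_pair(A, B):
    """Return (index_in_A, index_in_B, distance) for the nearest
    pair of points between point sets A and B."""
    D = cdist(A, B)  # all pairwise Euclidean distances, shape (len(A), len(B))
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return i, j, D[i, j]

C1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
C2 = np.array([[2.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
i, j, d = closest_pair(C1, C2)
print(i, j, d)
```

Running this over every pair of clusters (e.g. with `itertools.combinations` over the 5 clusters) yields the 10 closest pairs, i.e. the 20 points the question describes.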

Optimizing K-means clustering using Genetic Algorithm

六眼飞鱼酱① posted on 2019-12-23 05:40:29

Question: I have the following dataset (obtained here):

          item          survivalpoints  weight
    1     pocketknife   10              1
    2     beans         20              5
    3     potatoes      15              10
    4     unions        2               1
    5     sleeping bag  30              7
    6     rope          10              5
    7     compass       30              1

I can cluster this dataset into three clusters with kmeans(), using a binary string as my initial choice of centers. For example:

    ## 1 represents the initial centers
    chromosome = c(1,1,1,0,0,0,0)
    ## exclude first column (kmeans only supports continuous data)
    cl <- kmeans(dataset[, -1], dataset[chromosome == 1, …
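The R snippet seeds kmeans() with exactly the rows a binary chromosome selects, which is the step a genetic algorithm would then mutate and evaluate. The same idea rendered in Python with scikit-learn (numeric columns copied from the table above; the chromosome value is one illustrative individual):

```python
import numpy as np
from sklearn.cluster import KMeans

# survivalpoints and weight columns from the question's dataset
data = np.array([
    [10, 1],   # pocketknife
    [20, 5],   # beans
    [15, 10],  # potatoes
    [2, 1],    # unions
    [30, 7],   # sleeping bag
    [10, 5],   # rope
    [30, 1],   # compass
], dtype=float)

# A chromosome marks which rows serve as the initial centers
chromosome = np.array([1, 1, 1, 0, 0, 0, 0], dtype=bool)
centers = data[chromosome]

# n_init=1 runs a single k-means pass from exactly these centers,
# matching what kmeans(dataset, centers) does in R
cl = KMeans(n_clusters=centers.shape[0], init=centers, n_init=1).fit(data)
print(cl.labels_)
```

A GA wrapper would score each chromosome by the resulting within-cluster sum of squares (`cl.inertia_` here, `cl$tot.withinss` in R) and evolve the bit strings accordingly.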

r: error for NbClust() call when deploying it within for() loop - “Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + :”

徘徊边缘 posted on 2019-12-23 05:27:36

Question: I want to call the NbClust() function for a couple of data frames. I do so by sending them all through a for loop that contains the NbClust() call. The code looks like this:

    # combos of just all columns from df
    variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
    for(i in 1:length(variations)){
        df = data.frame(variations[i])
        nc = NbClust(scale(df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
    }

Unfortunately it always …

Clustering algorithm for unweighted graphs

回眸只為那壹抹淺笑 posted on 2019-12-23 04:12:08

Question: I have an unweighted, undirected graph as my network, which is basically a network of proteins. I want to cluster this graph and divide it into disjoint clusters. Can anyone suggest clustering algorithms that I can apply to this biological network, i.e. to an unweighted, undirected graph?

Answer 1: Several graph partitioning algorithms exist; they use different paradigms to tackle the same problem. The most common is Louvain's method, which optimizes Newman's modularity. In Python, using …
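The answer is truncated where it names a Python implementation; Louvain is commonly available via the python-louvain package or, in recent networkx releases, `louvain_communities`. As a dependency-light stand-in, networkx's built-in greedy modularity maximization (Clauset-Newman-Moore) partitions an unweighted, undirected graph the same way; a sketch on a toy graph (edges invented for illustration):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy unweighted, undirected graph: two dense triangles joined by one edge
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),   # group 1
    ("d", "e"), ("d", "f"), ("e", "f"),   # group 2
    ("c", "d"),                           # bridge between the groups
])

# Each community is a disjoint set of nodes
communities = [set(c) for c in greedy_modularity_communities(G)]
print(communities)
```

Modularity-based methods like this (and Louvain) need no edge weights, which is exactly the situation described for the protein network.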

R: Cluster analysis with hclust(). How to get the cluster representatives?

纵然是瞬间 posted on 2019-12-23 03:40:06

Question: I am doing some cluster analysis with R. I am using the hclust() function, and after performing the cluster analysis I would like to get the cluster representative of each cluster. I define a cluster representative as the instance closest to the centroid of its cluster. So the steps are:

1. Find the centroid of each cluster
2. Find the cluster representatives

I have already asked a similar question, but using K-means: https://stats.stackexchange.com/questions/251987/cluster …
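The two steps above (compute each cluster's centroid, then pick the member nearest to it) can be sketched in Python with scipy's hierarchical clustering; the R workflow with hclust() and cutree() is directly analogous. Data values here are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.3]])

# Complete-linkage hierarchical clustering, cut into 2 clusters
labels = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")

representatives = {}
for k in np.unique(labels):
    members = np.where(labels == k)[0]
    centroid = X[members].mean(axis=0)               # step 1: cluster centroid
    dists = np.linalg.norm(X[members] - centroid, axis=1)
    representatives[k] = members[np.argmin(dists)]   # step 2: closest instance
print(representatives)
```

Note the centroid itself is generally not a data point under hierarchical clustering (unlike the centers K-means returns), which is why the explicit nearest-member search in step 2 is needed.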