cluster-analysis

ELKI: Running DBSCAN on custom Objects in Java

瘦欲@ posted on 2019-12-23 10:27:04

Question: I'm trying to use ELKI from within Java to run DBSCAN. For testing I used a FileBasedDatabaseConnection. Now I would like to run DBSCAN with my custom objects as parameters. My objects have the following structure:

    public class MyObject {
        private Long id;
        private Float param1;
        private Float param2;
        // ... and more parameters, as well as getters and setters
    }

I'd like to run DBSCAN within ELKI using a List<MyObject> as the database, but only some of the parameters should be taken into account (e.g. …
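The question is about ELKI's Java API, but the underlying idea — cluster on a projection of each object, using only the attributes that matter — is language-neutral. A minimal sketch of that idea in Python with scikit-learn's DBSCAN (the `MyObject` fields and all values here are hypothetical stand-ins):

```python
from dataclasses import dataclass

import numpy as np
from sklearn.cluster import DBSCAN

@dataclass
class MyObject:
    id: int
    param1: float
    param2: float
    param3: float  # extra attribute, deliberately ignored for clustering

objects = [
    MyObject(1, 0.0, 0.0, 99.0),
    MyObject(2, 0.1, 0.1, 12.0),
    MyObject(3, 5.0, 5.0, 7.0),
    MyObject(4, 5.1, 5.1, 3.0),
]

# Project each object onto only the attributes that should drive clustering.
X = np.array([[o.param1, o.param2] for o in objects])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
labels_by_id = dict(zip([o.id for o in objects], labels))
print(labels_by_id)
```

In ELKI the analogous step is building the vector relation from the selected fields before handing it to the algorithm; the projection, not the clustering call, is where the attribute selection happens.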

sklearn.mixture.DPGMM: Unexpected results

拈花ヽ惹草 posted on 2019-12-23 09:40:53

Question: The results I get from DPGMM are not what I expect. E.g.:

    >>> import sklearn.mixture
    >>> sklearn.__version__
    '0.12-git'
    >>> data = [[1.1],[0.9],[1.0],[1.2],[1.0], [6.0],[6.1],[6.1]]
    >>> m = sklearn.mixture.DPGMM(n_components=5, n_iter=1000, alpha=1)
    >>> m.fit(data)
    DPGMM(alpha=1, covariance_type='diag', init_params='wmc', min_covar=None,
          n_components=5, n_iter=1000, params='wmc',
          random_state=<mtrand.RandomState object at 0x108a3f168>,
          thresh=0.01, verbose=False)
    >>> m.converged_
    True
    >>> m …
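`DPGMM` was deprecated and later removed from scikit-learn; its modern replacement is `BayesianGaussianMixture`, whose default Dirichlet-process prior plays the role of `alpha`. A present-day rendering of the same experiment might look like this (parameter choices are illustrative, not a claim about what the asker ran):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

data = np.array([[1.1], [0.9], [1.0], [1.2], [1.0], [6.0], [6.1], [6.1]])

# n_components is only an upper bound: the Dirichlet-process prior
# (weight_concentration_prior_type='dirichlet_process' is the default)
# shrinks the weights of unneeded components on its own.
m = BayesianGaussianMixture(n_components=5, max_iter=1000, random_state=0)
labels = m.fit_predict(data)
print(labels)
```

With well-separated data like this, the points near 1 and the points near 6 end up under different components even though five were allowed.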

silhouette coefficient in python with sklearn

不羁的心 posted on 2019-12-23 08:47:53

Question: I'm having trouble computing the silhouette coefficient in Python with sklearn. Here is my code:

    from sklearn import datasets
    from sklearn.metrics import *
    iris = datasets.load_iris()
    X = pd.DataFrame(iris.data, columns = col)
    y = pd.DataFrame(iris.target, columns = ['cluster'])
    s = silhouette_score(X, y, metric='euclidean', sample_size=int(50))

I get the error: IndexError: indices are out-of-bounds. I want to use the sample_size parameter because when working with very large datasets, …
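The likely culprit is passing the labels as a one-column DataFrame: `silhouette_score` expects a flat 1-D array of labels, and indexing a DataFrame with the subsample's integer positions fails. A working sketch under that assumption:

```python
from sklearn import datasets
from sklearn.metrics import silhouette_score

iris = datasets.load_iris()
X = iris.data
y = iris.target  # flat 1-D label array, not a one-column DataFrame

# random_state makes the 50-point subsample reproducible
s = silhouette_score(X, y, metric='euclidean', sample_size=50, random_state=0)
print(s)
```

Note that the silhouette coefficient is meant to score *predicted* cluster labels; feeding it the true classes, as here, measures how well-separated the classes are rather than evaluating a clustering.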

Affinity Propagation (sklearn) - strange behavior

限于喜欢 posted on 2019-12-23 08:04:33

Question: Trying to use affinity propagation for a simple clustering task:

    from sklearn.cluster import AffinityPropagation
    c = [[0], [0], [0], [0], [0], [0], [0], [0]]
    af = AffinityPropagation(affinity='euclidean').fit(c)
    print(af.labels_)

I get this strange result: [0 1 0 1 2 1 1 0]. I would expect to have all samples in the same cluster, as in this case:

    c = [[0], [0], [0]]
    af = AffinityPropagation(affinity='euclidean').fit(c)
    print(af.labels_)

which indeed puts all samples in the same …
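A plausible explanation: when every sample is identical, all pairwise similarities are equal, the message-passing updates have no preferred fixed point, and the algorithm may fail to converge, leaving `labels_` essentially arbitrary (recent scikit-learn versions warn about non-convergence in this situation). On data with any real separation the behavior is as expected; a small check with made-up values:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Two tight, well-separated groups (hypothetical values for illustration)
c = np.array([[0.0], [0.1], [10.0], [10.1]])

af = AffinityPropagation(affinity='euclidean', random_state=0).fit(c)
print(af.labels_)
```

Here the two near-zero points share one label and the two near-ten points share another.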

Nearest point between two clusters Matlab

雨燕双飞 posted on 2019-12-23 06:06:38

Question: I have a set of clusters consisting of 3D points, and I want to find the nearest two points between each pair of clusters. For example: I have 5 clusters, C1 to C5, each consisting of 3D points. For C1 and C2 there are two points, Pc1 (a point in C1) and Pc2 (a point in C2), that are the closest pair between the two clusters C1 and C2; likewise between C1 and C3…C5, between C2 and C3…C5, and so on. After that I'll have 20 points representing the nearest points between the different clusters. The second …
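The question asks about Matlab, but the core operation — the closest pair of points between two point sets — is the same everywhere; in Matlab it is typically a `pdist2` plus `min` call. A Python/scipy sketch of that search (all names and coordinates here are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

def closest_pair(A, B):
    """Return (index_in_A, index_in_B, distance) for the nearest
    pair of points between point sets A and B."""
    D = cdist(A, B)  # all pairwise Euclidean distances, shape (len(A), len(B))
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return i, j, D[i, j]

C1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
C2 = np.array([[2.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
i, j, d = closest_pair(C1, C2)
print(i, j, d)
```

Running this over every pair of clusters (e.g. with `itertools.combinations` over the 5 clusters) yields the 10 closest pairs, i.e. the 20 points the question describes.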

Optimizing K-means clustering using Genetic Algorithm

六眼飞鱼酱① posted on 2019-12-23 05:40:29

Question: I have the following dataset (obtained here):

          item          survivalpoints  weight
    1     pocketknife   10              1
    2     beans         20              5
    3     potatoes      15              10
    4     unions        2               1
    5     sleeping bag  30              7
    6     rope          10              5
    7     compass       30              1

I can cluster this dataset into three clusters with kmeans(), using a binary string as my initial choice of centers. For example:

    ## 1 represents the initial centers
    chromosome = c(1,1,1,0,0,0,0)
    ## exclude first column (kmeans only supports continuous data)
    cl <- kmeans(dataset[, -1], dataset[chromosome == 1, …
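The R snippet seeds kmeans() with exactly the rows a binary chromosome selects, which is the step a genetic algorithm would then mutate and evaluate. The same idea rendered in Python with scikit-learn (numeric columns copied from the table above; the chromosome value is one illustrative individual):

```python
import numpy as np
from sklearn.cluster import KMeans

# survivalpoints and weight columns from the question's dataset
data = np.array([
    [10, 1],   # pocketknife
    [20, 5],   # beans
    [15, 10],  # potatoes
    [2, 1],    # unions
    [30, 7],   # sleeping bag
    [10, 5],   # rope
    [30, 1],   # compass
], dtype=float)

# A chromosome marks which rows serve as the initial centers
chromosome = np.array([1, 1, 1, 0, 0, 0, 0], dtype=bool)
centers = data[chromosome]

# n_init=1 runs a single k-means pass from exactly these centers,
# matching what kmeans(dataset, centers) does in R
cl = KMeans(n_clusters=centers.shape[0], init=centers, n_init=1).fit(data)
print(cl.labels_)
```

A GA wrapper would score each chromosome by the resulting within-cluster sum of squares (`cl.inertia_` here, `cl$tot.withinss` in R) and evolve the bit strings accordingly.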

r: error for NbClust() call when deploying it within for() loop - “Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + :”

徘徊边缘 posted on 2019-12-23 05:27:36

Question: I want to call the NbClust() function for a couple of data frames. I do so by sending them all through a for loop that contains the NbClust() call. The code looks like this:

    # combos of just all columns from df
    variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
    for(i in 1:length(variations)){
        df = data.frame(variations[i])
        nc = NbClust(scale(df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
    }

Unfortunately it always …

Clustering algorithm for unweighted graphs

回眸只為那壹抹淺笑 posted on 2019-12-23 04:12:08

Question: I have an unweighted, undirected graph as my network, which is basically a network of proteins. I want to cluster this graph and divide it into disjoint clusters. Can anyone suggest clustering algorithms that I can apply to this biological network, i.e. to an unweighted, undirected graph?

Answer 1: Several graph partitioning algorithms exist; they use different paradigms to tackle the same problem. The most common is Louvain's method, which optimizes Newman's modularity. In Python, using …
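The answer is truncated where it names a Python implementation; Louvain is commonly available via the python-louvain package or, in recent networkx releases, `louvain_communities`. As a dependency-light stand-in, networkx's built-in greedy modularity maximization (Clauset-Newman-Moore) partitions an unweighted, undirected graph the same way; a sketch on a toy graph (edges invented for illustration):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy unweighted, undirected graph: two dense triangles joined by one edge
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),   # group 1
    ("d", "e"), ("d", "f"), ("e", "f"),   # group 2
    ("c", "d"),                           # bridge between the groups
])

# Each community is a disjoint set of nodes
communities = [set(c) for c in greedy_modularity_communities(G)]
print(communities)
```

Modularity-based methods like this (and Louvain) need no edge weights, which is exactly the situation described for the protein network.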

R: Cluster analysis with hclust(). How to get the cluster representatives?

纵然是瞬间 posted on 2019-12-23 03:40:06

Question: I am doing some cluster analysis with R. I am using the hclust() function, and after performing the cluster analysis I would like to get the cluster representative of each cluster. I define a cluster representative as the instance closest to the centroid of its cluster. So the steps are:

1. Find the centroid of each cluster
2. Find the cluster representatives

I have already asked a similar question, but using K-means: https://stats.stackexchange.com/questions/251987/cluster …
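The two steps above (compute each cluster's centroid, then pick the member nearest to it) can be sketched in Python with scipy's hierarchical clustering; the R workflow with hclust() and cutree() is directly analogous. Data values here are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.3]])

# Complete-linkage hierarchical clustering, cut into 2 clusters
labels = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")

representatives = {}
for k in np.unique(labels):
    members = np.where(labels == k)[0]
    centroid = X[members].mean(axis=0)               # step 1: cluster centroid
    dists = np.linalg.norm(X[members] - centroid, axis=1)
    representatives[k] = members[np.argmin(dists)]   # step 2: closest instance
print(representatives)
```

Note the centroid itself is generally not a data point under hierarchical clustering (unlike the centers K-means returns), which is why the explicit nearest-member search in step 2 is needed.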