k-means

Will scikit-learn utilize GPU?

a 夏天 posted on 2019-12-02 21:48:21
Reading the k-means implementations in TensorFlow (http://learningtensorflow.com/lesson6/) and scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html), I'm struggling to decide which one to use. scikit-learn is installed as part of the TensorFlow Docker container, so I can use either implementation. Reason to use scikit-learn: it contains less boilerplate than the TensorFlow implementation. Reason to use TensorFlow: if running on an Nvidia GPU, the algorithm will be run in parallel. I'm not sure if scikit-learn will utilise all

Reveal k-modes cluster features

独自空忆成欢 posted on 2019-12-02 21:24:51
I'm performing a cluster analysis on categorical data, hence using the k-modes approach. My data is shaped as a preference survey: How do you like hair and eyes? The respondent can pick an answer from a fixed (multiple-choice) set of 4 possibilities. I therefore get the dummies, apply k-modes, attach the clusters back to the initial df and then plot them in 2D with PCA. My code looks like:

    import numpy as np
    import pandas as pd
    from kmodes import kmodes

    df_dummy = pd.get_dummies(df)
    # transform into numpy array
    x = df_dummy.reset_index().values
    km = kmodes.KModes(n_clusters=3, init='Huang', n

Online k-means clustering

时光总嘲笑我的痴心妄想 posted on 2019-12-02 21:17:00
Is there an online version of the k-means clustering algorithm? By online I mean that every data point is processed serially, one at a time as it enters the system, hence saving computing time when used in real time. I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it is to be used in my master's thesis. Also, does anyone have advice for other online clustering algorithms? (lmgtfy failed ;))

Yes, there is. Google failed to find it because it's more commonly known as "sequential k-means". You can find two pseudo-code
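As a rough illustration of the sequential k-means idea named in the answer above, here is a minimal sketch in plain Python. It uses the common running-mean update (each point moves its nearest centroid by 1/n of the gap, where n is the number of points that centroid has absorbed). The class name `SequentialKMeans`, the starting centroids, and the sample stream are all made up for this example, and the pseudo-code the answer refers to may differ in detail.

```python
from math import dist  # requires Python 3.8+


class SequentialKMeans:
    """Hypothetical helper: updates centroids one point at a time."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        # Counts start at zero, so the first point assigned to a centroid
        # replaces the initial guess entirely (an assumption of this sketch).
        self.counts = [0] * len(self.centroids)

    def update(self, point):
        # Find the centroid closest to the incoming point.
        nearest = min(
            range(len(self.centroids)),
            key=lambda i: dist(point, self.centroids[i]),
        )
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]
        centroid = self.centroids[nearest]
        # Running mean: move the centroid a fraction of the way to the point.
        for d, value in enumerate(point):
            centroid[d] += rate * (value - centroid[d])
        return nearest


model = SequentialKMeans([(0.0, 0.0), (10.0, 10.0)])
stream = [(1.0, 1.0), (9.0, 9.0), (3.0, 1.0), (11.0, 9.0)]
labels = [model.update(p) for p in stream]
```

Each call to `update` costs one distance computation per centroid and stores no past points, which is what makes the approach suitable for data arriving one point at a time.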

Can I use K-means algorithm on a string?

泪湿孤枕 posted on 2019-12-02 20:48:16
I am working on a Python project where I study RNA structure evolution (represented as a string, for example "(((...)))", where the parentheses represent base pairs). The point is that I have an ideal structure and a population that evolves towards the ideal structure. I have implemented everything; however, I would like to add a feature where I can get the "number of buckets", i.e. the k most representative structures in the population at each generation. I was thinking of using the k-means algorithm but I am not sure how to use it with strings. I found scipy.cluster.vq but I don't know how to

How to show total number in same coordinate in R Programming

做~自己de王妃 posted on 2019-12-02 16:56:28
Question (updated 11/09/2017): this is my code for k-modes clustering in R:

    library(klaR)
    setwd("D:/kmodes")
    data.to.cluster <- read.csv('kmodes.csv', header = TRUE, sep = ';')
    cluster.results <- kmodes(data.to.cluster[,2:5], 3, iter.max = 10, weighted = FALSE)
    plot(data.to.cluster[,2:5], col = cluster.results$cluster)

The result looks like this image: http://imgur.com/a/Y46yJ My sample data: https://drive.google.com/file/d/0B-Z58iD3By5wUzduOXUwUDh1OVU/view Is there a way to show total number in same

How do I find which cluster my data belongs to using Python?

[亡魂溺海] posted on 2019-12-02 16:09:34
Question: I just ran PCA and then the K-means clustering algorithm on my data; after running the algorithm I get 3 clusters. I am trying to figure out which cluster each input belongs to, in order to gather some qualitative attributes about the input. My input is the customer ID, and the variables I used for clustering were the spend patterns on certain products. Below is the code I ran for K-means; I am looking for some input on how to map this back to the source data to see which cluster each input belongs to:

What makes the distance measure in k-medoid “better” than k-means?

不羁的心 posted on 2019-12-02 14:41:47
I am reading about the difference between k-means clustering and k-medoid clustering. Supposedly there is an advantage to using the pairwise distance measure in the k-medoid algorithm, instead of the more familiar sum of squared Euclidean distance-type metric to evaluate variance that we find with k-means. And apparently this different distance metric somehow reduces noise and outliers. I have seen this claim but I have yet to see any good reasoning as to the mathematics behind this claim. What makes the pairwise distance measure commonly used in k-medoid better? More exactly, how does the
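One way to see the robustness claim discussed above is a tiny one-dimensional example. The sketch below compares the mean (the centre k-means uses) with the medoid (a centre that must be an actual data point and that minimises the sum of unsquared distances to the others). The data values are invented for illustration, and this shows only the intuition, not a full k-medoids implementation.

```python
# Hypothetical 1-D cluster with a single outlier at 100.
values = [1.0, 2.0, 3.0, 4.0, 100.0]

# k-means centre: the arithmetic mean, which minimises squared distances
# and is therefore pulled strongly towards the outlier.
mean = sum(values) / len(values)


def total_distance(candidate):
    """Sum of absolute (unsquared) distances from candidate to all points."""
    return sum(abs(candidate - v) for v in values)


# k-medoids centre: the data point with the smallest total distance.
medoid = min(values, key=total_distance)
```

Here the mean lands at 22.0, far from every typical point, while the medoid stays at 3.0. Squaring the distances makes the outlier's contribution dominate the mean, whereas the medoid is restricted to existing points and weights the outlier only linearly.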

How can I prevent NAN issues?

陌路散爱 posted on 2019-12-02 14:36:20
Question: I'm getting "Mean of empty slice" runtime warnings. When I print out my variables (NumPy arrays), several of them contain nan values. The runtime warning points to line 58 as the issue. What can I change to make it work? Sometimes the program runs with no issues; most of the time it does not. This is a K-means from scratch algorithm that clusters the iris data set. It first prompts the user for the number of centroids (clusters) they want. It then randomly generates said number
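NumPy emits "Mean of empty slice" and returns nan when a mean is taken over an empty array. In a from-scratch k-means, that typically happens when a centroid ends up with no points assigned to it; whether that is what happens on line 58 here is an assumption, since the code is not shown. The sketch below, in plain Python with a hypothetical `update_centroids` helper, shows one common guard: keep the previous centroid when its cluster is empty. The same check can be placed before the mean call in a NumPy version.

```python
def update_centroids(points, labels, old_centroids):
    """Recompute centroids, keeping the old one for any empty cluster."""
    new_centroids = []
    for k, old in enumerate(old_centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Empty cluster: averaging would divide by zero (nan in NumPy),
            # so reuse the previous position instead.
            new_centroids.append(list(old))
            continue
        dims = len(members[0])
        new_centroids.append(
            [sum(p[d] for p in members) / len(members) for d in range(dims)]
        )
    return new_centroids


# Illustrative data: every point is assigned to cluster 0, cluster 1 is empty.
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
labels = [0, 0, 0]
old = [(0.0, 0.0), (9.0, 9.0)]
updated = update_centroids(points, labels, old)
```

Another common choice is to re-seed an empty cluster with a randomly chosen data point rather than keeping the old centroid; which is preferable depends on the data. Randomly generated starting centroids make empty clusters more likely than starting from sampled data points, which would be consistent with the failure appearing only on some runs.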

TypeError: object of type 'map' has no len() Python3

依然范特西╮ posted on 2019-12-02 13:13:53
I'm trying to implement the KMeans algorithm using PySpark. It gives me the above error on the last line of the while loop. It works fine outside the loop, but after I created the loop it gave me this error. How do I fix this?

    # Find K Means of Loudacre device status locations
    #
    # Input data: file(s) with device status data (delimited by '|')
    # including latitude (13th field) and longitude (14th field) of device locations
    # (lat,lon of 0,0 indicates unknown location)
    # NOTE: Copy to pyspark using %paste

    # for a point p and an array of points, return the index in the array of the point closest to p
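The error in the title comes from a Python 2 to Python 3 difference: in Python 3 the built-in `map` returns a lazy iterator rather than a list, so `len()` cannot be applied to it. The snippet below reproduces the message with made-up data and shows the usual fix of materialising the result with `list(...)`. It uses only the built-in `map`; where exactly this occurs in the asker's loop is not visible in the excerpt.

```python
# Illustrative points; not taken from the original script.
points = [(1.0, 2.0), (3.0, 4.0)]

lazy = map(lambda p: p[0] + p[1], points)  # a map object, not a list
try:
    len(lazy)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Fix: wrap the map in list() (or use a list comprehension) before len().
sums = list(map(lambda p: p[0] + p[1], points))
count = len(sums)
```

A list comprehension such as `[p[0] + p[1] for p in points]` is an equivalent and often more readable alternative.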

How do I find which cluster my data belongs to using Python?

我怕爱的太早我们不能终老 posted on 2019-12-02 12:31:25
I just ran PCA and then the K-means clustering algorithm on my data; after running the algorithm I get 3 clusters. I am trying to figure out which cluster each input belongs to, in order to gather some qualitative attributes about the input. My input is the customer ID, and the variables I used for clustering were the spend patterns on certain products. Below is the code I ran for K-means; I am looking for some input on how to map this back to the source data to see which cluster each input belongs to:

    kmeans = KMeans(n_clusters=3)
    X_clustered = kmeans.fit_predict(x_10d)
    LABEL_COLOR_MAP = {0:'r', 1 : 'g' ,2 :
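`fit_predict` returns one label per input row, in the same order as the rows that were passed in, so the labels can be paired with the customer IDs by position as long as the row order has not changed between the source data and `x_10d`. The sketch below shows that pairing in plain Python. The customer IDs are invented, and the `labels` list is a stand-in for `X_clustered`.

```python
from collections import defaultdict

# Hypothetical IDs, in the same row order as the data given to fit_predict.
customer_ids = ["C001", "C002", "C003", "C004"]
# Stand-in for X_clustered = kmeans.fit_predict(x_10d).
labels = [0, 2, 0, 1]

# Lookup from customer to cluster.
cluster_of = dict(zip(customer_ids, labels))

# Reverse view: which customers fall into each cluster.
members = defaultdict(list)
for customer_id, label in zip(customer_ids, labels):
    members[label].append(customer_id)
```

If the source data lives in a pandas DataFrame with the same row order, assigning the label array to a new column achieves the same mapping and allows grouping by cluster to summarise the original attributes.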