k-means

Yellowbrick ModuleNotFoundError in Python

Submitted by 偶尔善良 on 2021-02-11 15:08:42
Question: I am trying to use Yellowbrick to make an elbow plot (for k-means clustering). I have installed Yellowbrick in Jupyter Notebook, but it keeps returning the error message below. The error message and information are attached as pictures below. I would be very happy if you could help me. from yellowbrick.cluster import KElbowVisualizer model = KMeans() visualizer = KElbowVisualizer(model, k=(1,250)) visualizer.fit(x.reshape(-1,1)) ModuleNotFoundError Traceback (most recent call …

partially define initial centroid for scikit-learn K-Means clustering

Submitted by 孤街醉人 on 2021-02-08 10:57:22
Question: The scikit-learn documentation states: "Method for initialization: ‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers." My data has 10 (predicted) clusters and 7 features. However, I would like to pass an array of shape 10 by 6, i.e. I want 6 dimensions of each centroid to be predefined by me, but the 7th …
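One workaround sketch (an assumption on my part, since scikit-learn has no direct support for partial initialization): because `init` must have the full (n_clusters, n_features) shape, complete the missing 7th column yourself, e.g. with that feature's mean, and pass the full 10×7 array:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((500, 7))          # stand-in data: 500 samples, 7 features
partial = rng.random((10, 6))     # the 10x6 centroid array the asker has

# complete the init array to the required (n_clusters, n_features) shape
col7 = np.full((10, 1), X[:, 6].mean())
init = np.hstack([partial, col7])

km = KMeans(n_clusters=10, init=init, n_init=1).fit(X)
print(km.cluster_centers_.shape)  # (10, 7)
```

n_init=1 because running multiple restarts from the same explicit init array would be redundant.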

Faster Kmeans Clustering on High-dimensional Data with GPU Support

Submitted by 落爺英雄遲暮 on 2021-02-08 05:16:37
Question: We've been using k-means for clustering our logs. A typical dataset has 10 million samples with 100k+ features. To find the optimal k, we run multiple k-means fits in parallel and pick the one with the best silhouette score. In 90% of cases we end up with k between 2 and 100. Currently we are using scikit-learn's KMeans. For such a dataset, clustering takes around 24h on an EC2 instance with 32 cores and 244 GB of RAM. I have been researching a faster solution. What I have already tested: …
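Before reaching for GPU libraries, one commonly tried CPU-side option is scikit-learn's MiniBatchKMeans, which updates centroids from random mini-batches instead of full passes over the data. A sketch, with tiny stand-in shapes in place of the real 10M × 100k matrix:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.random.rand(10_000, 128)   # tiny stand-in for 10M samples x 100k features
km = MiniBatchKMeans(n_clusters=20, batch_size=1024, n_init=3).fit(X)
print(km.cluster_centers_.shape)  # (20, 128)
```

It trades some clustering quality for a large speedup, which may matter when the silhouette score is used to compare runs.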

Drawing boundary lines based on kmeans cluster centres

Submitted by  ̄綄美尐妖づ on 2021-02-07 09:47:35
Question: I'm quite new to scikit-learn, but wanted to try an interesting project. I have longitudes and latitudes for points in the UK, which I used to create cluster centers with scikit-learn's KMeans class. To visualise this data, rather than showing the points as clusters, I wanted to draw boundaries around each cluster. For example, if one cluster were London and the other Oxford, I currently have a point at the center of each city, but I was wondering if there's a way to use this data to …
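A sketch of one standard approach: the k-means assignment rule induces a Voronoi partition of the plane, so labelling a dense longitude/latitude grid with predict and contouring the label image draws the cluster boundaries. The coordinates and output filename below are made-up stand-ins:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # headless backend for the sketch
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# made-up points inside a rough UK lon/lat box
pts = np.random.rand(300, 2) * [8, 10] + [-7, 50]
km = KMeans(n_clusters=4, n_init=10).fit(pts)

# label a dense grid, then contour the label image to get boundaries
xx, yy = np.meshgrid(np.linspace(-7, 1, 200), np.linspace(50, 60, 200))
labels = km.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contour(xx, yy, labels, colors="k")
plt.scatter(*km.cluster_centers_.T, marker="x")
plt.savefig("boundaries.png")         # hypothetical output file
```

Note that these are straight-line boundaries in lon/lat space; for geographically accurate shapes the coordinates would need projecting first.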

Output the 50 samples closest to each cluster center using scikit-learn's k-means

Submitted by 徘徊边缘 on 2021-02-07 06:28:17
Question: I have fitted a k-means model on 5000+ samples using the Python scikit-learn library. I want the 50 samples closest to a cluster center as output. How do I perform this task? Answer 1: If km is the k-means model, the distance to the j'th centroid for each point in an array X is d = km.transform(X)[:, j]. This gives an array of len(X) distances. The indices of the 50 closest to centroid j are ind = np.argsort(d)[:50] (argsort is ascending, so the smallest distances come first), and the 50 points closest to the centroid are X[ind] (or use …
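A self-contained sketch of this recipe on synthetic data; note that np.argsort sorts ascending, so taking the first 50 indices without reversing yields the closest samples:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(5000, 4)   # synthetic stand-in for the 5000+ samples
km = KMeans(n_clusters=8, n_init=10).fit(X)

j = 0                         # centroid of interest
d = km.transform(X)[:, j]     # distance of every sample to centroid j
ind = np.argsort(d)[:50]      # ascending sort: smallest distances first
closest = X[ind]
print(closest.shape)          # (50, 4)
```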

OpenCV - How to apply Kmeans on a grayscale image?

Submitted by a 夏天 on 2021-01-29 18:37:40
Question: I am trying to cluster a grayscale image using k-means. First, a question: is k-means the best way to cluster a Mat, or are there newer, more efficient approaches? Second, when I try this: Mat degrees = imread("an image", IMREAD_GRAYSCALE); const unsigned int singleLineSize = degrees.rows * degrees.cols; Mat data = degrees.reshape(1, singleLineSize); data.convertTo(data, CV_32F); std::vector<int> labels; cv::Mat1f colors; cv::kmeans(data, 3, labels, cv::TermCriteria(cv::TermCriteria::EPS …