k-means | 易学教程

Inconsistent results with KMeans between Apache Spark and scikit_learn

I am clustering a dataset with PySpark. To choose the number of clusters, I ran the clustering over a range of values (2, 20) and computed the WSSE (within-cluster sum of squares) for each value of k. This is where I found something unusual. As I understand it, the WSSE should decrease monotonically as the number of clusters increases, but my results say otherwise. I am showing the WSSE for the first few values of k only.

Results from Spark:
For k = 002 WSSE is 255318.793358
For k = 003 WSSE is 209788.479560
For k = 004 WSSE is 208498.351074
For k = 005 WSSE is 142573.272672
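One possible explanation, offered as an assumption because the excerpt does not show how the model was initialized: the minimum achievable WSSE never increases with k, but Lloyd-style k-means only converges to a local optimum that depends on the starting centroids, so a single run per k can report a higher WSSE for a larger k. The sketch below is plain Python, not Spark or scikit-learn code, and `wsse`, `points`, `good`, and `poor` are hypothetical names. It scores two centroid sets for the same k = 2. Both are stable under the k-means update, yet their WSSE differs by a factor of 25.

```python
def wsse(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


# Four corners of a wide rectangle (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Both centroid sets are fixed points of the k-means update for k = 2:
# each centroid is exactly the mean of the points nearest to it.
good = [(0.0, 1.0), (10.0, 1.0)]  # left pair vs. right pair
poor = [(5.0, 0.0), (5.0, 2.0)]   # bottom pair vs. top pair

print(wsse(points, good))  # 4.0
print(wsse(points, poor))  # 100.0
```

Running several random restarts per k and keeping the lowest WSSE is a common way to reduce this effect when comparing values of k.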

Spherical k-means implementation in Python

Question: I've been using scipy's k-means for quite some time now, and I'm pretty happy with how it works in terms of usability and efficiency. However, I now want to explore different k-means variants. More specifically, I'd like to apply spherical k-means to some of my problems. Do you know of any good Python implementation of spherical k-means (i.e. one similar to scipy's k-means)? If not, how hard would it be to modify scipy's source code to adapt its k-means algorithm to be spherical? Thank you.

Answer 1
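For orientation, the core idea of spherical k-means fits in a few lines. The following is an illustrative sketch only. It is not the answer from the original thread and not scipy code, and the names `normalize`, `dot`, and `spherical_kmeans` are hypothetical. It assumes every input vector is non-zero. Vectors are projected onto the unit sphere, each one is assigned to the centroid with the highest cosine similarity, and each centroid is updated to the normalized sum of its members.

```python
import math
import random


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    unit = [normalize(v) for v in vectors]
    centroids = rng.sample(unit, k)  # start from k distinct input vectors
    labels = None
    for _ in range(max_iter):
        # For unit vectors, the dot product is the cosine similarity.
        new_labels = [
            max(range(k), key=lambda j: dot(u, centroids[j])) for u in unit
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [u for u, lab in zip(unit, labels) if lab == j]
            if members:
                total = tuple(sum(c) for c in zip(*members))
                if any(total):  # skip the update if the members cancel out
                    centroids[j] = normalize(total)
    return centroids, labels


# Hypothetical data: three vectors near the x axis, three near the y axis,
# with different lengths that the algorithm ignores.
vectors = [(1, 0.1), (5, 0.2), (3, -0.1), (0.1, 2), (-0.1, 4), (0.2, 1)]
centroids, labels = spherical_kmeans(vectors, k=2)
print(labels)
```

The only differences from ordinary k-means are the normalization steps and the use of the dot product in place of Euclidean distance.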

Machine Learning - Algorithms - Clustering: The K-MEANS Algorithm

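Overview of clustering algorithms
- Unsupervised problem: there are no labels available
- Clustering: put similar things into the same group
- Difficulties: how to evaluate the result and how to tune the parameters

Basic concepts
- Number of clusters to obtain: the value of K must be specified
- Centroid: the mean, i.e. the average of the vectors along each dimension
- Distance measure: Euclidean distance and cosine similarity are the usual choices (standardize the data first)
- Optimization objective: for every cluster, make the sum of the distances from its points to the centroid as small as possible, which gives the optimal result

K-MEANS workflow
- (a) The initial plot
- (b) Once K is specified, K points are initialized in the plot as random centroids. Here K is 2, so there is a red point and a blue point
- (c) For every point in the plot, compute its distance to the red point and to the blue point, and assign it to whichever is closer
- (d) Recompute the centroids of the red and blue regions
- (e) Using the recomputed centroids, go through all points again, compare their distances to the two new centroids, and reassign them
- (f) Update the centroids again in the same way. Keep repeating until no sample point changes cluster any more, which means the partition is complete

Advantages
- Simple and fast, suitable for ordinary datasets

Disadvantages
- K is hard to choose
- Complexity is linear in the number of samples
- Clusters of arbitrary shape are hard to find

Source: https://www.cnblogs.com/shijieli/p/11925823.html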
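Steps (b) to (f) above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from the cited post. The names `squared_distance`, `mean_point`, and `kmeans` are hypothetical, initial centroids are drawn at random from the data, and squared Euclidean distance is used.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Centroid: the average along each dimension."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step (b): random initial centroids
    labels = None
    for _ in range(max_iter):
        # Steps (c) and (e): assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # step (f): stop once nothing changes
            break
        labels = new_labels
        # Step (d): recompute each centroid from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)
```

The stopping rule compares cluster assignments between iterations, which matches the "until no sample point changes" condition in step (f).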

Trouble with scipy kmeans and kmeans2 clustering in Python

I have a question about scipy's kmeans and kmeans2. I have a set of 1700 lat-long data points that I want to cluster spatially into 100 clusters. However, I get drastically different results from kmeans and from kmeans2. Can you explain why? My code is below. First I load my data and plot the coordinates. It all looks correct.

import pandas as pd, numpy as np, matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, kmeans2, whiten
df = pd.read_csv('data.csv')
df.head()
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
coordinates = df[['lon', 'lat']].to_numpy()
plt.figure(figsize=(10, 6), dpi=100)
plt.scatter

k-means in Practice: RFM Customer Value Segmentation

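The top ten data mining algorithms. Basic concepts. Import the dataset into a MySQL database; it contains 940 independent consumption records in total.

The K-Means algorithm: K-Means is a clustering algorithm. You can think of it this way: in the end I want to divide the objects into K classes. Assume every class has a "center point", an opinion leader of sorts, which is the core of that class. When a new point needs to be classified, I only have to compute its distance to each of the K center points; the point joins the class whose center is closest.

Import the modules:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import pymysql

Connect to the database:

conn = pymysql.connect(host='localhost', user='root', password='123', db='db2', port=3306)
rfm = pd.read_sql('select * from consumption_data', con=conn)
conn.close()

Inspect the data:

rfm.info()
rfm.head()

# Select the three RFM columns
new_rfm = rfm.loc[:, ['R', 'F', 'M']]
# Cluster with the KMeans algorithm, using 8 clusters
clf = KMeans(n_clusters=8, random_state=0)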

Implementation of k-means clustering algorithm

Question: In my program I use k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I still can't understand why my program gets into an infinite loop. Can anyone please show me where I'm making a mistake? For simplicity, I have put the input in the program code itself. Here is my code:

import java.io.*;
import java.lang.*;
class Kmean {
    public static void main(String args[]) {
        int N=9;
        int arr[]={2,4,10,12,3,20,30,11,25};
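The excerpt is cut off before the clustering loop, so the cause of the infinite loop cannot be identified from it. As a point of comparison, here is a minimal Python sketch of one-dimensional k-means with k = 2 that is guaranteed to stop. It is not a translation of the asker's program. The function name `kmeans_1d_two_clusters`, the choice of the first two values as initial means, and the `max_iter` cap are all assumptions made for illustration. The loop ends as soon as both means stop changing.

```python
def kmeans_1d_two_clusters(values, max_iter=100):
    # Assumption: the first two values serve as the initial means.
    m1, m2 = float(values[0]), float(values[1])
    c1, c2 = [], []
    for _ in range(max_iter):  # hard cap so the loop always terminates
        # Assign each value to the nearer mean (ties go to cluster 1).
        c1 = [v for v in values if abs(v - m1) <= abs(v - m2)]
        c2 = [v for v in values if abs(v - m1) > abs(v - m2)]
        # Recompute the means, keeping the old one if a cluster is empty.
        new_m1 = sum(c1) / len(c1) if c1 else m1
        new_m2 = sum(c2) / len(c2) if c2 else m2
        if new_m1 == m1 and new_m2 == m2:  # converged: means are unchanged
            break
        m1, m2 = new_m1, new_m2
    return (m1, c1), (m2, c2)


arr = [2, 4, 10, 12, 3, 20, 30, 11, 25]  # input values from the question
(m1, c1), (m2, c2) = kmeans_1d_two_clusters(arr)
print(m1, c1)  # 7.0 [2, 4, 10, 12, 3, 11]
print(m2, c2)  # 25.0 [20, 30, 25]
```

When porting this to Java, note that storing the means in `int` variables truncates them on every update, so `double` is the safer type for the convergence check.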

MNIST | Handwritten Digit (0-9) Recognition Based on k-means and KNN

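Contents: 1 Background; 2 Algorithm principle; 3 Code implementation; 3.1 File directory; 3.2 Core code; 4 Experiments and result analysis; 5 Afterword

Abstract: This experiment builds on the experiments "kaggle | Voice gender recognition based on k-means and KNN", "MNIST | Handwritten digit (0-9) recognition based on a naive Bayes classifier", and "Algorithms | k-means clustering". It applies k-means clustering and KNN recognition to the problem of handwritten digit recognition. For the MNIST dataset and for k-means + KNN, see my three earlier posts listed above. The code for this experiment is again in MATLAB.

Keywords: handwritten digit recognition; k-means; KNN; MATLAB; machine learning

1 Background
Two posts ago I mentioned that I would apply the k-means clustering algorithm to concrete problems such as voice gender recognition and handwritten digit (0-9) recognition. The voice gender recognition experiment was finished on November 2, so now it is time to deliver the handwritten digit one. This post follows on from several of my earlier posts, which already covered the MNIST dataset and the principles and usage of k-means and KNN, so they are not explained again here.

2 Algorithm principle
The approach of this experiment can be summarized as follows: S1: during training, cluster the training data for each of the digits 0-9 into k classes, giving 10k cluster centers in total; S2: during validation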

OpenCV4Android Kmean doesn't work as expected

Question: This code should give a centers Mat with 3 rows and clusterCount columns:

Mat reshaped_image = imageMat.reshape(1, imageMat.cols()*imageMat.rows());
Mat reshaped_image32f = new Mat();
reshaped_image.convertTo(reshaped_image32f, CvType.CV_32F, 1.0 / 255.0);
Mat labels = new Mat();
TermCriteria criteria = new TermCriteria(TermCriteria.COUNT, 100, 1);
Mat centers = new Mat();
int clusterCount = 5, attempts = 1;
Core.kmeans(reshaped_image32f, clusterCount, labels, criteria, attempts, Core

K means finding elbow when the elbow plot is a smooth curve

I am trying to plot the elbow of k-means using the code below:

load CSDmat %mydata
for k = 2:20
    opts = statset('MaxIter', 500, 'Display', 'off');
    [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation'); % kmeans matlab
    [yy,ii] = min(D1'); %% assign points to nearest center
    distort = 0;
    distort_across = 0;
    clear clusts;
    for nn=1:k
        I = find(ii==nn); %% indices of points in cluster nn
        J = find(ii~=nn); %% indices of points not in cluster nn
        clusts{nn} = I; %% save into clusts cell array
        if (length(I)>0)
            mu(nn,:) = mean(CSDmat(I,:)); %% update mean
            %% Compute

Extract black objects from color background

It is easy for human eyes to tell black from other colors. But what about computers? I printed some color blocks on normal A4 paper. Since a color image is composed of three kinds of ink (cyan, magenta and yellow), I set the color of each block to C=20%, C=30%, C=40% and C=50%, with the other two colors at 0. That is the first column of my source image. So far, no black (the K of CMYK) ink should have been printed. After that, I set the color of each dot to K=100%, with the remaining colors at 0, to print black dots. You may feel my image is weird and awful. In fact, the image is magnified 30 times and how the ink
