k-means

My own K-means algorithm in R

拥有回忆 submitted on 2019-12-07 15:15:22
Question: I am a beginner at R programming, and I am doing this exercise in R as an introduction to programming. I have written my own k-means implementation in R, but I have been stuck at one point for a while: I need it to converge, meaning the algorithm iterates until it finds the optimal center of each cluster. This is the raw algorithm without iteration. It just takes k random data points from the whole data set as centers. Centroid_test=data[sample(nrow(data), k), ] x = Centroid
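The missing iteration is usually Lloyd's loop: assign every point to its nearest center, move each center to the mean of its assigned points, and stop once the centers no longer move. Below is a minimal sketch in Python rather than R, on a list of numeric tuples. The function name `lloyd_kmeans`, the `max_iter`/`tol` parameters and the policy of keeping the old center when a cluster becomes empty are all assumptions for illustration. Note that this loop converges to a local optimum that depends on the starting centers, not necessarily the global one.

```python
import math
import random


def lloyd_kmeans(points, k, max_iter=100, tol=1e-9, seed=None):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; returns (centers, labels).
    """
    rng = random.Random(seed)
    # Start from k distinct data points, like data[sample(nrow(data), k), ] in R.
    centers = [tuple(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the center closest to p (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(max_iter):
        # Assignment step.
        labels = [nearest(p) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Empty cluster: keep the previous center (one of several possible policies).
                new_centers.append(centers[c])
        # Convergence test: largest distance any center moved in this pass.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    # Final labels consistent with the returned centers.
    labels = [nearest(p) for p in points]
    return centers, labels
```

For example, `lloyd_kmeans([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)], k=2, seed=0)` separates the first three points from the last three. In R itself, the built-in `stats::kmeans()` runs the same kind of iteration.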

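Machine learning algorithms in OpenCV3

一世执手 submitted on 2019-12-07 13:55:10
OpenCV3 adds several machine learning algorithms, which makes it possible to combine machine learning with image and video processing. For reference: OpenCV/OpenCV3 computer vision library and latest resources; New features in OpenCV3; Face detection with OpenCV3 using Python; The kNN machine learning algorithm in OpenCV3 using Python; OCR with OpenCV3's kNN algorithm using Python; The SVM machine learning algorithm in OpenCV3 using Python; The K-means machine learning algorithm in OpenCV3 using Python. Source: oschina. Link: https://my.oschina.net/u/2306127/blog/626538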

Clustering Time Series Data of Different Length

人走茶凉 submitted on 2019-12-07 13:46:57
Question: I have time-series data in which the series have different lengths. I want to cluster them based on DTW distance but could not find any library for it. sklearn throws an error outright, while tslearn's k-means gave a wrong answer. My problem is solved if I pad the series with zeros, but I am not sure whether it is correct to pad time-series data when clustering. Suggestions for other clustering techniques for time-series data are welcome. max_length = 0 for i in train_1: if(len(i)>max_length): max_length = len(i)
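Zero-padding changes the data itself (a run of trailing zeros looks like a real flat segment to any distance measure), whereas DTW by construction already compares sequences of different lengths. A minimal pure-Python sketch of the classic O(n·m) DTW recurrence, with absolute difference as the local cost, plus a pairwise distance matrix; the function names are hypothetical. Because a k-means "mean" of unequal-length series is not straightforward, one hedged option is to feed this matrix to a clustering method that only needs distances, for example SciPy's hierarchical `linkage` on `squareform(D)`, or a k-medoids implementation.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences of any lengths."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: diagonal match, step in a only, step in b only.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances; no padding required."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(series[i], series[j])
    return d
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`, because the repeated 2 is absorbed by warping, while plain Euclidean distance would not even be defined for these two series.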

Trouble with scipy kmeans and kmeans2 clustering in Python

别说谁变了你拦得住时间么 submitted on 2019-12-07 07:47:49
Question: I have a question about scipy's kmeans and kmeans2. I have a set of 1700 lat-long data points that I want to cluster spatially into 100 clusters. However, I get drastically different results when using kmeans vs kmeans2. Can you explain why? My code is below. First I load my data and plot the coordinates. It all looks correct. import pandas as pd, numpy as np, matplotlib.pyplot as plt from scipy.cluster.vq import kmeans, kmeans2, whiten df = pd.read_csv('data.csv') df.head()
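The two functions are not the same procedure with different names. `kmeans` runs the whole algorithm `iter` times (20 by default) from random starting codebooks and keeps the codebook with the lowest distortion; it returns only `(codebook, distortion)`, so labels have to be recovered with `vq`, and in the SciPy versions I know it drops centroids that end up with no members, so the codebook can have fewer than 100 rows. `kmeans2` runs a single initialisation for `iter` iterations (10 by default) and returns `(centroid, label)`; its default `minit='random'` draws starting centroids from a Gaussian fitted to the data, which with 100 clusters can leave many clusters empty. The sketch below uses synthetic uniform points as a stand-in for `data.csv` (an assumption) and `minit='points'` to start from actual observations.

```python
import numpy as np
from scipy.cluster.vq import kmeans, kmeans2, vq, whiten

# Synthetic stand-in for the 1700 lat-long points (the real data.csv is not available).
rng = np.random.default_rng(0)
points = rng.uniform(low=[40.0, -75.0], high=[41.0, -73.0], size=(1700, 2))
obs = whiten(points)  # divide each column by its standard deviation

# kmeans: best of `iter` restarts; returns the codebook and its mean distance only.
codebook, distortion = kmeans(obs, 100, iter=20)
labels_a, _ = vq(obs, codebook)  # assign each observation to its nearest code

# kmeans2: one run; 'points' picks k random observations as the starting centroids.
centroids, labels_b = kmeans2(obs, 100, iter=10, minit='points')

print(codebook.shape[0], centroids.shape[0])  # kmeans may report fewer than 100 centroids
```

Comparing `codebook.shape[0]` with 100, and counting the distinct values in `labels_b`, is a quick way to check whether empty clusters explain part of the difference on the real data.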

Inconsistent results with KMeans between Apache Spark and scikit-learn

别来无恙 submitted on 2019-12-07 07:02:56
Question: I am clustering a dataset using PySpark. To find the number of clusters, I ran the clustering over a range of values (2, 20) and computed the WSSE (within-cluster sum of squares) for each value of k. This is where I found something unusual. According to my understanding, the WSSE decreases monotonically as the number of clusters increases, but the results I got say otherwise. I'm displaying the WSSE for the first few values of k only. Results from Spark: For k = 002 WSSE is 255318.793358
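Monotonic decrease is only guaranteed for the *optimal* clustering: adding a center to the best k-center solution cannot increase WSSE, because every point can still use its old nearest center. Practical k-means (Spark's default `k-means||` initialisation, scikit-learn's `k-means++` with restarts) stops at a local optimum that depends on the random start, so the reported WSSE for k+1 can exceed the one for k. The pure-Python sketch below builds a tiny hypothetical 1-D dataset with a stable k=2 solution and a stable but poor k=3 solution; the helper names are illustrative.

```python
def wsse(points, centers):
    """Within-set sum of squared errors for 1-D data: squared distance to the nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def lloyd_step(points, centers):
    """One k-means pass (assign, then re-average); a fixed point is a local optimum."""
    groups = {c: [] for c in centers}
    for p in points:
        groups[min(centers, key=lambda c: abs(p - c))].append(p)
    return [sum(g) / len(g) for g in groups.values() if g]


points = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 19.5, 20.5]

k2_centers = [0.0, 14.0]          # stable k=2 solution
k3_centers = [5.0, 19.5, 20.5]    # stable but poor k=3 solution (merges the 0s and 10s)

# Both are fixed points of Lloyd's algorithm, so k-means could stop at either.
assert lloyd_step(points, k2_centers) == k2_centers
assert lloyd_step(points, k3_centers) == k3_centers

print(wsse(points, k2_centers))  # 120.5
print(wsse(points, k3_centers))  # 150.0: larger, although k grew
```

In practice, running several initialisations per k (different seeds in Spark, `n_init` in scikit-learn) and keeping the lowest WSSE makes such inversions much less likely, though it still does not guarantee the global optimum.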

sklearn: calculating accuracy score of k-means on the test data set

爷,独闯天下 submitted on 2019-12-07 06:05:54
Question: I am doing k-means clustering on a set of 30 samples with 2 clusters (I already know there are two classes). I divide my data into a training set and a test set and try to calculate the accuracy score on the test set. But there are two problems: first, I don't know whether I can actually do this (compute an accuracy score on a test set) for k-means clustering. Second, if I am allowed to do this, I don't know whether my implementation is right or wrong. Here is what I've tried: df_hist = pd.read_csv('video_data.csv') y = df_hist[
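The catch with accuracy is that k-means cluster ids are arbitrary: cluster 0 may correspond to class 1, so a direct comparison can report 1/6 for a clustering that is 5/6 correct. One hedged approach is to score under the best one-to-one mapping of clusters onto classes, or to use a label-invariant index such as scikit-learn's `adjusted_rand_score`. Test points should be assigned with the centers fitted on the training set (e.g. `KMeans.predict`). The helper below is hypothetical and brute-forces permutations, which is fine for 2 clusters; for many clusters `scipy.optimize.linear_sum_assignment` finds the best mapping efficiently.

```python
from itertools import permutations


def best_mapping_accuracy(y_true, y_pred):
    """Accuracy after relabelling clusters with the best one-to-one mapping onto classes.

    Brute force over permutations, so only practical for a handful of clusters.
    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct / len(y_true))
    return best


y_test = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 0, 0, 0, 0]   # cluster ids are swapped relative to the classes
print(best_mapping_accuracy(y_test, cluster_ids))  # 5/6, whereas naive accuracy is 1/6
```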

Document Clustering Basics

丶灬走出姿态 submitted on 2019-12-06 14:02:28
Question: So, I've been mulling over these concepts for some time, and my understanding is very basic. Information retrieval seems to be a topic seldom covered in the wild... My questions stem from the process of clustering documents. Let's say I start off with a collection of documents containing only interesting words. What is the first step here? Parse the words from each document and create a giant 'bag-of-words' type model? Do I then proceed to create vectors of word counts for each document? How
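The steps the asker describes are the usual starting point: build one vocabulary over all documents, turn each document into a vector of term counts over that vocabulary, usually reweight the counts (TF-IDF is common) and then cluster the vectors, typically comparing them by cosine similarity. A pure-Python sketch under simple assumptions: whitespace tokenisation, raw counts times `log(N / df)` as the TF-IDF variant (one of several in use), and hypothetical function names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn a list of strings into TF-IDF vectors over a shared, sorted vocabulary."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenised for w in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)  # the "bag of words" for this document
        vectors.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["kmeans cluster centroid", "cluster centroid distance", "guitar chord melody"]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True: shared terms
```

A word that occurs in every document gets weight 0 under this idf. If the vectors are scaled to unit length before running ordinary Euclidean k-means, squared distance becomes 2 − 2·cos, so the clustering follows cosine similarity.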

Plotting Clusters using clusplot with coordinates centered around 0

…衆ロ難τιáo~ submitted on 2019-12-06 13:35:37
I am trying to plot GIS coordinates, specifically UK National Grid coordinates, whose eastings and northings resemble: 194630000 562220000. I can plot these using clusplot from the cluster package: clusplot(df2, k.means.fit$cluster, main=i, color=TRUE, shade=FALSE, labels=0, lines=0, bty="7") where df2 is my data frame and k.means.fit is the result of the k-means analysis on df2. Note that the coordinates of the centers after the k-means analysis have not been normalised: k.means.fit$centers # Grid.Ref.Northing Grid.Ref.Easting #1 206228234 581240726 But when I plot the clusters, all the points are
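A likely explanation: `clusplot` does not draw the raw columns. According to the cluster package documentation it represents the observations using principal components (or multidimensional scaling for dissimilarity input), and principal-component scores are computed from mean-centred data, so the axes sit around 0 whatever the magnitude of the eastings and northings. To see clusters in grid coordinates, plotting the original two columns coloured by `k.means.fit$cluster` is the simpler route. The NumPy sketch below (Python, with made-up coordinates apart from the first pair quoted in the question) only illustrates why the PCA scores are centred.

```python
import numpy as np

# Made-up easting/northing pairs at the scale quoted in the question (not the real data).
coords = np.array([
    [194630000.0, 562220000.0],
    [201000000.0, 575000000.0],
    [198000000.0, 570000000.0],
    [210000000.0, 560000000.0],
])

centred = coords - coords.mean(axis=0)   # PCA starts by removing the column means
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt.T                  # coordinates along the principal axes

print(scores.mean(axis=0))  # both entries are zero up to rounding error
```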

Using K-means to cluster pixels in OpenCV with Java

本秂侑毒 submitted on 2019-12-06 13:25:36
I am currently trying to develop an Android app. I have tried to convert an image of a leaf from RGB to HSV to produce an image in saturation-value space (without hue). Next, I tried to use K-means to produce an image that should display blue for the background and green for the leaf (the foreground object). However, I do not know how to display the image after using the K-means function in OpenCV. Imgproc.cvtColor(rgba, mHSV, Imgproc.COLOR_RGBA2RGB,3); Imgproc.cvtColor(rgba, mHSV, Imgproc.COLOR_RGB2HSV,3); List<Mat> hsv_planes = new ArrayList<Mat>(3); Core.split(mHSV, hsv_planes); Mat
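OpenCV's kmeans (`Core.kmeans` in Java, `cv2.kmeans` in Python) expects a float32 samples matrix with one row per pixel and returns one integer label per row, in the same row-major order. Displaying the result therefore means painting each pixel with a colour chosen for its label and reshaping back to the image size. The NumPy sketch below (Python rather than Java) shows that step on a hypothetical 2×3 label array; the function name and palette are assumptions. Cluster ids are arbitrary, so which id is the leaf has to be decided afterwards, for example by comparing the returned centers, and the palette's channel order must match the Mat being displayed (RGBA on Android, BGR in most desktop OpenCV code).

```python
import numpy as np


def labels_to_image(labels, height, width, palette):
    """Paint each pixel with the colour of its cluster.

    labels  : k-means output, one integer per pixel in row-major order
              (shape (N,) or (N, 1), as returned by OpenCV's kmeans)
    palette : sequence of K colour triples, one per cluster id
    """
    labels = np.asarray(labels).reshape(-1)
    if labels.size != height * width:
        raise ValueError("need exactly one label per pixel")
    palette = np.asarray(palette, dtype=np.uint8)
    # Fancy indexing maps every label to its colour; reshape restores the image layout.
    return palette[labels].reshape(height, width, 3)


# Hypothetical 2x3 result: cluster 0 = background, cluster 1 = leaf.
labels = np.array([[0], [1], [1], [0], [1], [0]], dtype=np.int32)
palette = [(0, 0, 255), (0, 255, 0)]  # RGB order: blue for 0, green for 1
img = labels_to_image(labels, 2, 3, palette)
```

In Java the same idea is a loop over the labels Mat that writes the chosen colour into a new Mat with the original rows and columns.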

K means clustering in MATLAB - output image

穿精又带淫゛_ submitted on 2019-12-06 11:48:32
I want to perform k-means clustering with k = 3 (segments). So I: 1) converted the RGB image to grayscale; 2) reshaped the image into an n x 1 column matrix; 3) ran idx = kmeans(column_matrix, 3); 4) reshaped idx back to the dimensions of the original image to get output. My questions are: A) When I do imshow(output), I get a plain white image. However, when I do imshow(output, [0 5]), it shows the output image. I understand that 0 and 5 specify the display range, but why do I have to do this? B) The output image is meant to be split into 3 segments, right? How do I threshold it such that I assign a 0
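For A): MATLAB's `imshow` treats a `double` matrix as intensities in [0, 1], and `kmeans` returns cluster indices 1 to k, all of which are ≥ 1, so every pixel is drawn white. `imshow(output, [])` rescales from the minimum to the maximum value, `mat2gray(output)` does the same rescaling explicitly, and `label2rgb(output)` gives each segment its own colour. For B): each segment is simply `output == c`. The NumPy sketch below (Python, with a hypothetical 2×3 label image) mirrors both ideas; the function names are illustrative.

```python
import numpy as np


def labels_for_display(label_img):
    """Linearly rescale cluster ids to [0, 1], like MATLAB's imshow(output, [])."""
    lo, hi = label_img.min(), label_img.max()
    if hi == lo:
        return np.zeros(label_img.shape, dtype=float)
    return (label_img - lo) / float(hi - lo)


def segment_masks(label_img, k):
    """One binary image per cluster: 1 where the pixel belongs to cluster c, else 0."""
    return [(label_img == c).astype(np.uint8) for c in range(1, k + 1)]


# Hypothetical 2x3 label image with MATLAB-style ids 1..3.
output = np.array([[1, 2, 3], [3, 2, 1]])
shown = labels_for_display(output)  # 0.0, 0.5, 1.0 instead of all-white 1, 2, 3
masks = segment_masks(output, 3)    # masks[0] is the binary image of segment 1
```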