k-means | 易学教程

Retrieving similar images from a set of images using SIFT/SURF

Question: I am working with SIFT features and using a visual bag-of-words approach: first build a vocabulary, then do the matching. I have found similar questions but no suitable answer. The same question is asked at the link below, but it has no satisfactory answer. Can anyone help? Thanks in advance. https://stackoverflow.com/questions/29366944/finding-top-similar-images-from-a-database-using-sift-surf
Answer 1: SIFT and SURF are both implemented in the LIRE project and are ready to use.

Getting Database Attribute From KMeans Clustering WEKA

Question: I have a function that runs the k-means algorithm using WEKA.jar. The function works and prints the list of objects to my console, but I want to show a specific attribute from the k-means clustering. This is my code:
//importing required dependencies
import weka.core.Instance;
import weka.experiment.InstanceQuery;
public class KMeans {
    /*get connection strings from database manager*/
    private DatabaseManager datman = new DatabaseManager();
    private String username = datman

Bag of feature: how to create the query histogram?

Question: I'm trying to implement the Bag of Features model. Given a descriptor matrix (representing an image) that belongs to the initial dataset, computing its histogram is easy, since k-means has already told us which cluster each descriptor vector belongs to. But what if we want to compute the histogram of a query matrix? The only solution that comes to mind is to compute the distance from each descriptor vector to each of the k cluster centroids. This can be inefficient:
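The nearest-centroid idea described in the question can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `bow_histogram`, the toy centroids, and the toy query descriptors are hypothetical and not taken from the original post. The distance matrix is computed in one vectorized step, which is usually fast enough for moderate vocabulary sizes.

```python
import numpy as np

def bow_histogram(descriptors, centroids, normalize=True):
    """Count how many descriptors fall nearest to each centroid (visual word)."""
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distances, shape (n_descriptors, k),
    # using ||d - c||^2 = ||d||^2 - 2 d.c + ||c||^2.
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ centroids.T
        + (centroids ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)  # index of the closest centroid per descriptor
    hist = np.bincount(nearest, minlength=len(centroids)).astype(float)
    if normalize and hist.sum() > 0:
        hist /= hist.sum()  # make images with different descriptor counts comparable
    return hist

# Hypothetical toy data: 3 visual words, 4 query descriptors in 2-D.
centroids = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
query = np.array([[0.5, -0.2], [9.0, 11.0], [10.5, 9.5], [0.1, 0.3]])
hist = bow_histogram(query, centroids)
```

For very large vocabularies, an approximate nearest-neighbor index over the centroids is a common alternative to this brute-force comparison.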

Cluster Analysis: A Custom K-means Function in Python (Study Notes)

from numpy import *
import matplotlib.pyplot as plt
from math import sqrt

# Distance metric (Euclidean distance)
def eucDistance(vec1, vec2):
    return sqrt(sum(power(vec2 - vec1, 2)))

# Choose the initial cluster centers
def initCentroids(dataSet, k):
    numSamples, dim = dataSet.shape
    centroids = zeros((k, dim))
    for i in range(k):
        index = int(random.uniform(0, numSamples))
        centroids[i, :] = dataSet[index, :]
    return centroids

# K-means clustering algorithm
# Create K centroids, assign each data point to the nearest centroid, then recompute the centroids
def kmeanss(dataSet, k):
    numSamples = dataSet.shape[0]
    clusterAssement = mat(zeros((numSamples, 2)))

Clustering Algorithms: K-Means, K-Means++, Elkan K-Means, and MiniBatch K-Means Procedures

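Clustering is a typical example of unsupervised learning in machine learning and is applied to many practical problems in data analysis and pattern recognition. Classification is one of the most common problem types in machine learning; its goal is to determine which class an object belongs to. The most important difference between the two is that classification has labels, so learning amounts to the program learning the characteristics of each label. Clustering is unsupervised: we do not know in advance how many classes there are or which class each item belongs to, and the program must group the items automatically according to some rule. A cluster analysis has to settle three questions:
1. How many clusters?
2. Which distance metric between samples?
3. Which clustering strategy?
Below we look at some commonly used clustering algorithms.
I. K-Means
K-Means clustering is an iterative cluster analysis algorithm. It randomly selects K objects as the initial cluster centers, computes the distance between each object and each seed center, and assigns each object to the nearest center. A cluster center together with the objects assigned to it represents one cluster. Each time a sample is assigned, the center of its cluster is recomputed from the objects currently in that cluster. This process repeats until a termination condition is met. The termination condition can be that no (or a minimal number of) objects are reassigned to a different cluster, that no (or a minimal number of) cluster centers change, or that the sum of squared errors reaches a local minimum.
K-Means procedure:
1. Input data $D = \{x_1, x_2, x_3, \ldots, x_m\}$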
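The iteration described above can be sketched as follows. This is a minimal sketch, not the article's own implementation: it uses the common batch variant, which recomputes every center after all samples have been assigned rather than after each single assignment, and it stops when no sample changes cluster. The function name `kmeans`, the parameters `max_iter` and `seed`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Batch K-Means: returns (centers, labels, sum of squared errors)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Randomly pick k distinct samples as the initial cluster centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Distance from every sample to every center, shape (m, k).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Termination condition: no sample was reassigned.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    sse = float(((data - centers[labels]) ** 2).sum())
    return centers, labels, sse

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, labels, sse = kmeans(points, k=2)
```

Because the initial centers are random, the result generally depends on `seed`; running several restarts and keeping the lowest sum of squared errors is a common safeguard.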

How to vectorize json data for KMeans?

Question: I have a number of questions and choices that users are going to answer. They have this format: question_id, text, choices. For each user I store the answered questions and the selected choices as JSON in MongoDB: {user_id: "", "question_answers" : [{"question_id": "choice_id", ..}] } Now I'm trying to use K-Means clustering and streaming to find the most similar users based on their answers, but I need to convert my user data to numeric vectors like the
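One common way to turn such records into vectors is one-hot encoding: each distinct (question, choice) pair gets its own column, and a user's vector holds 1 in the columns for the choices they selected. The sketch below assumes that `question_answers` is a list of dictionaries mapping a question id to a choice id, as the excerpt suggests. The helper names `build_vocabulary` and `vectorize` and the sample users are hypothetical.

```python
import numpy as np

def build_vocabulary(users):
    """Map every observed (question_id, choice_id) pair to a column index."""
    pairs = sorted({
        (qid, cid)
        for user in users
        for answer in user["question_answers"]
        for qid, cid in answer.items()
    })
    return {pair: i for i, pair in enumerate(pairs)}

def vectorize(user, vocab):
    """One-hot encode a user's answers; pairs missing from vocab are ignored."""
    vec = np.zeros(len(vocab))
    for answer in user["question_answers"]:
        for pair in answer.items():
            if pair in vocab:
                vec[vocab[pair]] = 1.0
    return vec

# Hypothetical sample records shaped like the JSON in the question.
users = [
    {"user_id": "u1", "question_answers": [{"q1": "a", "q2": "x"}]},
    {"user_id": "u2", "question_answers": [{"q1": "b"}, {"q2": "x"}]},
]
vocab = build_vocabulary(users)
X = np.vstack([vectorize(u, vocab) for u in users])  # one row per user
```

The rows of `X` can then be passed to a K-Means implementation. With many questions the matrix becomes wide and sparse, so a sparse representation may be preferable in practice.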

Extract black objects from color background

Question: It is easy for human eyes to tell black from other colors. But what about computers? I printed some color blocks on ordinary A4 paper. Since a color image is composed of three inks (cyan, magenta, and yellow), I set the color of each block to C=20%, C=30%, C=40%, and C=50%, with the other two colors at 0. That is the first column of my source image. So far, no black (K of CMYK) ink should be printed. After that, I set the color of each dot to K=100% and the remaining colors to 0 to print black

How to detect multiple objects with OpenCV in C++?

Question: I got inspiration from this answer here, which is a Python implementation, but I need C++. That answer works very well. The idea, as I understand it, is: use detectAndCompute to get keypoints, use kmeans to segment them into clusters, then for each cluster run matcher->knnMatch with that cluster's descriptors, then do the rest as in the usual single-object detection method. The main problem is: how do I provide the descriptors for each cluster's matcher->knnMatch call? I thought we should set value of the other
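One possible reading of the per-cluster step: row i of the descriptor matrix belongs to keypoint i, so the k-means labels can be used to pick out the descriptor rows of each cluster. The sketch below shows only that row selection, in Python with NumPy rather than the C++ the question asks for, and without calling OpenCV. The function name `split_by_cluster` and the toy arrays are hypothetical, and the labels are assumed to arrive as an N×1 column of cluster indices.

```python
import numpy as np

def split_by_cluster(points, descriptors, labels, k):
    """Return, per cluster, the row indices, keypoint coordinates, and descriptors."""
    labels = np.asarray(labels).ravel()  # accept an N x 1 column or a flat array
    groups = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)  # rows that belong to cluster c
        groups.append((idx, points[idx], descriptors[idx]))
    return groups

# Hypothetical toy data: 4 keypoints with 8-dimensional descriptors.
points = np.array([[10.0, 12.0], [11.0, 13.0], [200.0, 210.0], [205.0, 208.0]])
descriptors = np.arange(4 * 8, dtype=np.float32).reshape(4, 8)
labels = np.array([[0], [0], [1], [1]])  # assumed N x 1 cluster index per keypoint
groups = split_by_cluster(points, descriptors, labels, k=2)
```

Each per-cluster descriptor block could then be matched on its own. Keeping the returned row indices makes it possible to map a match back to the original keypoint list. In C++ the same idea would amount to copying the selected rows of the descriptor matrix into a separate matrix for each cluster.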

Accurately detect color regions in an image using K-means clustering

Question: I'm using K-means clustering for color-based image segmentation. I have a 2D image with 3 colors: black, white, and green. Here is the image. I want K-means to produce 3 clusters: one for the green region, one for the white region, and one for the black region. Here is the code I used:
%Clustering color regions in an image.
%Step 1: read the image using imread, and show it using imshow.
img = (imread('img.jpg'));
figure, imshow(img), title('X