knn

[Machine Learning Notes] The kNN (k-Nearest Neighbors) Algorithm

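感情迁移 submitted on 2019-12-05 09:22:18
K-Nearest Neighbors Algorithm 1. Overview: kNN classifies samples by measuring the distances between their feature values. Pros: high accuracy, insensitive to outliers, no assumptions about the input data. Cons: high computational complexity and high space complexity. 2. Implementing the kNN algorithm 2.1 Pseudocode: compute the distance between the current point and every point in the dataset of known classes; sort the distances in increasing order; select the k points closest to the current point; count how often each class occurs among those k points; return the most frequent class among the k points as the predicted class of the current point. 2.2 Actual code def classify0(inX, dataSet, labels, k): dataSetSize = dataSet.shape[0] diffMat = tile(inX, (dataSetSize,1)) - dataSet sqDiffMat = diffMat**2 sqDistances = sqDiffMat.sum(axis=1) distances = sqDistances**0.5 sortedDistIndicies = distances.argsort() classCount={} for i in range(k): voteIlabel = labels[sortedDistIndicies[i]] classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1 sortedClassCount =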
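The excerpt stops before the vote is sorted. Below is a minimal, self-contained sketch of the same five pseudocode steps, assuming NumPy. It mirrors classify0 but is not the author's code. It uses broadcasting in place of `tile` and `collections.Counter` for the vote. The sample points in `group` are made up for illustration.

```python
import numpy as np
from collections import Counter

def classify0(in_x, data_set, labels, k):
    """Predict the class of in_x by majority vote among its k nearest rows of data_set."""
    # Broadcasting subtracts in_x from every row, replacing tile(inX, (n, 1)).
    diff = data_set - np.asarray(in_x)
    distances = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean distance to each row
    nearest = distances.argsort()[:k]              # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)    # class counts among the k neighbors
    # most_common keeps first-encountered order on ties, i.e. the class seen nearest first.
    return votes.most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0([0.0, 0.0], group, labels, 3))  # B
```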

MNIST | Handwritten Digit (0-9) Recognition Based on k-means and KNN

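匆匆过客 submitted on 2019-12-05 09:03:48
MNIST | Handwritten Digit (0-9) Recognition Based on k-means and KNN 1 Background 2 Algorithm Principle 3 Code Implementation 3.1 File Layout 3.2 Core Code 4 Experiments and Result Analysis 5 Afterword Summary: This experiment builds on my earlier experiments "kaggle | Voice Gender Recognition Based on k-means and KNN", "MNIST | Handwritten Digit (0-9) Recognition Based on a Naive Bayes Classifier" and "Algorithms | k-means Clustering". It applies k-means clustering and KNN recognition to the handwritten digit recognition problem. For background on the MNIST dataset and on k-means + KNN, see those three posts first. The code for this experiment is again written in MATLAB. Keywords: handwritten digit recognition; k-means; KNN; MATLAB; machine learning 1 Background In the post before last, I said I would apply the k-means clustering algorithm to concrete problems such as voice gender recognition and 0-9 handwritten digit recognition. The voice gender recognition experiment was finished on November 2, and now I am following through on the handwritten digit recognition. This post continues several earlier posts, which already covered the MNIST dataset and the principles and usage of k-means and KNN, so I will not explain them again here. 2 Algorithm Principle The approach of this experiment can be summarized as follows: S1: During training, cluster the training data of each digit 0-9 into k clusters, giving 10k cluster centers in total; S2: During validation,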
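Step S1 can be sketched in Python with NumPy, although the post itself uses MATLAB. The sketch runs plain Lloyd's k-means separately on each class and keeps the 10k labeled centers. The excerpt is cut off at S2. The `predict` step here is therefore an assumption: a kNN vote over the labeled centers in place of the full training set. `kmeans`, `fit_class_centers` and `predict` are hypothetical names, not the author's.

```python
import numpy as np
from collections import Counter

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def fit_class_centers(X, y, k):
    """S1: cluster each class separately into k centers (10k centers for 10 digits)."""
    centers, center_labels = [], []
    for c in np.unique(y):
        centers.append(kmeans(X[y == c], k))
        center_labels.extend([c] * k)
    return np.vstack(centers), np.array(center_labels)

def predict(x, centers, center_labels, n_neighbors=3):
    """Assumed S2: kNN vote over the labeled centers rather than all training samples."""
    d = np.sqrt(((centers - x) ** 2).sum(axis=1))
    nearest = d.argsort()[:n_neighbors]
    return Counter(center_labels[nearest]).most_common(1)[0][0]
```

The design trade-off is that classification compares a query against only 10k centers instead of every training image. The cost is whatever detail the clustering discards.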

A--Nearest Neighbor Classifier - KNN

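佐手、 submitted on 2019-12-05 06:35:36
#import the required packages import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt %matplotlib inline Build a KNN classifier. Assume the last column of the input dataset is the label and the remaining columns are features. The training set is train and the test set is test; the two are passed in separately, both as DataFrames. def classify0_1(train,test,k):#train: training set, test: test set, k: the k value n = train.shape[1] - 1 m = test.shape[0] result = [] for i in range(m): #use broadcasting to compute the distance from each test row to every training row, giving a Series that is converted to a list and assigned to dist dist = list(((train.iloc[:, :n] - test.iloc[i, :n]) **2).sum(1)) #build a DataFrame from the distance values and the label column dist_l = pd.DataFrame({'dist': dist, 'labels': (train.iloc[:,n])}) #sort by distance (ascending by default) and keep the first k rows dr = dist_l.sort_values(by = 'dist')[: k]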
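The excerpt ends right after the k nearest rows are selected. Here is a runnable sketch of the whole function under the same conventions: label in the last column, DataFrames in and a list of predictions out. The final majority-vote step is an assumption about how the post continues. The `.astype(float)` cast is also this sketch's own addition, so that a string label column in `test` cannot change the dtype of the feature row.

```python
import pandas as pd

def classify0_1(train, test, k):
    """kNN on DataFrames: the last column of each frame is the label, the rest are features."""
    n = train.shape[1] - 1
    features = train.iloc[:, :n]
    result = []
    for i in range(test.shape[0]):
        row = test.iloc[i, :n].astype(float)  # feature values of one test row
        # Broadcasting subtracts the row from every training row (aligned on column
        # names); squared distance without the square root is enough for ranking.
        dist = ((features - row) ** 2).sum(axis=1)
        dist_l = pd.DataFrame({'dist': dist, 'labels': train.iloc[:, n]})
        dr = dist_l.sort_values(by='dist').head(k)
        # Assumed continuation: majority vote; value_counts sorts labels by frequency.
        result.append(dr['labels'].value_counts().index[0])
    return result
```

Train and test must share feature column names, because the subtraction aligns the test row's index with the training columns.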

Implementing KNN with different distance metrics using R

只谈情不闲聊 submitted on 2019-12-05 05:21:41
Question: I am working on a dataset in order to compare the effect of different distance metrics, using the KNN algorithm. The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own function. I would like to find the number of correct class label matches between the nearest neighbor and the target. I prepared the data first, then called the data (wdbc_n) and chose K=1. As a test, I used the Euclidean distance. library(philentropy) knn <- function(xmat, k,method){ n <- nrow
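The comparison can be sketched in Python with hand-written metrics in place of philentropy. The sketch counts, for each row, whether its single nearest *other* row has the same label. Excluding the row itself is this sketch's assumption about what "nearest neighbor vs. target" means. Without that exclusion every point is its own neighbor at distance 0, and all metrics would trivially score 100%. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def loo_1nn_matches(X, labels, metric):
    """Count rows whose nearest other row (leave-one-out 1-NN) shares their label."""
    matches = 0
    for i, xi in enumerate(X):
        # Search all rows except i itself for the closest one under `metric`.
        j = min((j for j in range(len(X)) if j != i), key=lambda j: metric(xi, X[j]))
        matches += labels[j] == labels[i]
    return matches
```

Running `loo_1nn_matches` once per metric on the same data gives directly comparable counts.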

Knn Regression in R

好久不见. submitted on 2019-12-05 02:50:14
Question: I am investigating KNN regression methods and, later, kernel smoothing, and I wish to demonstrate these methods using plots in R. I have generated a data set using the following code: x = runif(100,0,pi) e = rnorm(100,0,0.1) y = sin(x)+e I have been trying to follow the description of how to use "knn.reg" in section 9.2 here: https://daviddalpiaz.github.io/r4sl/k-nearest-neighbors.html#regression grid2=data.frame(x) knn10 = FNN::knn.reg(train = x, test = grid2, y = y, k = 10) My predicted values seem
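KNN regression predicts each query point as the mean response of its k nearest training points. Below is a pure-Python sketch of that idea on the same simulated data (sin(x) plus noise with standard deviation 0.1). It is not FNN's implementation. The evenly spaced `grid` of query points is this sketch's choice: it yields predictions already in x order, which is what a line plot of the fitted curve needs.

```python
import math
import random

def knn_regress(x_train, y_train, x_query, k):
    """Predict y at each query point as the mean y of its k nearest training x values (1-D)."""
    preds = []
    for q in x_query:
        nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - q))[:k]
        preds.append(sum(y_train[i] for i in nearest) / k)
    return preds

random.seed(0)
x = [random.uniform(0, math.pi) for _ in range(100)]
y = [math.sin(xi) + random.gauss(0, 0.1) for xi in x]
grid = [i * math.pi / 50 for i in range(51)]  # evenly spaced query points on [0, pi]
fit = knn_regress(x, y, grid, k=10)
```

Near the ends of [0, pi] all neighbors lie on one side of the query point, so the estimate is pulled toward the interior values of sin(x).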

How to best implement K-nearest neighbours in C# for large number of dimensions?

喜欢而已 submitted on 2019-12-05 01:39:30
Question: I'm implementing the K-nearest neighbours classification algorithm in C# for training and testing sets of about 20,000 samples each, with 25 dimensions. There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation: // testSamples and trainSamples consists of about 20k vectors each with 25 dimensions // trainClasses contains 0 or 1 signifying the corresponding class for each sample in trainSamples static int[] TestKnnCase
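The question's code is cut off, so here is a language-neutral sketch, written in Python for consistency, of two common savings for brute-force kNN at this scale. The first skips the square root, which does not change neighbor order. The second keeps only the k best candidates with a partial selection (`heapq.nsmallest`) instead of sorting all 20k distances per test vector. These are this sketch's suggestions, not the asker's code. `knn_binary` is a hypothetical name.

```python
import heapq

def squared_distance(a, b):
    # Skipping the square root preserves the neighbor ordering and saves work.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_binary(train_samples, train_classes, test_samples, k):
    """Classify each test vector as 0 or 1 by majority vote of its k nearest training vectors."""
    predictions = []
    for t in test_samples:
        # nsmallest keeps only k candidates instead of sorting every distance.
        nearest = heapq.nsmallest(k, range(len(train_samples)),
                                  key=lambda i: squared_distance(train_samples[i], t))
        ones = sum(train_classes[i] for i in nearest)  # number of class-1 neighbors
        # Odd k avoids ties; with even k a tie goes to class 0 here.
        predictions.append(1 if 2 * ones > k else 0)
    return predictions
```

With only two classes, summing the 0/1 labels of the neighbors gives the class-1 vote directly, so no dictionary of counts is needed.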

find all points within a range to any point of another set

旧巷老猫 submitted on 2019-12-05 01:09:48
Question: I have two sets of points A and B. I want to find all points in B that are within a certain range r of A, where a point b in B is said to be within range r of A if there is at least one point a in A whose (Euclidean) distance to b is less than or equal to r. Each of the two sets is a coherent set of points; they are generated from the voxel locations of two non-overlapping objects. In 1D this problem is fairly easy: the answer is all points of B within [min(A) - r, max(A) + r]. But I am in 3D.
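One standard way to avoid comparing every b with every a in 3D is a uniform grid (spatial hashing) with cell side r. This is our choice of technique, not something from the question. If |a - b| <= r, every coordinate differs by at most r, so a's cell index differs from b's by at most 1 on each axis. Only the 27 cells around b need to be checked. A k-d tree, for example SciPy's `cKDTree.query_ball_point`, is a common alternative.

```python
import math
from collections import defaultdict
from itertools import product

def points_within_range(A, B, r):
    """Return the points of B whose Euclidean distance to some point of A is <= r (3-D)."""
    # Bucket A into cubic cells of side r; any a within r of b lies in b's cell or a neighbor.
    cells = defaultdict(list)
    for a in A:
        cells[tuple(math.floor(c / r) for c in a)].append(a)
    r2 = r * r
    result = []
    for b in B:
        cx, cy, cz = (math.floor(c / r) for c in b)
        # Check only the 3 x 3 x 3 block of cells around b's cell.
        if any(
            sum((p - q) ** 2 for p, q in zip(a, b)) <= r2
            for dx, dy, dz in product((-1, 0, 1), repeat=3)
            for a in cells.get((cx + dx, cy + dy, cz + dz), ())
        ):
            result.append(b)
    return result
```

The cost is roughly proportional to the number of A points near each b rather than to |A|. It works best when r is small relative to the extent of the point clouds, as with voxel data.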

【Machine Learning】Machine Learning Basics 03 - Data Normalization

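我是研究僧i submitted on 2019-12-04 20:39:08
1. Data Normalization 1.1 A medical accident? What went wrong with our earlier kNN algorithm? In the tumor example we used earlier when discussing kNN, there is a problem that many readers may not have considered. To recap, the first step of kNN is to find the k nearest points, which means first computing the distance between each data point and the point to be predicted. We again use the Minkowski distance with p = 2 (the Euclidean distance) as an example. In the tumor example, each data point has two coordinates, the discovery time T and the tumor size S. What we actually compute and compare is the value D_i = √((T - T_i)² + (S - S_i)²) between the new point (T, S) and data point i. To keep the notation simple, we write this distance as D_i from here on. Well, a new patient has arrived, so let's run a check. Beep~~~ Tumor diameter: 140 mm, discovery time: 0.8 years. Now it's time to test what our kNN algorithm can do. To keep it simple, assume there are only 2 existing data points and k = 1. Here are the two existing data points: Tumor 1: diameter 150 mm, discovery time 1 year. Tumor 2: diameter 139 mm, discovery time 5 years. All right, clever reader, tell me: do you pick 1 or 2? I don't know medicine, I made up all the data, and I don't know whether these diameters and times are realistic. But I'm sure that you, knowing as little medicine as I do, would pick 1, just like me. Tumor 1 differs from the new tumor by a little over two months and is only 10 mm larger, so by rights the two should be very similar. Tumor 2 has had four more years to grow and is still smaller than the new tumor, so it certainly can't be the answer. Fine, let's assume we agree. Anyway, the kNN algorithm we built with our own hands doesn't see it that way. We can compute distances too, so let's see what the kNN algorithm does. What?! D2 is closer?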
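With raw units, D1 = √(10² + 0.2²) ≈ 10.00 and D2 = √(1² + 4.2²) ≈ 4.32. The millimetre differences dominate the year differences, so kNN picks tumor 2. The excerpt stops before the post's fix. The sketch below assumes min-max normalization, one common choice, with the minima and maxima taken from the two existing tumors. It shows the ranking flipping once both features are on a 0-1 scale. The helper names are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_scale(points, ref):
    """Scale each coordinate to [0, 1] using the min and max of the reference (training) points."""
    lows = [min(col) for col in zip(*ref)]
    highs = [max(col) for col in zip(*ref)]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs)) for p in points]

# (diameter in mm, discovery time in years): the made-up values from the post
tumors = [(150.0, 1.0), (139.0, 5.0)]
new = (140.0, 0.8)

raw = [euclidean(new, t) for t in tumors]            # about [10.00, 4.32]: tumor 2 wins
scaled_tumors = min_max_scale(tumors, tumors)
scaled_new = min_max_scale([new], tumors)[0]          # scaled with the training min/max
scaled = [euclidean(scaled_new, t) for t in scaled_tumors]  # about [0.91, 1.05]: tumor 1 wins
```

The new point is scaled with the training set's minima and maxima, not its own. That is why its scaled time can fall slightly outside [0, 1].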

How to plot a ROC curve for a knn model

你。 submitted on 2019-12-04 17:02:24
I am using the ROCR package, and I was wondering how one can plot a ROC curve for a knn model in R. Is there any way to do it all with this package? I don't know how to use ROCR's prediction function for knn. Here's my example; I am using the isolet dataset from the UCI repository, where I renamed the class attribute to y: cl<-factor(isolet_training$y) knn_isolet<-knn(isolet_training, isolet_testing, cl, k=2, prob=TRUE) Now my question is: what arguments should I pass to ROCR's prediction function? I tried the 2 alternatives below, which do not work: library(ROCR) pred_knn<-prediction(knn
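A ROC curve needs a continuous score for the positive class, not just the predicted label. For kNN, a natural score is the fraction of the k neighbors that belong to the positive class. The documentation of `class::knn` says that with `prob = TRUE` the `prob` attribute holds the vote proportion of the *winning* class. In a two-class problem, the positive-class score would therefore be `prob` when the prediction is positive and `1 - prob` otherwise. isolet has 26 classes, so a one-vs-rest treatment per class would be needed. Below is a Python sketch of the score-and-sweep idea rather than ROCR code. All names are hypothetical.

```python
def knn_positive_scores(train_X, train_y, test_X, k, positive):
    """Score = fraction of the k nearest training labels equal to `positive`."""
    scores = []
    for t in test_X:
        nearest = sorted(range(len(train_X)),
                         key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], t)))[:k]
        scores.append(sum(train_y[i] == positive for i in nearest) / k)
    return scores

def roc_points(scores, truth):
    """(FPR, TPR) pairs from sweeping a threshold over the distinct scores, high to low."""
    P = sum(truth)
    N = len(truth) - P
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= thr and not t)
        points.append((fp / N, tp / P))
    return points

def auc(points):
    """Trapezoidal area under the ROC polyline."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With k neighbors the score takes at most k + 1 distinct values, so k = 2 gives a curve with very few corners. A larger k gives a smoother ROC curve.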

R : knnImputation Giving Error

那年仲夏 submitted on 2019-12-04 13:06:22
I am getting the error below in my R code. My Brand_X.xlsx dataset has a few NA values, which I am trying to impute using KNN imputation, but it fails with the error below. What's wrong here? Thanks! > library(readxl) > Brand_X <- read_excel("Brand_X.xlsx") > str(Brand_X) Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 101 obs. of 8 variables: $ Rel_price_lag5: num 108 111 105 103 109 104 110 114 103 108 ... $ Rel_price_lag1: num 110 109 217 241 855 271 234 297 271 999 ... $ Rel_Price : num 122 110 109 217 241 855 271 234 297 271 ... $ Promo : num 74 29 32 24 16 31 22 7 32 22 ... $ Loy_HH : num 37 26 35 30
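The error message is cut off, so this does not diagnose it. It only illustrates what KNN imputation does, as a pure-Python sketch rather than the R package's implementation. Each missing value (`None` here) is filled with the mean of that column over the k nearest complete rows. Distance is measured only on the columns the incomplete row actually has. The sketch assumes at least one complete row exists and applies no feature scaling. `knn_impute` is a hypothetical name.

```python
import math

def knn_impute(rows, k):
    """Fill None entries with the column mean over the k nearest complete rows.

    Distance uses only the columns observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]

        def dist(c):
            # Euclidean distance over the observed columns only.
            return math.sqrt(sum((r[j] - c[j]) ** 2 for j in observed))

        donors = sorted(complete, key=dist)[:k]  # k nearest complete rows
        filled.append([v if v is not None else sum(d[j] for d in donors) / len(donors)
                       for j, v in enumerate(r)])
    return filled
```

Without scaling, columns with large ranges (such as the price columns above) dominate the distance. This is the same issue the normalization post describes.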