knn | 易学教程

[Machine Learning Notes] kNN (k-Nearest Neighbors) Algorithm

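k-Nearest Neighbors Algorithm
1. Algorithm overview
Classifies samples by measuring the distance between their feature values.
Pros: high accuracy, insensitive to outliers, no assumptions about the input data.
Cons: high computational complexity, high space complexity.
2. Implementing the kNN algorithm
2.1 Pseudocode
(1) Compute the distance between the current point and every point in the dataset of known classes.
(2) Sort the points by increasing distance.
(3) Select the k points closest to the current point.
(4) Determine how often each class occurs among those k points.
(5) Return the most frequent class among the k points as the predicted class of the current point.
2.2 Actual code

    from numpy import tile

    def classify0(inX, dataSet, labels, k):
        dataSetSize = dataSet.shape[0]
        # Repeat the query row so it can be subtracted from every training row
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        sqDiffMat = diffMat**2
        sqDistances = sqDiffMat.sum(axis=1)
        distances = sqDistances**0.5
        # Indices of the training rows, ordered by increasing distance
        sortedDistIndicies = distances.argsort()
        classCount = {}
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        sortedClassCount =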
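The five pseudocode steps can also be written without NumPy. The following is only a minimal sketch, not the post's own code: the function name knn_classify and the toy data are invented for illustration, and the tie-breaking rule (the label encountered first wins) is an assumption, since the excerpt stops before the vote is resolved.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Step 1: Euclidean distance from every labeled point to the query.
    distances = [math.dist(query, p) for p in points]
    # Step 2: indices of the points, ordered by increasing distance.
    order = sorted(range(len(points)), key=lambda i: distances[i])
    # Steps 3-4: count the labels of the k closest points.
    votes = Counter(labels[i] for i in order[:k])
    # Step 5: the most frequent label wins (ties go to the label counted first).
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration.
points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_classify((0.0, 0.0), points, labels, k=3))
```

With k=3 the three nearest points to the origin carry the labels B, B and A, so the vote goes to B.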

MNIST | Handwritten Digit (0-9) Recognition Based on k-means and KNN

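MNIST | Handwritten Digit (0-9) Recognition Based on k-means and KNN
Contents: 1 Background; 2 Algorithm; 3 Implementation (3.1 File layout, 3.2 Core code); 4 Experiments and analysis of results; 5 Afterword
Summary: This experiment builds on the experiments "kaggle | Voice gender recognition based on k-means and KNN", "MNIST | Handwritten digit (0-9) recognition with a naive Bayes classifier" and "Algorithms | k-means clustering". It applies k-means clustering and KNN classification to the handwritten digit recognition problem. For background on the MNIST dataset and on k-means + KNN, see those three posts first. The code for this experiment is again written in MATLAB.
Keywords: handwritten digit recognition; k-means; KNN; MATLAB; machine learning
1 Background
In the post before last I said I would apply the k-means clustering algorithm to concrete problems such as voice gender recognition and handwritten digit (0-9) recognition. The voice gender experiment was finished on November 2, so now it is time to deliver on the handwritten digit one. This post follows on from several earlier ones that already covered the MNIST dataset and the principles and usage of k-means and KNN, so they are not explained again here.
2 Algorithm
The approach of this experiment can be summarized as follows:
S1: During training, cluster the training data of each digit 0-9 into k clusters, giving 10k (that is, 10 × k) cluster centers in total;
S2: During validation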
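Step S1 can be sketched roughly as below. This is an illustrative Python sketch under stated assumptions, not the post's MATLAB code: the names kmeans and build_center_bank are hypothetical, the centers are initialized from the first k samples purely to keep the example deterministic, and tiny 2-D points stand in for flattened digit images. Step S2 is cut off in the excerpt, so it is not sketched.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centers start at the first k points (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def build_center_bank(samples_by_class, k):
    """S1: cluster each class separately; returns (center, class_label) pairs."""
    bank = []
    for label, samples in samples_by_class.items():
        for center in kmeans(samples, k):
            bank.append((center, label))
    return bank

# Tiny made-up 2-D data standing in for flattened digit images of two classes.
samples_by_class = {
    0: [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)],
    1: [(10.0, 0.0), (10.0, 1.0), (15.0, 5.0), (15.0, 6.0)],
}
bank = build_center_bank(samples_by_class, k=2)
print(len(bank))  # number of classes * k labeled centers
```

With ten digit classes the same loop would produce the 10 × k labeled centers that S1 describes.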

A--Nearest Neighbor Classifier-KNN

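    # Import the required packages
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    %matplotlib inline

Building a KNN classifier. Assume the last column of the input dataset holds the labels and the remaining columns are features. The training set is train and the test set is test; the two are passed in separately, both as DataFrames.

    def classify0_1(train, test, k):  # train: training set, test: test set, k: number of neighbours
        n = train.shape[1] - 1
        m = test.shape[0]
        result = []
        for i in range(m):
            # Broadcast test row i against every training row to get the squared
            # Euclidean distances as a Series, then convert it to a list
            dist = list(((train.iloc[:, :n] - test.iloc[i, :n]) ** 2).sum(1))
            # Build a DataFrame from the distances and the label column
            dist_l = pd.DataFrame({'dist': dist, 'labels': (train.iloc[:, n])})
            # Sort by distance (ascending by default) and keep the first k rows
            dr = dist_l.sort_values(by='dist')[:k]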

Implementing KNN with different distance metrics using R

Question: I am working on a dataset in order to compare the effect of different distance metrics. I am using the KNN algorithm. The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own. I would like to find the number of correct class label matches between the nearest neighbor and the target. I prepared the data first, called it wdbc_n, and chose K=1. I used the Euclidean distance as a test.

    library(philentropy)
    knn <- function(xmat, k,method){
      n <- nrow
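One way to read "the number of correct class label matches between the nearest neighbor and the target" is a leave-one-out 1-NN count: for every row, find the closest other row under the chosen metric and check whether the labels agree. The Python sketch below follows that reading. The interpretation, the name count_1nn_matches and the toy rows are assumptions, and it is not a port of the truncated R function.

```python
import math

# Two common metrics; each takes two equal-length sequences of numbers.
METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def count_1nn_matches(rows, labels, method="euclidean"):
    """Leave-one-out 1-NN: count rows whose nearest other row has the same label."""
    dist = METRICS[method]
    matches = 0
    for i, row in enumerate(rows):
        # Nearest neighbour of row i among all other rows.
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: dist(row, rows[j]))
        if labels[nearest] == labels[i]:
            matches += 1
    return matches

# Made-up rows and labels, for illustration only.
rows = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.3, 4.0)]
labels = ["B", "B", "M", "M", "M"]
for name in METRICS:
    print(name, count_1nn_matches(rows, labels, name))
```

Swapping the metric only changes the dist function, so any further distance can be compared by adding an entry to METRICS.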

Knn Regression in R

Question: I am investigating Knn regression methods and, later, kernel smoothing. I wish to demonstrate these methods using plots in R. I have generated a data set using the following code:

    x = runif(100,0,pi)
    e = rnorm(100,0,0.1)
    y = sin(x)+e

I have been trying to follow the description of how to use "knn.reg" in section 9.2 here: https://daviddalpiaz.github.io/r4sl/k-nearest-neighbors.html#regression

    grid2=data.frame(x)
    knn10 = FNN::knn.reg(train = x, test = grid2, y = y, k = 10)

My predicted values seem
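For reference, k-nearest-neighbour regression in one dimension predicts the mean response of the k training points closest to the query. The Python sketch below reproduces the data recipe from the question and evaluates the fit on an evenly spaced grid. The function name knn_regress, the seed and the grid are assumptions made for illustration. It does not call FNN::knn.reg and does not try to diagnose the issue the truncated question goes on to describe.

```python
import math
import random

def knn_regress(train_x, train_y, query_x, k):
    """Predict y at query_x as the mean y of the k training points nearest in x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Same recipe as the question: x ~ U(0, pi), y = sin(x) + N(0, 0.1) noise.
random.seed(0)
x = [random.uniform(0, math.pi) for _ in range(100)]
y = [math.sin(xi) + random.gauss(0, 0.1) for xi in x]

# Evaluate on an evenly spaced grid of 51 points over [0, pi] (an assumption;
# the question predicts at the training x values instead).
grid = [i * math.pi / 50 for i in range(51)]
fitted = [knn_regress(x, y, g, k=10) for g in grid]
print(round(fitted[25], 3))  # prediction at x = pi/2, where sin(x) = 1
```

Plotting fitted against grid gives the smoothed curve; a smaller k follows the noise more closely, a larger k flattens the curve.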

How to best implement K-nearest neighbours in C# for large number of dimensions?

Question: I'm implementing the K-nearest neighbours classification algorithm in C# for a training set and a testing set of about 20,000 samples each, with 25 dimensions. There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation:

    // testSamples and trainSamples each consist of about 20k vectors with 25 dimensions
    // trainClasses contains 0 or 1, signifying the corresponding class for each sample in trainSamples
    static int[] TestKnnCase

find all points within a range to any point of an other set

Question: I have two sets of points, A and B. I want to find all points in B that are within a certain range r of A, where a point b in B is said to be within range r of A if there is at least one point a in A whose (Euclidean) distance to b is less than or equal to r. Each of the two sets is a coherent set of points. They are generated from the voxel locations of two non-overlapping objects. In 1D this problem is fairly easy: all points of B within [min(A) - r, max(A) + r]. But I am in 3D.
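The definition translates directly into a brute-force check, which can serve as a correctness baseline before trying anything faster. This is a minimal sketch with an invented function name and toy coordinates. It compares every b against every a, so the cost grows with |A| × |B|, and a spatial index would probably be needed for large voxel sets.

```python
def points_within_range(A, B, r):
    """Return the points of B that lie within distance r of at least one point of A."""
    r2 = r * r  # compare squared distances to avoid square roots
    result = []
    for b in B:
        if any(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) <= r2 for a in A):
            result.append(b)
    return result

# Toy 3-D points, invented for illustration.
A = [(0, 0, 0), (1, 0, 0)]
B = [(0, 0, 1), (3, 0, 0), (1, 1, 1)]
print(points_within_range(A, B, r=1.5))
```

Here (3, 0, 0) is dropped because its closest point in A is 2 units away, while the other two points of B each have a neighbour in A within 1.5.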

[Machine Learning] Introduction to Machine Learning 03 - Data Normalization

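1. Data normalization
1.1 A medical accident? What went wrong with our earlier kNN algorithm?
In the tumor example we used when introducing the kNN algorithm, there is a problem that many readers may not have considered. Recall that the first step of kNN is to find the k nearest points, which means first computing the distance between every data point and the point to be predicted. We again use the Minkowski distance with p = 2 (the Euclidean distance) as the example. In the tumor example each data point has two coordinates, the time since discovery and the tumor size, so what we need is the value of the Euclidean distance expression for each pair of points, which we then compare. To keep the notation simple later on, that expression is abbreviated [formulas not preserved in this excerpt].
Now a new patient arrives, so let's run a check. Beep~ Tumor diameter: 140 mm. Time since discovery: 0.8 years. Time to put our kNN algorithm to the test. To keep things simple, suppose there are only 2 existing data points and k = 1. Here are the two existing points:
Tumor 1: diameter 150 mm, time since discovery 1 year
Tumor 2: diameter 139 mm, time since discovery 5 years
So tell me, clever reader: do you pick 1 or 2? I know nothing about medicine, I made the data up, and I have no idea whether these diameters and times are realistic. But I trust that you, who know as little about medicine as I do, would pick 1. Tumor 1 and the new tumor are a little over two months apart and differ by 10 mm, so they should be very similar. Tumor 2 has had about 4 more years to grow and is still not as large as the new tumor, so it surely cannot be the right choice. Let's say we agree on that. Anyway, the kNN algorithm we built by hand does not see it that way. We can compute distances too, so let's see what kNN does. What? D2 is closer?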
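The surprise can be checked with a few lines of arithmetic. The sketch below computes the raw Euclidean distances from the numbers above, then repeats the comparison after min-max scaling each feature to the range spanned by the two known tumors. The scaling step is an assumption based on the post's title, since the excerpt ends before it names a fix, and the helper min_max_scale is hypothetical.

```python
import math

new = (140.0, 0.8)      # (diameter in mm, years since discovery)
tumor1 = (150.0, 1.0)
tumor2 = (139.0, 5.0)

# Raw Euclidean distances: the millimetre axis dominates.
d1 = math.dist(new, tumor1)   # sqrt(10^2 + 0.2^2)
d2 = math.dist(new, tumor2)   # sqrt(1^2 + 4.2^2)
print(round(d1, 3), round(d2, 3))

def min_max_scale(point, lo, hi):
    """Map each coordinate so that lo becomes 0 and hi becomes 1 (assumed fix)."""
    return tuple((v - l) / (h - l) for v, l, h in zip(point, lo, hi))

# Per-feature minimum and maximum over the two known tumors.
lo = tuple(min(a, b) for a, b in zip(tumor1, tumor2))
hi = tuple(max(a, b) for a, b in zip(tumor1, tumor2))
s_new, s1, s2 = (min_max_scale(p, lo, hi) for p in (new, tumor1, tumor2))

d1_scaled = math.dist(s_new, s1)
d2_scaled = math.dist(s_new, s2)
print(round(d1_scaled, 3), round(d2_scaled, 3))
```

On the raw values tumor 2 comes out nearer (about 4.32 against about 10.00), because a 10 mm difference outweighs a 4.2-year difference numerically. After scaling, the order flips and tumor 1 is the nearer point, matching the intuition in the text.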

How to plot a ROC curve for a knn model

I am using the ROCR package and I was wondering how one can plot a ROC curve for a knn model in R. Is there any way to plot it with this package? I don't know how to use the prediction function of ROCR for knn. Here's my example. I am using the isolet dataset from the UCI repository, where I renamed the class attribute to y:

    cl<-factor(isolet_training$y)
    knn_isolet<-knn(isolet_training, isolet_testing, cl, k=2, prob=TRUE)

Now my question is: what arguments should I pass to the prediction function of ROCR? I tried the two alternatives below, which are not working:

    library(ROCR)
    pred_knn<-prediction(knn

R : knnImputation Giving Error

I am getting the error below in R. My Brand_X.xlsx dataset has a few NA values, which I am trying to fill in using KNN imputation, but I get the error below. What's wrong here? Thanks!

    > library(readxl)
    > Brand_X <- read_excel("Brand_X.xlsx")
    > str(Brand_X)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 101 obs. of 8 variables:
     $ Rel_price_lag5: num 108 111 105 103 109 104 110 114 103 108 ...
     $ Rel_price_lag1: num 110 109 217 241 855 271 234 297 271 999 ...
     $ Rel_Price : num 122 110 109 217 241 855 271 234 297 271 ...
     $ Promo : num 74 29 32 24 16 31 22 7 32 22 ...
     $ Loy_HH : num 37 26 35 30
