KNN

KNN Algorithm and Implementation

Submitted by 本秂侑毒 on 2019-12-03 21:26:26
KNN relies on the Euclidean distance. The weakness illustrated below makes KNN easy to misclassify (for example, the black point below). Three KNN demos follow; the first implements the algorithm from its underlying principles.

```python
import matplotlib.pyplot as plt
import numpy as np
import operator  # presumably used by the truncated remainder of the demo

# Known, labelled data
x1 = np.array([3, 2, 1])
y1 = np.array([104, 100, 81])
x2 = np.array([101, 99, 98])
y2 = np.array([10, 5, 2])
scatter1 = plt.scatter(x1, y1, c='r')
scatter2 = plt.scatter(x2, y2, c='b')

# Unknown point to classify
x = np.array([18])
y = np.array([90])
scatter3 = plt.scatter(x, y, c='k')

# Draw the legend
plt.legend(handles=[scatter1, scatter2, scatter3],
           labels=['labelA', 'labelB', 'X'], loc='best')
plt.show()

# Known, labelled data
x_data = np.array([[3, 104], [2, 100], [1, 81],
                   [101, 10], [99, 5], [81, 2]])
y_data = np.array(['A', 'A', 'A', 'B', 'B', 'B'])  # reconstructed from the two groups above; the excerpt breaks off at "y_data ="
```
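
The excerpt stops before the classification step. A minimal sketch of how such a from-scratch demo typically continues — Euclidean distances, then a majority vote over the k nearest labels (the vote logic is an assumption, not the original author's code):

```python
import numpy as np
from collections import Counter

def knn_classify(point, x_data, y_data, k=3):
    # Euclidean distance from the query point to every known sample
    distances = np.sqrt(np.sum((x_data - point) ** 2, axis=1))
    # Labels of the k nearest samples
    nearest = y_data[np.argsort(distances)[:k]]
    # Majority vote decides the class
    return Counter(nearest).most_common(1)[0][0]

x_data = np.array([[3, 104], [2, 100], [1, 81],
                   [101, 10], [99, 5], [81, 2]])
y_data = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
print(knn_classify(np.array([18, 90]), x_data, y_data, k=3))  # -> 'A'
```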

Implementing KNN with different distance metrics using R

Submitted by 谁说我不能喝 on 2019-12-03 20:26:42
I am working on a dataset in order to compare the effect of different distance metrics. I am using the KNN algorithm. The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own. I would like to count the correct class-label matches between each point's nearest neighbour and the target. I prepared the data first, then loaded it (wdbc_n) and chose K=1, using Euclidean distance as a test.

```r
library(philentropy)
knn <- function(xmat, k, method) {
  n <- nrow(xmat)
  if (n <= k) stop("k can not be more than n-1")
  neigh <- matrix(0, nrow = n, ncol = k)
  for (i in 1:n) {
    # ... (the rest of the loop is truncated in the source)
```
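
For comparison, the same experiment is one constructor argument away in Python's scikit-learn; a sketch using the built-in breast-cancer data as a stand-in for the asker's wdbc_n:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the wdbc data

# Count correct 1-NN matches under several distance metrics
for metric in ["euclidean", "manhattan", "chebyshev"]:
    clf = KNeighborsClassifier(n_neighbors=1, metric=metric)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{metric}: mean accuracy {scores.mean():.3f}")
```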

KNN Regression in R

Submitted by 倖福魔咒の on 2019-12-03 20:23:32
I am investigating KNN regression methods and, later, kernel smoothing. I wish to demonstrate these methods using plots in R. I generated a data set with the following code:

```r
x = runif(100, 0, pi)
e = rnorm(100, 0, 0.1)
y = sin(x) + e
```

I have been trying to follow the description of how to use "knn.reg" in section 9.2 here: https://daviddalpiaz.github.io/r4sl/k-nearest-neighbors.html#regression

```r
grid2 = data.frame(x)
knn10 = FNN::knn.reg(train = x, test = grid2, y = y, k = 10)
```

My predicted values seem reasonable to me, but when I try to plot a line with them on top of my x~y plot I don't get what I'm hoping for.
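
The usual cause of a scrambled fitted line is drawing it over an unsorted grid: line-drawing routines connect points in the order given, so the evaluation grid must be ordered. A sketch of the fix, shown in Python with scikit-learn's KNeighborsRegressor standing in for FNN::knn.reg:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, np.pi, 100)
y = np.sin(x) + rng.normal(0, 0.1, 100)

# Sort the evaluation grid so the fitted curve is drawn left to right
grid = np.sort(x).reshape(-1, 1)
pred = KNeighborsRegressor(n_neighbors=10).fit(x.reshape(-1, 1), y).predict(grid)

plt.scatter(x, y, s=10)
plt.plot(grid, pred, c='r')  # a smooth curve, because the grid is ordered
plt.show()
```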

How to best implement K-nearest neighbours in C# for a large number of dimensions?

Submitted by 蓝咒 on 2019-12-03 16:16:33
I'm implementing the K-nearest-neighbours classification algorithm in C# for training and testing sets of about 20,000 samples each, with 25 dimensions. There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation:

```csharp
// testSamples and trainSamples consist of about 20k vectors each, with 25 dimensions
// trainClasses contains 0 or 1, signifying the corresponding class for each sample in trainSamples
static int[] TestKnnCase(IList<double[]> trainSamples, IList<double[]> testSamples,
                         IList<int[]> trainClasses, int K)
{
    Console // ... (the excerpt breaks off here in the source)
```
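
At 25 dimensions, space-partitioning structures such as k-d trees lose much of their pruning power, so a well-vectorised brute-force search is often the practical baseline. A sketch of that comparison (in Python with scikit-learn rather than C#, purely to illustrate the trade-off):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.normal(size=(20_000, 25))
test = rng.normal(size=(20_000, 25))

# 'brute' computes all pairwise distances with BLAS-backed matrix ops;
# 'ball_tree' prunes the search, but pruning weakens as dimension grows
for algo in ["brute", "ball_tree"]:
    nn = NearestNeighbors(n_neighbors=5, algorithm=algo).fit(train)
    dist, idx = nn.kneighbors(test[:100])  # query a slice to keep the demo quick
```

In C#, the analogous speed-up comes from tight loops over contiguous arrays (or SIMD via System.Numerics) rather than per-element abstractions.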

using k-NN in R with categorical values

Submitted by 元气小坏坏 on 2019-12-03 15:45:22
I'm looking to perform classification on data with mostly categorical features. For that purpose, Euclidean distance (or any other distance that assumes numerical features) doesn't fit. I'm looking for a kNN implementation for R where it is possible to select different distance methods, such as Hamming distance. Is there a way to use common kNN implementations, like the one in {class}, with different distance metric functions? I'm using R 2.15.

As long as you can calculate a distance/dissimilarity matrix (in whatever way you like), you can easily perform kNN classification without the need of any special package.
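
The same idea in Python: encode the categories as integers and hand the classifier a Hamming metric, which counts mismatched attributes and ignores the magnitude of the codes (a sketch; the toy data is made up):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

# Toy categorical data
X_raw = np.array([["red", "small"], ["red", "large"],
                  ["blue", "small"], ["blue", "large"]])
y = np.array([0, 0, 1, 1])

enc = OrdinalEncoder().fit(X_raw)
X = enc.transform(X_raw)  # integer codes per column

# Hamming distance treats the codes as symbols, not numbers
clf = KNeighborsClassifier(n_neighbors=3, metric="hamming").fit(X, y)
print(clf.predict(enc.transform([["blue", "small"]])))  # -> [1]
```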

Predicting check-in locations with the k-nearest neighbours (KNN) algorithm

Submitted by 北战南征 on 2019-12-03 10:04:27
Classification algorithm: k-nearest neighbours (KNN)

Definition: if the majority of the k most similar samples to a given sample in feature space (i.e., its nearest neighbours) belong to one category, then the sample belongs to that category too.

Origin: KNN is a classification algorithm first proposed by Cover and Hart.

Distance formula: the distance between two samples is computed with what is also called the Euclidean distance, d(a, b) = sqrt((a1-b1)^2 + (a2-b2)^2 + ...).

sklearn k-nearest neighbours API (see the sketch below).

Questions:
1. How large should k be, and what is its effect? With a very small k, the result is easily distorted by outliers; with a very large k, the vote is easily swamped by whichever class dominates the wider neighbourhood.
2. Performance issues.

Strengths and weaknesses of KNN:
Strengths: simple and easy to understand; no parameters to estimate and no training phase.
Weaknesses: a lazy algorithm, so classifying a test sample is compute- and memory-intensive; k must be specified, and a poorly chosen k degrades accuracy.
When to use: small-data scenarios of a few thousand to a few tens of thousands of samples; test against the concrete business case.

KNN example: predicting check-in locations.
Data source: Kaggle, https://www.kaggle.com/c/facebook-v-predicting-check-ins/data (log in to the site to download).
Data processing:
1. Shrink the value range with DataFrame.query(); the data set is too large, so take a subset.
2. Convert the timestamps with pd.to_datetime()
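
A minimal sketch of the sklearn workflow these notes outline, with the two preprocessing steps applied (the file path and chosen feature columns are illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("train.csv")  # the Kaggle check-in data; path is illustrative
data = data.query("x > 1.0 & x < 1.25 & y > 2.5 & y < 2.75")  # 1. shrink the range

t = pd.to_datetime(data["time"], unit="s")  # 2. expand timestamps into features
data = data.assign(day=t.dt.day, hour=t.dt.hour, weekday=t.dt.weekday)

X = data[["x", "y", "day", "hour", "weekday"]]
y = data["place_id"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

scaler = StandardScaler().fit(X_train)  # KNN is distance-based, so standardise
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
print(knn.score(scaler.transform(X_test), y_test))
```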

How to get the most contributing feature in any sklearn classifier, for example DecisionTreeClassifier, knn, etc.

Submitted by 浪尽此生 on 2019-12-03 09:04:56
I have trained my model on a data set using the KNN classifier, and I would like to know which feature contributes most in the model and in the prediction.

To gain qualitative insight into which feature has the greater impact on classification, you could perform n_feats classifications using one single feature at a time (n_feats stands for the feature-vector dimension), like this:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = KNeighborsClassifier()  # reconstructed; the source excerpt breaks off at "clf ="
```
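
A plausible completion of that per-feature experiment (the loop body is an assumption built from the description above, not the answerer's exact code). The feature whose single-column score is highest carries the most class information on its own:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
clf = KNeighborsClassifier()

n_feats = iris.data.shape[1]
for i in range(n_feats):
    # Cross-validate using only feature i
    scores = cross_val_score(clf, iris.data[:, i:i + 1], iris.target, cv=5)
    print(f"{iris.feature_names[i]}: {scores.mean():.3f}")
```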

How to plot a ROC curve for a knn model

Submitted by Anonymous (unverified) on 2019-12-03 08:28:06
Question: I am using the ROCR package and I was wondering how one can plot a ROC curve for a knn model in R. Is there any way to plot it with this package? I don't know how to use the prediction function of ROCR for knn. Here's my example; I am using the isolet dataset from the UCI repository, where I renamed the class attribute as y:

```r
cl <- factor(isolet_training$y)
knn_isolet <- knn(isolet_training, isolet_testing, cl, k = 2, prob = TRUE)
```

Now my question is: what are the arguments to pass to the prediction function of ROCR? I tried the two alternatives below, which are not working.
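
The generic recipe behind ROCR's prediction() is that it needs a continuous score per test case, not a hard label; for class::knn that score comes from attr(knn_isolet, "prob"), re-oriented so it always refers to the same class (the attribute holds the vote share of the *winning* class). The same recipe sketched in Python, where the score is predict_proba (illustrative, not the asker's R code):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import RocCurveDisplay, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in for isolet
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
scores = knn.predict_proba(X_te)[:, 1]  # probability of class 1, not the label
fpr, tpr, _ = roc_curve(y_te, scores)
RocCurveDisplay(fpr=fpr, tpr=tpr).plot()
```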

Pre-processing before digit recognition with KNN classifier

Submitted by 淺唱寂寞╮ on 2019-12-03 06:42:07
Right now I'm trying to create a digit-recognition system using OpenCV. There are many articles and examples on the web (and even on StackOverflow). I decided to use a KNN classifier because this solution is the most popular on the web. I found a database of handwritten digits with a training set of 60k examples, for which error rates of less than 5% are reported. I used this tutorial as an example of how to work with this database using OpenCV. I'm using exactly the same technique, and on the test data (t10k-images.idx3-ubyte) I get a 4% error rate. But when I try to classify my own digits, I get a much bigger error.
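
A frequent culprit is that hand-drawn input does not match the database's conventions: white ink on black, the glyph scaled to roughly 20 px, then centred on a 28x28 canvas. A sketch of that normalisation in Python with OpenCV (an assumed pipeline, not the asker's code; it centres by bounding box, whereas the original data was centred by centre of mass):

```python
import cv2
import numpy as np

def preprocess_digit(img_gray):
    # Expect a grayscale image with dark ink on a light background
    img = cv2.bitwise_not(img_gray)  # the training data is white-on-black
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    x, y, w, h = cv2.boundingRect(cv2.findNonZero(img))
    glyph = img[y:y + h, x:x + w]
    scale = 20.0 / max(w, h)  # fit the glyph into a 20 px box
    glyph = cv2.resize(glyph, (max(1, round(w * scale)), max(1, round(h * scale))))
    canvas = np.zeros((28, 28), np.uint8)  # paste centred on a 28x28 canvas
    gh, gw = glyph.shape
    top, left = (28 - gh) // 2, (28 - gw) // 2
    canvas[top:top + gh, left:left + gw] = glyph
    return canvas
```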

K Nearest-Neighbor Algorithm

Submitted by 烈酒焚心 on 2019-12-03 05:49:05
Question: Maybe I'm rather stupid, but I just can't find a satisfying answer. Using the KNN algorithm, say k=5: I try to classify an unknown object by getting its 5 nearest neighbours. What to do if, after determining the 4 nearest neighbours, the next 2 (or more) nearest objects have the same distance? Which of these 2 or more objects should be chosen as the 5th nearest neighbour? Thanks in advance :)

Answer 1: Which object of these 2 or more should be chosen as the 5th nearest neighbour? It really depends
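
One common resolution is to sidestep the choice: include every point tied with the k-th distance and weight votes by inverse distance, so which tied point is "the 5th" no longer matters (this strategy is an assumption on my part; the truncated answer may favour another):

```python
import numpy as np
from collections import defaultdict

def knn_vote_with_ties(dists, labels, k=5):
    # Take the k nearest, then widen to include anything tied with the k-th
    order = np.argsort(dists)
    kth = dists[order[k - 1]]
    chosen = order[dists[order] <= kth]
    # Inverse-distance weighting: closer neighbours count more
    votes = defaultdict(float)
    for i in chosen:
        votes[labels[i]] += 1.0 / (dists[i] + 1e-9)
    return max(votes, key=votes.get)

dists = np.array([0.5, 1.0, 1.2, 1.5, 2.0, 2.0, 3.1])  # two points tied at 2.0
labels = np.array(["A", "A", "B", "A", "B", "B", "A"])
print(knn_vote_with_ties(dists, labels))  # -> 'A'
```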