knn

Machine Learning Basics: the kNN Algorithm

Submitted by Anonymous (unverified) on 2019-12-02 23:39:01
I. Introduction to the kNN Algorithm

First, a motivating scenario: we are given a set of data points pairing tumor size with time, where each point carries an outcome label, malignant or benign:

raw_data_x = [[3.39, 2.33],  # features
              [3.11, 1.78],
              [1.34, 3.37],
              [3.58, 4.68],
              [2.28, 2.87],
              [7.42, 4.70],
              [5.75, 3.53],
              [9.17, 2.51],
              [7.79, 3.42],
              [7.94, 0.79]]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = benign, 1 = malignant

We plot tumor size on the horizontal axis and time on the vertical axis, with green for benign and red for malignant; the blue point [8.09, 3.37] is the one we need to classify as malignant or benign.

import numpy as np
import matplotlib.pyplot as plt

raw_data_x = [[3.39, 2.33],  # features
              [3.11, 1.78],
              [1.34, 3.37],
              [3.58, 4.68],
              [2.28, 2.87],
              [7.42, 4.70],
              [5.75, 3.53],
              [9.17, 2.51],
              [7.79, 3.42],
              [7.94, 0.79]]
raw_data_y = [0, 0,
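The classification step this excerpt is building toward can be sketched as a minimal plain-NumPy kNN. The query point [8.09, 3.37] comes from the text; the choice of k = 6 is an illustrative assumption, not from the original post:

```python
from collections import Counter
import numpy as np

raw_data_x = [[3.39, 2.33], [3.11, 1.78], [1.34, 3.37], [3.58, 4.68],
              [2.28, 2.87], [7.42, 4.70], [5.75, 3.53], [9.17, 2.51],
              [7.79, 3.42], [7.94, 0.79]]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = benign, 1 = malignant

X = np.array(raw_data_x)
y = np.array(raw_data_y)
query = np.array([8.09, 3.37])

k = 6  # illustrative choice of k
distances = np.sqrt(((X - query) ** 2).sum(axis=1))  # Euclidean distance to every sample
nearest = np.argsort(distances)[:k]                  # indices of the k closest samples
prediction = Counter(y[nearest]).most_common(1)[0][0]  # majority vote among the neighbours
print(prediction)  # → 1 (malignant): five of the six nearest points are labeled 1
```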

Implementing the KNN Machine Learning Algorithm (Part 1)

Submitted by Anonymous (unverified) on 2019-12-02 23:32:01
KNN Algorithm Notes (Part 1)

1. The np.tile function

array = np.array([1, 2])

1.1 np.tile(array, 2) tiles array twice along its last axis (the columns)
1.2 np.tile(array, (2, 1)) repeats array 2 times along the rows and 1 time along the columns
1.3 np.tile(array, (2, 2)) repeats array 2 times along the rows and 2 times along the columns

text = np.array([1, 2])
text_1 = np.array([1, 6, 3, 5, 9])
diffMat_1 = np.tile(text, (1, 1))
diffMat_2 = np.tile(text, (2, 1))
diffMat_3 = np.tile(text, (1, 2))
diffMat_4 = np.tile(text, (2, 2))

Output:
[[1 2]]
[[1 2]
 [1 2]]
[[1 2 1 2]]
[[1 2 1 2]
 [1 2 1 2]]

2. The shape attribute
shape[0]: the number of rows of a matrix
shape[1]: the number of columns of a matrix

3. The argsort() function
3.1 First define an array:
import numpy as np
x = np.array([1, 4, 3, -1, 6, 9])
3.2
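These three helpers come together in the classic tiled-difference kNN distance computation. A sketch of how np.tile, shape[0], and argsort cooperate; the dataset and query values here are invented for illustration:

```python
import numpy as np

dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = np.array([0.0, 0.2])  # hypothetical query point

rows = dataSet.shape[0]                      # shape[0] = number of rows (samples)
diffMat = np.tile(inX, (rows, 1)) - dataSet  # tile the query so it lines up with every row
sqDistances = (diffMat ** 2).sum(axis=1)     # squared Euclidean distance per row
distances = sqDistances ** 0.5
sortedIdx = distances.argsort()              # sample indices from nearest to farthest
print(sortedIdx)  # → [3 2 1 0]
```

Note that the explicit np.tile is not strictly required: `inX - dataSet` broadcasts to the same result, but the tiled form makes the row-by-row alignment visible.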

The KNN Algorithm

Submitted by Anonymous (unverified) on 2019-12-02 22:51:30
I. Overview of the KNN Algorithm

The k-nearest-neighbor algorithm (kNN, k-NearestNeighbor) is one of the simplest classification techniques. "K nearest neighbors" means exactly that: each sample can be represented by its k closest neighbors.

The two key ingredients of kNN are the quantity and reliability of the labeled samples, and the distance metric.

Sample requirements: For labeled samples, ideally the classes are separable in the given feature space. When a sample's k nearest neighbors are dominated by other classes, kNN accuracy drops sharply. So the class sample sizes should not be badly imbalanced, and the class distributions should be clearly separable.

Distance requirements: Distance can be computed as Euclidean distance, Mahalanobis distance, etc., and the scales of the different dimensions must be unified first.

Algorithm flow: given labeled training samples, classify the unlabeled sample.

II. Implementing KNN

1. KNN in Python 3. The LoadDataSet class downloads the dataset from a given website (and breaks if the site changes); the Normalizer class normalizes the dataset; the Plot class draws a 3-D scatter plot; the KNN class implements the k-nearest-neighbor algorithm.

import urllib, bs4
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

class
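The job of the Normalizer class mentioned above — unifying the scales of the dimensions, as the distance requirement demands — can be sketched with min-max scaling. The data values here are invented:

```python
import numpy as np

# three samples whose features live on wildly different scales
data = np.array([[40920.0, 8.3, 0.95],
                 [14488.0, 7.1, 1.67],
                 [26052.0, 1.4, 0.80]])

min_vals = data.min(axis=0)   # per-column minimum
max_vals = data.max(axis=0)   # per-column maximum
normalized = (data - min_vals) / (max_vals - min_vals)  # each column now spans [0, 1]
```

Without this step, the first column would dominate any Euclidean distance simply because its raw values are tens of thousands of times larger.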

R - convert from categorical to numeric for KNN

Submitted by 孤人 on 2019-12-02 21:58:00
Question: I'm trying to use R's caret package to apply KNN to the "abalone" database from UCI Machine Learning (link to the data), but it doesn't allow KNN when there are categorical values. How do I convert the categorical values (in this database: "M", "F", "I") to numeric values such as 1, 2, 3, respectively?

Answer 1: When the data are read in via read.table, the data in the first column are factors. Then data$iGender = as.integer(data$Gender) would work. If they are character, a detour
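For comparison, the same conversion in Python with pandas. The column name 'Sex' and the 1/2/3 mapping are assumptions here, chosen to mirror the question:

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['M', 'F', 'I', 'M', 'F']})

# map the categories to explicit integers, mirroring as.integer(factor) in R
mapping = {'M': 1, 'F': 2, 'I': 3}
df['iSex'] = df['Sex'].map(mapping)

# alternatively, let pandas assign codes from the sorted category order (F=0, I=1, M=2)
df['code'] = df['Sex'].astype('category').cat.codes
print(df['iSex'].tolist())  # → [1, 2, 3, 1, 2]
```

The explicit mapping is usually preferable when the numbers must match documentation; cat.codes is quicker but its ordering is alphabetical by default.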

A Simple KNN Implementation (for Continuous Data)

Submitted by 霸气de小男生 on 2019-12-02 21:31:12
import numpy as np
import matplotlib.pyplot as plt
import time
import math
import collections

raw_data_x = [[3.39, 2.33],
              [3.11, 1.78],
              [1.34, 3.36],
              [3.58, 4.67],
              [2.28, 2.86],
              [7.442, 4.69],
              [5.74, 3.53],
              [9.17, 2.51],
              [7.79, 3.42],
              [7.93, 0.79]]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

x_train = np.array(raw_data_x)
y_train = np.array(raw_data_y)
x_test = np.array([8.0, 3.36])

plt.scatter(x_train[y_train == 0, 0], x_train[y_train == 0, 1], color='r')
plt.scatter(x_train[y_train == 1, 0], x_train[y_train == 1, 1], color='g')
plt.scatter(x_test[0], x_test[1], color='b')
plt.show()

# compute the Euclidean distance
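The excerpt breaks off at the distance computation; it might continue along these lines, using the math and collections modules the excerpt imports. The choice of k = 6 is an illustrative assumption:

```python
from collections import Counter
import math
import numpy as np

x_train = np.array([[3.39, 2.33], [3.11, 1.78], [1.34, 3.36], [3.58, 4.67],
                    [2.28, 2.86], [7.442, 4.69], [5.74, 3.53], [9.17, 2.51],
                    [7.79, 3.42], [7.93, 0.79]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x_test = np.array([8.0, 3.36])

# Euclidean distance from the test point to every training point
distances = [math.sqrt(((x - x_test) ** 2).sum()) for x in x_train]

k = 6  # illustrative choice of k
nearest = np.argsort(distances)[:k]        # indices of the k closest training points
votes = Counter(y_train[nearest])          # tally the labels of those neighbours
prediction = votes.most_common(1)[0][0]
print(prediction)  # → 1: the test point sits among the class-1 cluster
```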

How to use opencv flann::Index?

Submitted by 橙三吉。 on 2019-12-02 19:35:05
I have some problems with opencv flann::Index. I'm creating the index:

Mat samples = Mat::zeros(vfv_net_quie.size(), 24, CV_32F);
for (int i = 0; i < vfv_net_quie.size(); i++)
{
    for (int j = 0; j < 24; j++)
    {
        samples.at<float>(i, j) = (float)vfv_net_quie[i].vfv[j];
    }
}
cv::flann::Index flann_index(samples, cv::flann::KDTreeIndexParams(4), cvflann::FLANN_DIST_EUCLIDEAN);
flann_index.save("c:\\index.fln");

After that I'm trying to load it and find the nearest neighbors:

cv::flann::Index flann_index(Mat(), cv::flann::SavedIndexParams("c:\\index.fln"), cvflann::FLANN_DIST_EUCLIDEAN);
cv::Mat resps(vfv_reg_quie

K Nearest-Neighbor Algorithm

Submitted by 笑着哭i on 2019-12-02 19:09:21
Maybe I'm rather stupid but I just can't find a satisfying answer: Using the KNN-algorithm, say k=5. Now I try to classify an unknown object by getting its 5 nearest neighbours. What to do, if after determining the 4 nearest neighbors, the next 2 (or more) nearest objects have the same distance? Which object of these 2 or more should be chosen as the 5th nearest neighbor? Thanks in advance :) Which object of these 2 or more should be chosen as the 5th nearest neighbor? It really depends on how you want to implement it. Most algorithms will do one of three things: Include all equal distance
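One of those strategies, "include all equal-distance points", can be sketched in pure Python with only the standard library; the distances and labels below are made up to force a tie for 5th place:

```python
from collections import Counter

def knn_vote_with_ties(distances, labels, k):
    """Vote over the k nearest points, extending the neighbourhood to
    include every point tied with the k-th nearest."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    cutoff = distances[order[k - 1]]               # distance of the k-th nearest point
    chosen = [i for i in order if distances[i] <= cutoff]  # may hold more than k points
    return Counter(labels[i] for i in chosen).most_common(1)[0][0]

# four clear neighbours, then two points tied at distance 2.0 for 5th place
distances = [0.5, 1.0, 1.2, 1.5, 2.0, 2.0, 9.0]
labels    = ['a', 'a', 'b', 'a', 'b', 'a', 'b']
print(knn_vote_with_ties(distances, labels, k=5))  # → 'a' (both tied points vote)
```

This sidesteps the "which tied point is the 5th neighbor?" question entirely, at the cost of occasionally voting over more than k points.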

Python Scraping: Scatter Plots and KNN Prediction

Submitted by 若如初见. on 2019-12-02 19:04:06
Scatter Plots and KNN Prediction

I. A Motivating Example

# Study of the relationship between city climate and the ocean
# imports
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline  # required in Jupyter when using the plotting module
from pylab import mpl  # mpl provides plotting configuration
mpl.rcParams['font.sans-serif'] = ['FangSong']  # set the default font
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from rendering as a box in saved figures

# load the data
ferrara1 = pd.read_csv('./ferrara_150715.csv')
ferrara2 = pd.read_csv('./ferrara_250715.csv')
ferrara3 = pd.read_csv('./ferrara_270615.csv')

# concatenate the data, ignoring the original indices
ferrara = pd.concat([ferrara1, ferrara2, ferrara3], ignore_index=True)

# drop the unused columns
faenza.head()
city
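The effect of ignore_index=True in the pd.concat call above, demonstrated on toy data:

```python
import pandas as pd

df1 = pd.DataFrame({'temp': [30.2, 31.1]})
df2 = pd.DataFrame({'temp': [29.8]})

kept = pd.concat([df1, df2])                            # keeps the original indices: 0, 1, 0
renumbered = pd.concat([df1, df2], ignore_index=True)   # renumbers the rows: 0, 1, 2
print(list(kept.index), list(renumbered.index))
```

Without ignore_index=True, rows from different CSV files keep their own 0-based indices, so positional lookups like .loc[0] would return multiple rows.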

KNN train() in cv2 with opencv 3.0

Submitted by 元气小坏坏 on 2019-12-02 18:14:48
I'm trying to run k-nearest neighbours using cv2 (Python 2.7) and OpenCV 3.0. I've replicated the same error message using code like http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_ml/py_knn/py_knn_understanding/py_knn_understanding.html :

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)

# Labels each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)

# Take Red families and plot them
red =

K nearest neighbour in python [closed]

Submitted by 时光毁灭记忆、已成空白 on 2019-12-02 16:24:58
I would like to calculate the K-nearest neighbours in Python. What library should I use?

Sandro Munda: I think you should use scikit ann. There is a good tutorial about the nearest neighbour here. According to the documentation:

ann is a SWIG-generated python wrapper for the Approximate Nearest Neighbor (ANN) Library ( http://www.cs.umd.edu/~mount/ANN/ ), developed by David M. Mount and Sunil Arya. ann provides an immutable kdtree implementation (via ANN) which can perform k-nearest neighbor and approximate k

I wrote a script to compare FLANN and scipy.spatial.cKDTree, couldn't get the ANN
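For reference, the scipy.spatial.cKDTree mentioned in that comparison can be used like this; the random data here are purely illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 3))   # 1000 reference points in 3-D
tree = cKDTree(points)           # build the k-d tree once

queries = rng.random((5, 3))
dist, idx = tree.query(queries, k=3)   # 3 nearest neighbours of each query point
# dist[i] is sorted ascending; idx[i] indexes into `points`
```

cKDTree performs exact search; FLANN-style libraries trade a little accuracy for speed on high-dimensional or very large datasets.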