knn

Use Euclidean distance in SURF

℡╲_俬逩灬. submitted on 2019-12-02 03:03:29
In my code I'm filtering the good matches based on the nearest neighbour distance ratio, as follows:

for (int i = 0; i < min(des_image.rows - 1, (int)matches.size()); i++) {
    if ((matches[i][0].distance < 0.6 * (matches[i][1].distance))
        && ((int)matches[i].size() <= 2 && (int)matches[i].size() > 0)) {
        good_matches.push_back(matches[i][0]);
    }
}

Since I'm filtering the good matches based on the nearest neighbour distance ratio, do I still need to do a Euclidean distance calculation? And I want to know, when I use the knnMatch method in FlannBasedMatcher, does the method use the Euclidean distance to
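For reference, a minimal Python sketch of the same ratio test with knnMatch (k=2) on a FLANN-based matcher; it assumes opencv-contrib-python for SURF, and the image file names are placeholders:

import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("train.jpg", cv2.IMREAD_GRAYSCALE)

# SURF produces float descriptors, so FLANN's KD-tree index searches in L2 (Euclidean) space.
surf = cv2.xfeatures2d.SURF_create(400)
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test: keep a match only if it is clearly closer than the second-best match.
good_matches = [m[0] for m in matches
                if len(m) == 2 and m[0].distance < 0.6 * m[1].distance]
print(len(good_matches))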

R's caret training errors when y is not a factor

狂风中的少年 submitted on 2019-12-02 02:12:01
I am using RStudio with Kaggle's forest cover data and keep getting an error when trying to use the knn3 function in caret. Here is my code:

library(caret)
train <- read.csv("C:/data/forest_cover/train.csv", header=T)
trainingRows <- createDataPartition(train$Cover_Type, p=0.8, list=F)
head(trainingRows)
train_train <- train[trainingRows,]
train_test <- train[-trainingRows,]
knnfit <- knn3(train_train[,-56], train_train$Cover_Type)

The last line gives me this in the console:

Error in knn3.matrix(x, y = y, k = k, ...) : y must be a factor

As the error message states, y must be a

The KNN Classification Algorithm

纵然是瞬间 submitted on 2019-12-02 01:50:39
K-nearest neighbors (KNN, K-NearestNeighbor) is a basic classification and regression method proposed by Cover T and Hart P in 1967. It is one of the simplest techniques in data-mining classification and is very easy to understand and apply. "K nearest neighbors" means exactly that: each sample can be represented by its K closest neighbors (closest usually meaning shortest distance). If the majority of those K neighbors belong to one class, the sample is assigned to that class as well. The neighbors used by KNN are objects that have already been correctly classified. KNN is a lazy learner: it has no explicit learning process and no training phase; new samples are processed directly as they arrive.

Algorithm description:
1) Compute the distance between the test sample and each training sample;
2) Sort by increasing distance;
3) Take the K points with the smallest distances;
4) Count how often each class appears among these K points;
5) Return the most frequent class among the K points as the predicted class of the test sample.

K is usually an integer no larger than 20, bounded above by the square root of the training-set size n; as the data set grows, K should grow as well. The output depends on the training set and the choice of K, so the algorithm's accuracy needs to be evaluated and the K that produces the smallest error rate should be chosen. For example, we can use 90% of the available data as training samples to train the classifier and the remaining 10% to test it, checking whether the error rate drops as K varies. Note that the 10% test data should be selected at random.

Python example 1 (the sklearn package wraps the KNN algorithm): import numpy as np
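The original example is cut off right after the first import. A minimal sklearn sketch along the lines the post announces, also illustrating the 90%/10% split and the search for the K with the smallest error rate described above, might look like this (the data set and parameter values are illustrative assumptions, not the author's original code):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small benchmark data set and hold out a random 10% for testing.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=0)

# Try K = 1..20 and report the test error rate for each value of K.
for k in range(1, 21):
    clf = KNeighborsClassifier(n_neighbors=k)   # Euclidean distance by default
    clf.fit(X_train, y_train)
    error_rate = 1 - clf.score(X_test, y_test)
    print("K=%2d  error rate=%.3f" % (k, error_rate))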

OpenCV KNN: Loading a Trained Model

回眸只為那壹抹淺笑 submitted on 2019-12-01 21:47:32
#include <iostream>
#include <opencv2\opencv.hpp>
using namespace cv;
using namespace std;
#include "test.h"

int main()
{
    ///******************** Test ***************************///
    Mat test = imread(".\\test\\3.jpg", 0); // a single digit cropped from the image
    Mat bw;
    threshold(test, bw, 0, 255, CV_THRESH_BINARY);
    Mat samples = bw.reshape(0, 1);
    samples.convertTo(samples, CV_32F);
    // run KNN prediction and return the recognition result
    const int K = 4; // testModel->getDefaultK()
    Ptr<KNearest> Model = StatModel::load<KNearest>("KnnTest.xml");
    Mat MatResult(0, 0, CV_32F); // holds the test result
    Model->findNearest(samples, K,
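For comparison, here is a rough Python sketch of the same flow: load a previously saved KNN model and classify one binarized digit image. Whether cv2.ml.KNearest_load is exposed depends on the OpenCV build (an assumption here), and the file names are placeholders:

import cv2
import numpy as np

# Load the image of a single digit, binarize it, and flatten it into one row per sample.
test = cv2.imread("test/3.jpg", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(test, 0, 255, cv2.THRESH_BINARY)
sample = bw.reshape(1, -1).astype(np.float32)

# Load the trained model saved earlier as KnnTest.xml and classify the sample.
knn = cv2.ml.KNearest_load("KnnTest.xml")
ret, results, neighbours, dists = knn.findNearest(sample, k=4)
print("predicted label:", int(results[0][0]))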

OpenCV KNN Digit Classification

廉价感情. submitted on 2019-12-01 21:45:25
#include <iostream>
#include <opencv2\opencv.hpp>
using namespace cv;
using namespace std;
#include "test.h"

int main()
{
    Mat img = imread("1.png");
    Mat gray;
    cvtColor(img, gray, CV_BGR2GRAY);
    threshold(gray, gray, 0, 255, CV_THRESH_BINARY);
    // digits.png is 2000 x 1000 and every digit is 20 x 20, so there are
    // 5000 ((2000*1000) / (20*20)) digits with labels in [0~9],
    // i.e. 5000/10 = 500 samples per digit.
    // Split it into individual 20 x 20 images and serialize each one into a one-dimensional array.
    int side = 20;
    int m = gray.rows / side;
    int n = gray.cols / side;
    Mat data, labels;
    for (int i = 0; i < m; i++) {

        int offsetRow = i *
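A compact Python sketch of the same idea, based on the standard OpenCV sample image digits.png (2000 x 1000, 20 x 20 cells, 500 samples per digit); the file path and the even/odd train/test split are illustrative choices:

import cv2
import numpy as np

# Split digits.png into 50 rows x 100 columns of 20x20 cells and flatten each cell.
gray = cv2.imread("digits.png", cv2.IMREAD_GRAYSCALE)
cells = [np.hsplit(row, 100) for row in np.vsplit(gray, 50)]
data = np.array(cells, dtype=np.float32).reshape(-1, 400)      # 5000 samples x 400 features
labels = np.repeat(np.arange(10), 500).astype(np.float32)      # 500 samples per digit, in row order

# Interleaved train/test split keeps every class balanced.
train, test = data[::2], data[1::2]
train_labels, test_labels = labels[::2], labels[1::2]

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)
ret, results, neighbours, dists = knn.findNearest(test, k=4)
print("accuracy: %.3f" % np.mean(results.flatten() == test_labels))

knn.save("KnnTest.xml")   # the saved model can later be reloaded, as in the previous post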

KNN Study Notes

感情迁移 submitted on 2019-12-01 17:05:29
For study purposes only.

Exercise 1:

# coding:utf-8
# 2019/10/16 16:49
# huihui
# ref:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X, y)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2003)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

correct = np.count_nonzero((clf.predict(X_test) == y_test) == True)
print("accuracy: %.3f" % (correct / len(X_test)))

Source: https://www.cnblogs.com/xuehuiping/p/11694975.html

Broadcast Annoy object in Spark (for nearest neighbors)?

[亡魂溺海] submitted on 2019-12-01 10:53:46
As Spark's mllib doesn't have nearest-neighbors functionality, I'm trying to use Annoy for approximate nearest neighbors. I try to broadcast the Annoy object and pass it to the workers; however, it does not operate as expected. Below is code for reproducibility (to be run in PySpark). The problem shows up in the difference between using Annoy with and without Spark.

from annoy import AnnoyIndex
import random
random.seed(42)

f = 40
t = AnnoyIndex(f)  # Length of item vector that will be indexed
allvectors = []
for i in xrange(20):
    v = [random.gauss(0, 1) for z in xrange(f)]
    t.add_item(i, v)
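A commonly suggested workaround, rather than broadcasting the AnnoyIndex object itself, is to save the index to disk, ship the file to the executors, and memory-map it inside each partition. A rough sketch under those assumptions (the file name index.ann is a placeholder, and allvectors is assumed to hold the raw vectors from the snippet above):

from annoy import AnnoyIndex
from pyspark import SparkFiles

t.build(10)                      # build the forest before saving ('t' comes from the snippet above)
t.save("index.ann")
sc.addFile("index.ann")          # 'sc' is an existing SparkContext

def nearest(partition):
    idx = AnnoyIndex(40)                      # must match the dimension used when building
    idx.load(SparkFiles.get("index.ann"))     # mmap the shipped file on the worker
    for v in partition:
        yield idx.get_nns_by_vector(v, 10)

print(sc.parallelize(allvectors).mapPartitions(nearest).collect())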

Machine Learning (6): Logistic Regression

穿精又带淫゛_ submitted on 2019-12-01 06:57:00
What is logistic regression?

Although "regression" appears in its name, logistic regression solves classification problems. It can be viewed either as a regression algorithm or as a classification algorithm; it is usually used as a classifier and directly handles only binary classification.

The sigmoid function:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

x = np.linspace(-10, 10, 500)
y = sigmoid(x)
plt.plot(x, y)
plt.show()

The loss function of logistic regression: the derivation is not repeated here; it only needs basic calculus.

Vectorization and the vectorized gradient of logistic regression:

LogisticRegression.py:

import numpy as np
from .metrics import accuracy_score

class LogisticRegression:

    def __init__(self):
        """Initialize the Logistic Regression model"""
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def _sigmoid(self, t):
        return 1. / (1. + np.exp(-t))

    def fit(self, X_train, y_train, eta=0
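The loss function and vectorized gradient referred to above are, in their standard form (with \(X_b\) denoting the training matrix with a prepended column of ones, so that \(\theta\) contains both intercept_ and coef_):

\[ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\hat{p}^{(i)} + (1-y^{(i)})\log\big(1-\hat{p}^{(i)}\big)\Big],\qquad \hat{p}^{(i)} = \sigma\big(X_b^{(i)}\theta\big) \]

\[ \nabla_\theta J(\theta) = \frac{1}{m}\,X_b^{T}\big(\sigma(X_b\theta)-y\big) \]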

Machine Learning: KNN

◇◆丶佛笑我妖孽 submitted on 2019-11-30 23:24:47
KNN: k-nearest neighbors

Introduction: the k-nearest-neighbor algorithm classifies by measuring the distance between feature values; in short, birds of a feather flock together. KNN can be applied both to classification and to regression: for classification it usually predicts by majority vote, and for regression it usually predicts by averaging.

The three elements of KNN:

Choice of K: a sample is represented by its K nearest neighbors; the most suitable K can be chosen by cross-validation.
When K is small, prediction uses a small neighborhood, which makes the model more complex and prone to overfitting.
When K is large, prediction uses a large neighborhood, the training error grows, the model becomes simpler, and underfitting becomes likely.

Distance metric: usually the Euclidean distance. For \(a(a_1,a_2,a_3)\) and \(b(b_1,b_2,b_3)\) the Euclidean distance is
\[ \sqrt{(a_1-b_1)^2+(a_2-b_2)^2+(a_3-b_3)^2} \]

Decision rule: classification models mainly use majority voting or weighted majority voting; regression models mainly use the mean or weighted mean.
Majority vote / mean: every neighboring sample has the same weight.
Weighted majority vote / weighted mean: each neighboring sample has its own weight, usually taken inversely proportional to its distance.

KNN implementations:
Brute force (brute): compute the distance from the prediction sample to every sample in the training set
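As an illustration of the brute-force approach, here is a minimal sketch (not an optimized implementation; the data at the end are made up):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    # Brute force: Euclidean distance from x to every training sample.
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the K closest training samples.
    nearest = np.argsort(distances)[:k]
    # Majority vote among the K neighbors.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.0, 5.0]), k=3))   # expected output: 1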

Increasing n_jobs has no effect on GridSearchCV

送分小仙女□ submitted on 2019-11-30 22:20:24
I have set up a simple experiment to check how much a multi-core CPU helps while running sklearn GridSearchCV with KNeighborsClassifier. The results I got surprised me, and I wonder whether I misunderstood the benefits of multiple cores or simply haven't done it right. There is no difference in time to completion between 2 and 8 jobs. How come? I did notice a difference on the CPU performance tab: while the first cell was running, CPU usage was ~13%, and it gradually increased to 100% for the last cell. I was expecting it to finish faster. Maybe not linearly faster, aka 8 jobs would be 2 times
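A minimal version of this kind of timing experiment might look like the sketch below (the synthetic data set, parameter grid, and job counts are illustrative assumptions). With a small data set, the per-fold work can be short enough that process startup and data copying hide any speed-up from extra workers:

import time
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# The same grid search repeated with a different number of parallel jobs.
X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
param_grid = {"n_neighbors": [3, 5, 7, 9, 11]}

for n_jobs in (1, 2, 4, 8):
    start = time.time()
    GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=n_jobs).fit(X, y)
    print("n_jobs=%d: %.1f s" % (n_jobs, time.time() - start))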