knn

《机器学习实战》--KNN

旧城冷巷雨未停 提交于 2019-12-11 23:09:10
代码来自《机器学习实战》 https://github.com/wzy6642/Machine-Learning-in-Action-Python3 K-近邻算法(KNN) 介绍 简单地说,k-近邻算法采用测量不同特征值之间的距离方法进行分类。 优点:精度高、 对异常值不敏感 ,无数据输入假定。 缺点:计算复杂度高、空间复杂度高,无法给出数据的内在含义。 使用数据范围:数值型、标称型。 分类函数的伪代码:   对未知类别属性的数据集中的每个点依次执行以下操作:   (1)计算已知类别数据集中的点与当前点之间的距离;   (2)按照距离递增次序排序;   (3)选取与当前点距离最小的k个点;   (4)确定前k个点所在类别的出现概率;   (5)返回前k个点出现频率最高的类别作为当前点的预测分类。 1 """创建数据集 2 返回: group - 数据集 3 labels - 分类标签 4 """ 5 def createDataSet(): 6 # 四组二维特征 7 group = np.array([[1, 101], [5, 89], [108, 5], [115, 8]]) 8 # 四组特征的标签 9 labels = ['爱情片', '爱情片', '动作片', '动作片'] 10 return group, labels 11 12 13 """ 14 KNN算法,分类器

Classification accuracy based on single Feature set

五迷三道 提交于 2019-12-11 18:32:58
问题 I am trying to classify data based on prespecified labels. Got two columns and shown below: room_class room_cluster Standard single sea view Standard Deluxe twin Single Deluxe Suite Superior room ocean view Suite Superior Double twin Superior Deluxe Double room Deluxe As seen above room_cluster in the set of labels. The code snippet is as follows: le = preprocessing.LabelEncoder() datar = df #### Separate data into feature and Labels x = datar.room_class y = datar.room_cluster #### Using

机器学习——KNN最近邻算法

蓝咒 提交于 2019-12-11 18:11:53
K近邻(K Nearest Neighbor,KNN),可以做分类,也可以做回归。 一、基本思想 给定一组训练集,有一个需要判断类别的输入实例,离输入实例最近的K个训练数据属于哪个类别,就判断输入实例属于哪个类别。 二、分类算法描述: 1、计算输入实例和所有训练集数据的距离; 2、按距离升序排序; 3、选择排序后的前K个训练子集数据; 4、根据选择出来的K个训练子集数据的类别,使用判别规则(一般是多数投票),预测输入实例的类别。 这样实现也叫蛮力算法,适合样本量少的时候使用。 三、影响因素: 根据以上描述,我们可以归纳影响KNN的主要因素:1、距离的度量 2、K值 3、判别规则。下面具体说下这3个因素是怎么影响KNN的。 距离的度量 计算2个n维数据点的距离公式,闵可夫斯基距离是最一般的形式: 其中, l 是数据点的特征,因为数据点是n维的,所以 l 能从1取到n,数据点 xi 和 xj 之间的距离就等于, xi 和 xj 每个特征相减的绝对值的p次方,求和后开 p次根号。 一般,p 取1、2、∞。 P = 1,叫曼哈顿距离。 P = 2,叫欧式距离。 P = ∞,切比雪夫距离,注意一点的是这个距离的含义,他表示特征差的最大值,不再是所有特征差,只取最大的那个。至于为什么是这样,我还没搞懂。 当然还有更多的求距离的公式,以后学了再添加。 值域变化大的特征会对距离产生较大的影响

knn implementation in 3d space for n closest neighbours

穿精又带淫゛_ 提交于 2019-12-11 15:18:06
问题 I am newbie to c. I have n structs holding the 4 members, 1st the unique index of and three floats representing special coordinates in 3D space. I need to find k nearest struct according to Euclidian distances. //struct for input csv data struct oxygen_coordinates { unsigned int index; //index of an atom //x,y and z coordinates of atom float x; float y; float z; }; struct oxygen_coordinates atom_data[n]; //I need to write a function something like, knn(atom_data[i], atom_data, k); // This

Hadoop - a reducer is not being initiated

老子叫甜甜 提交于 2019-12-11 13:42:56
问题 I am trying to run open source kNN join MapReduce hbrj algorithm on a Hadoop 2.6.0 for single node cluster - pseudo-distributed operation installed on my laptop (OSX). This is the code. Mapper, reducer and the main driver: public class RPhase2 extends Configured implements Tool { public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, RPhase2Value> { public void map(LongWritable key, Text value, OutputCollector<IntWritable, RPhase2Value> output,

Unknown label type: 'continuous'

余生颓废 提交于 2019-12-11 09:13:15
问题 My fellow Team, Having an issue ---------------------- Avg.SessionLength TimeonApp TimeonWebsite LengthofMembership Yearly Amount Spent 0 34.497268 12.655651 39.577668 4.082621 587.951054 1 31.926272 11.109461 37.268959 2.664034 392.204933 2 33.000915 11.330278 37.110597 4.104543 487.547505 3 34.305557 13.717514 36.721283 3.120179 581.852344 4 33.330673 12.795189 37.536653 4.446308 599.406092 5 33.871038 12.026925 34.476878 5.493507 637.102448 6 32.021596 11.366348 36.683776 4.685017 521

'Multiclass-multioutput is not supported' Error in Scikit learn for Knn classifier

佐手、 提交于 2019-12-11 09:04:45
问题 I have two variables X and Y. The structure of X (i.e an np.array): [[26777 24918 26821 ... -1 -1 -1] [26777 26831 26832 ... -1 -1 -1] [26777 24918 26821 ... -1 -1 -1] ... [26811 26832 26813 ... -1 -1 -1] [26830 26831 26832 ... -1 -1 -1] [26830 26831 26832 ... -1 -1 -1]] The structure of Y : [[1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [25197, 26777, 26781], [25197, 26777, 26781], [25197, 26777, 26781],

Unknown label type: continuous

旧街凉风 提交于 2019-12-11 05:29:08
问题 Avg.SessionLength TimeonApp TimeonWebsite LengthofMembership Yearly Amount Spent 0 34.497268 12.655651 39.577668 4.082621 587.951054 1 31.926272 11.109461 37.268959 2.664034 392.204933 2 33.000915 11.330278 37.110597 4.104543 487.547505 3 34.305557 13.717514 36.721283 3.120179 581.852344 4 33.330673 12.795189 37.536653 4.446308 599.406092 5 33.871038 12.026925 34.476878 5.493507 637.102448 6 32.021596 11.366348 36.683776 4.685017 521.572175 I want to apply KNN: X = df[['Avg. Session Length',

K-Nearest Neighbour Implementation in Java [closed]

≯℡__Kan透↙ 提交于 2019-12-10 14:25:23
问题 It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago . I'm looking for a decent implementation of KNN algorithm in java because, in my dissertation, I have to modify it using different data structures. Thanks in advance! 回答1: Here is full implementation and

Function reads np.array - produces the mean for k nn to number p in np.array

喜夏-厌秋 提交于 2019-12-10 12:24:20
问题 I need to defina a function which reads a numpy array and produces the mean for k nearest points to number p in the array. Example: array= np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10]) p= 15 (**Note this is not a number in the array, I will need to find the number closest to p or p number itself) k = 3 In this case, I would need to generate the mean for ([11, 12, 10)] as they are closest to p = 15 With the above numbers, I will need to find the mean for k number of points closest