Sklearn kNN usage with a user defined metric

Anonymous (unverified), posted 2019-12-03 01:55:01

Question:

Currently I'm working on a project that may require a kNN algorithm to find the top k nearest neighbors of a given point, say P. I'm using Python with the sklearn package, but our predefined metric is not one of the default metrics, so I have to use a user-defined metric. The relevant sklearn documentation can be found here and here.

It seems that the latest version of sklearn's kNN supports user-defined metrics, but I can't figure out how to use one:

import sklearn
from sklearn.neighbors import NearestNeighbors
import numpy as np
from sklearn.neighbors import DistanceMetric
from sklearn.neighbors.ball_tree import BallTree
BallTree.valid_metrics

Say I have defined a metric called mydist = max(x-y), and then used DistanceMetric.get_metric to turn it into a DistanceMetric object:

dt = DistanceMetric.get_metric('pyfunc', func=mydist)

According to the documentation, the call should look like this:

nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto', metric='pyfunc').fit(A)
distances, indices = nbrs.kneighbors(A)

But where do I put dt? Thanks.

Answer 1:

You pass the metric as the metric parameter, and any additional metric arguments as keyword parameters to the NN constructor:

>>> def mydist(x, y):
...     return np.sum((x-y)**2)
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
...                         metric='pyfunc', func=mydist)
>>> nbrs.fit(X)
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='pyfunc',
         n_neighbors=4, radius=1.0)
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))
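Note that the 'pyfunc' string plus a func= keyword is the older calling convention; recent sklearn releases let you pass the callable itself as metric. A minimal sketch of that variant, using the same toy data and the same squared-difference function as above (which, strictly speaking, violates the triangle inequality, so tree-based algorithms only guarantee correct results for true metrics):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# User-defined distance: sum of squared differences (same toy function as
# in the answer above; not a true metric, so use with tree algorithms with care).
def mydist(x, y):
    return np.sum((x - y) ** 2)

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Recent sklearn versions: pass the callable directly as `metric`
nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric=mydist).fit(X)
distances, indices = nbrs.kneighbors(X)
print(distances[0])  # distances from the first point to its 4 nearest neighbors
print(indices[0])
```

This produces the same neighbor lists as the 'pyfunc' version shown above.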


Answer 2:

A small addition to the previous answer: how to use a user-defined metric that takes additional arguments.

>>> def mydist(x, y, **kwargs):
...     return np.sum((x-y)**kwargs["power"])
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([-1, -1, -2, 1, 1, 2])
>>> nbrs = KNeighborsClassifier(n_neighbors=4, algorithm='ball_tree',
...            metric=mydist, metric_params={"power": 2})
>>> nbrs.fit(X, Y)
KNeighborsClassifier(algorithm='ball_tree', leaf_size=30,
           metric=<function mydist at ...>, n_neighbors=4, p=2,
           weights='uniform')
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))

The entries of metric_params are passed to the metric as individual keyword arguments, so inside mydist they arrive as kwargs["power"], not as a nested kwargs["metric_params"] dict.
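To close the loop on the original question: the DistanceMetric object dt is never passed to the estimator; NearestNeighbors builds the metric internally from the metric (and func) arguments. If you do want to work at that lower level, BallTree itself also accepts a callable. Here is a sketch using the questioner's max-style metric, made a valid distance by taking the absolute value (max(x-y) alone can be negative and is asymmetric, so it is not usable as a metric as written):

```python
import numpy as np
from sklearn.neighbors import BallTree

# The questioner's "max" metric, fixed up into a valid distance
# (this is the Chebyshev / L-infinity distance).
def mydist(x, y):
    return np.max(np.abs(x - y))

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# BallTree accepts the callable directly; no DistanceMetric object is needed
tree = BallTree(X, metric=mydist)
dist, ind = tree.query(X, k=4)
print(dist[0])  # Chebyshev distances from the first point to its 4 neighbors
```

Since Chebyshev is also a built-in metric, metric='chebyshev' would give the same result much faster; a Python callable is invoked per point pair and is therefore slow on large datasets.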

