Why is KNN slow with a custom metric?

Question

I work with a data set of about 200k objects. Every object has 4 features. I classify them with K nearest neighbors (KNN) using the Euclidean metric. The process finishes in about 20 seconds.

Recently I had a reason to use a custom metric, which will probably give better results. I implemented the custom metric, and KNN then ran for more than an hour. I didn't wait for it to finish.

I assumed the cause was my metric, so I replaced its body with return 1. KNN still ran for more than an hour. I then assumed the cause was the Ball Tree algorithm, but KNN with Ball Tree and the Euclidean metric takes about 20 seconds.

Right now I have no idea what's wrong. I use Python 3 and sklearn 0.17.1. With a custom metric the process never finishes. I also tried the brute algorithm, but it has the same effect. Upgrading and downgrading scikit-learn has no effect. Running the classification with the custom metric on Python 2 does not help either. I also implemented this metric (just return 1) in Cython, with the same effect.

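import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def custom_metric(x: np.ndarray, y: np.ndarray) -> float:
    # Placeholder metric: constant distance, used only to rule out the metric body.
    return 1.0

# X: feature array of shape (n_samples, 4); Y: class labels.
clf = KNeighborsClassifier(n_jobs=1, metric=custom_metric)
clf.fit(X, Y)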

Can I speed up KNN classification with a custom metric?

Sorry if my English is not clear.

Answer 1:

Sklearn is optimized: it uses Cython and multiple processes to run as fast as possible. Pure Python code, especially when it is called many times, is what slows your code down. I recommend writing your custom metric in Cython. There is a tutorial you can follow here: https://blog.sicara.com/https-medium-com-redaboumahdi-speed-sklearn-algorithms-custom-metrics-using-cython-de92e5a325c
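To make "called many times" concrete, here is a minimal sketch that counts how often a Python metric is invoked during prediction. The data is random, the sizes (200 training points, 50 query points) and the name counting_metric are made up for illustration, and the brute algorithm is used so the number of calls is easy to reason about.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Mutable counter so the metric can record each invocation.
counter = {"calls": 0}

def counting_metric(x, y):
    # Plain Euclidean distance, written in Python so every call is visible.
    counter["calls"] += 1
    return float(np.sqrt(np.sum((x - y) ** 2)))

# Made-up data: 4 features per object, as in the question, but far fewer rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = rng.integers(0, 2, size=200)
X_query = rng.normal(size=(50, 4))

clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute",
                           n_jobs=1, metric=counting_metric)
clf.fit(X, Y)
clf.predict(X_query)

# With brute force, the metric runs at least once per (query, training) pair.
print("metric calls:", counter["calls"])
print("query x training pairs:", X_query.shape[0] * X.shape[0])
```

The call count grows with the product of the number of query points and training points, so at 200k objects even a metric that only returns a constant pays the Python function-call cost a very large number of times.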

Answer 2:

As @Réda Boumahdi rightly pointed out, the cause is the custom metric being defined in Python. This is a known issue discussed here. It was closed as "wontfix" at the end of the discussion. So the only solution suggested there is to write your custom metric in Cython, which avoids the GIL overhead that slows things down when the metric is a Python function.
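Cython is the general fix. As a narrower workaround, some custom metrics can be rewritten in terms of a built-in one, so no Python callable is needed at all. The sketch below assumes, purely for illustration, that the custom metric is a weighted Euclidean distance; the question does not show the real metric, and the weights w, the data and the labels here are made up. Scaling each feature by the square root of its weight gives the same distances under the default Euclidean metric.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data with 4 features, as in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
Y = (X[:, 0] > 0).astype(int)
X_query = rng.normal(size=(20, 4))

# Hypothetical per-feature weights (an assumption, not from the question).
w = np.array([1.0, 2.0, 0.5, 3.0])

def weighted_euclidean(x, y):
    # Python callable: invoked once per pair of points, hence slow.
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

slow = KNeighborsClassifier(n_neighbors=5, algorithm="brute",
                            metric=weighted_euclidean)
slow.fit(X, Y)

# Equivalent formulation: rescale the features, then use the built-in metric.
scale = np.sqrt(w)
fast = KNeighborsClassifier(n_neighbors=5)  # default metric is Euclidean
fast.fit(X * scale, Y)

pred_slow = slow.predict(X_query)
pred_fast = fast.predict(X_query * scale)
print("same predictions:", np.array_equal(pred_slow, pred_fast))
```

This only applies when the metric can be expressed as a built-in metric on transformed features. For a metric that cannot, the Cython route described in the answers remains the suggested one.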

Source: https://stackoverflow.com/questions/40287236/why-is-knn-slow-with-custom-metric

Tags

python

algorithm

scikit-learn