Why is cross_val_predict so much slower than fit for KNeighborsClassifier?

Submitted by 我们两清 on 2021-02-06 12:55:23

Question


Running locally in a Jupyter notebook and using the MNIST dataset (28k entries, 28x28 pixels per image), the following takes 27 seconds.

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_jobs=1)
knn_clf.fit(pixels, labels)

However, the following takes 1722 seconds, in other words ~64 times longer:

from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(knn_clf, pixels, labels, cv = 3, n_jobs=1)

My naive understanding is that cross_val_predict with cv=3 is doing 3-fold cross validation, so I'd expect it to fit the model 3 times, and so take at least ~3 times longer, but I don't see why it would take 64x!

To check whether it was something specific to my environment, I ran the same code in a Colab notebook. The difference was less extreme there (15x), but still far above the ~3x I expected.

What am I missing? Why is cross_val_predict so much slower than just fitting the model?

In case it matters, I'm running scikit-learn 0.20.2.


Answer 1:


KNN is called a lazy algorithm because during fitting it does nothing but store the input data; no learning happens at all.

The actual distance calculations happen during predict, for each query point. So when you use cross_val_predict, KNN has to predict on every validation point, and that is where the time goes. Since the baseline fit time is tiny (just storing the data), the ratio between fit alone and a full fit-plus-predict cycle can be arbitrarily large, which is why 64x is not surprising.
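A quick way to see this is to time fit and predict separately. The sketch below uses small synthetic data rather than the MNIST set from the question, and forces `algorithm='brute'` so that fit is purely data storage:

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the pixels/labels arrays in the question
rng = np.random.default_rng(0)
X = rng.random((2000, 64))
y = rng.integers(0, 10, size=2000)

knn = KNeighborsClassifier(n_jobs=1, algorithm="brute")

t0 = time.perf_counter()
knn.fit(X, y)                 # lazy learner: essentially just stores X and y
fit_time = time.perf_counter() - t0

t0 = time.perf_counter()
pred = knn.predict(X)         # all pairwise distance computation happens here
predict_time = time.perf_counter() - t0

print(f"fit:     {fit_time:.4f}s")
print(f"predict: {predict_time:.4f}s")
```

On any machine, the predict step should dominate by a wide margin, which is exactly the cost that `cross_val_predict` pays and a plain `fit` does not.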




Answer 2:


cross_val_predict does a fit and a predict for each fold, so it is bound to take longer than fitting alone, though I would not have expected 64 times longer either.
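To make that concrete, here is roughly what `cross_val_predict(knn_clf, pixels, labels, cv=3)` does under the hood, written out by hand on small synthetic data (a sketch; the real implementation also handles parallelism and metadata). An explicit `KFold` is passed to both versions so they use identical splits:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the pixels/labels arrays in the question
rng = np.random.default_rng(0)
X = rng.random((300, 10))
y = rng.integers(0, 3, size=300)

knn = KNeighborsClassifier(n_neighbors=3)
cv = KFold(n_splits=3)

# Manual equivalent: fit on 2/3 of the data, predict the held-out 1/3, 3 times.
manual = np.empty_like(y)
for train_idx, test_idx in cv.split(X):
    model = clone(knn).fit(X[train_idx], y[train_idx])
    manual[test_idx] = model.predict(X[test_idx])   # the expensive step for KNN

auto = cross_val_predict(knn, X, y, cv=cv)
print(np.array_equal(manual, auto))
```

The three `predict` calls on the held-out folds are what a plain `fit` never performs, so comparing `cross_val_predict` against `fit` alone is not a 3x-vs-1x comparison to begin with.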



Source: https://stackoverflow.com/questions/54304970/why-is-cross-val-predict-so-much-slower-than-fit-for-kneighborsclassifier
