Nested cross-validation in grid search for precomputed kernels in scikit-learn

Submitted by 爱⌒轻易说出口 on 2019-12-22 09:28:05

Question


I have a precomputed kernel of size NxN. I am using GridSearchCV to tune the C parameter of an SVM with kernel='precomputed' as follows:

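import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC
# kernel is the precomputed NxN Gram matrix; data_label holds the N class labels
C_range = 10. ** np.arange(-2, 9)
param_grid = dict(C=C_range)
grid = GridSearchCV(SVC(kernel='precomputed'), param_grid=param_grid, cv=StratifiedKFold(n_splits=10))
grid.fit(kernel, data_label)
print(grid.best_score_)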

This works fine. However, because I predict on the same full data that I trained on (with grid.predict(kernel)), the model overfits: I get precision/recall = 1.0 most of the time.

So I would like to first split my data into 10 chunks (9 for training, 1 for testing) with cross-validation. In each fold, I want to run the grid search to tune the C value on the training set, and then test on the testing set.
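The nested scheme described above can be sketched roughly as follows. This is a minimal illustration, not a confirmed solution from the thread: the data is synthetic, the C grid is shortened to keep it fast, and the names outer_cv, K_train, K_test and outer_scores are made up for the example. It assumes a recent scikit-learn where the splitters live in sklearn.model_selection. The key assumption is that each test block of the kernel is taken as test rows by training columns rather than as a square test-by-test slice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): 150 samples, 2 classes, linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
data_label = (X[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)
kernel = X @ X.T  # full N x N Gram matrix

# Shortened grid (an assumption) so the sketch runs quickly.
param_grid = {"C": 10.0 ** np.arange(-2, 3)}
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(kernel, data_label):
    # Training kernel: training rows x training columns (square).
    K_train = kernel[np.ix_(train_idx, train_idx)]
    # Test kernel: test rows x training columns (rectangular).
    K_test = kernel[np.ix_(test_idx, train_idx)]

    # Inner loop: tune C using the training portion only.
    grid = GridSearchCV(
        SVC(kernel="precomputed"),
        param_grid=param_grid,
        cv=StratifiedKFold(n_splits=5),
    )
    grid.fit(K_train, data_label[train_idx])

    # Outer loop: score the refitted best model on the held-out fold.
    outer_scores.append(grid.score(K_test, data_label[test_idx]))

print("mean outer accuracy:", np.mean(outer_scores))
```

Because C is chosen without ever seeing the held-out fold, the averaged outer score should be a less optimistic estimate than grid.best_score_ computed on the full data.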

To do this, I sliced the kernel matrix into 100x100 and 50x50 submatrices, then ran grid.fit() on one of them and grid.predict() on the other.

But I get the following error:

ValueError: X.shape[1] = 50 should be equal to 100, the number of features at training time

I guess the training kernel should have the same dimensions as the testing kernel, but I don't understand why: I simply compute np.dot(X, X.T) for the 100x100 case and for the 50x50 case, so the resulting kernels have different dimensions.


Answer 1:


The scikit-learn documentation says:

Set kernel='precomputed' and pass the Gram matrix instead of X in the fit method. At the moment, the kernel values between all training vectors and the test vectors must be provided.

So I guess that it's not possible to do (simple) cross-validation with precomputed kernels.
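One reading of the quoted passage, offered here as a sketch rather than a confirmed fix, is that the matrix passed to predict needs one row per test sample and one column per training sample. Under that assumption, a 50x100 matrix of kernel values between test and training vectors is accepted, while the 50x50 test-by-test matrix is rejected. The data and the names X_train, X_test, K_train and K_test are invented for the illustration, and the exact error text may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): 100 training and 50 test samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = (X[:, 0] > 0).astype(int)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# Training kernel: training vectors against themselves, shape (100, 100).
K_train = np.dot(X_train, X_train.T)
# Test kernel: test vectors against the TRAINING vectors, shape (50, 100).
K_test = np.dot(X_test, X_train.T)

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
pred = clf.predict(K_test)
print(K_train.shape, K_test.shape, pred.shape)

# A test-by-test kernel has 50 columns instead of 100 and is rejected.
square_rejected = False
try:
    clf.predict(np.dot(X_test, X_test.T))
except ValueError as exc:
    square_rejected = True
    print("ValueError:", exc)
```

The reason for the shape is that the decision function sums kernel values between a new sample and the training samples (the support vectors are a subset of them), so kernel values among the test samples alone carry no usable information for prediction.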



Source: https://stackoverflow.com/questions/24595874/nested-cross-validation-in-grid-search-for-precomputed-kernels-in-scikit-learn
