How is scikit-learn GridSearchCV best_score_ calculated?

Asked by 没有蜡笔的小新 on 2021-01-05 17:06

I've been trying to figure out how the best_score_ attribute of GridSearchCV is calculated (or, in other words, what it means). The documentation says:

1 Answer
  • Answered 2021-01-05 17:27

    It's the mean cross-validation score of the best estimator. Let's generate some data and fix the cross-validation split of that data up front.

    >>> import numpy as np
    >>> from sklearn.model_selection import KFold
    >>> y = np.linspace(-5, 5, 200)
    >>> X = (y + np.random.randn(200)).reshape(-1, 1)
    >>> threefold = list(KFold(n_splits=3).split(X))  # one fixed 3-fold split, reused below
    

    Now run cross_val_score and GridSearchCV, both with these fixed folds.

    >>> from sklearn.linear_model import LinearRegression
    >>> from sklearn.model_selection import cross_val_score, GridSearchCV
    >>> cross_val_score(LinearRegression(), X, y, cv=threefold)
    array([-0.86060164,  0.2035956 , -0.81309259])
    >>> gs = GridSearchCV(LinearRegression(), {}, cv=threefold, verbose=3).fit(X, y) 
    Fitting 3 folds for each of 1 candidates, totalling 3 fits
    [CV]  ................................................................
    [CV] ...................................... , score=-0.860602 -   0.0s
    [Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
    [CV]  ................................................................
    [CV] ....................................... , score=0.203596 -   0.0s
    [CV]  ................................................................
    [CV] ...................................... , score=-0.813093 -   0.0s
    [Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished
    

    Note the score=-0.860602, score=0.203596 and score=-0.813093 in the GridSearchCV output; these are exactly the values returned by cross_val_score.
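
    To check the averaging directly, here is a minimal sketch (assuming a recent scikit-learn, where best_score_ is the plain, unweighted mean of the per-fold test scores); for the fold scores shown above, both values come out around -0.49:

    >>> scores = cross_val_score(LinearRegression(), X, y, cv=threefold)
    >>> scores.mean()      # macro-average of the three fold scores
    >>> gs.best_score_     # the same mean, reported for GridSearchCV's single candidate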

    Note that the "mean" is really a macro-average over the folds, i.e. every fold counts equally regardless of its size. In older versions of scikit-learn, the iid parameter of GridSearchCV could be used to get a micro-average over the samples instead (fold scores weighted by the number of test samples); that parameter has since been deprecated and removed, and current versions always report the macro-average.
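
    To make the two averages concrete, here is a hedged sketch that reconstructs both by hand from the scores and threefold defined above (fold_sizes is just a helper variable introduced here for illustration):

    >>> fold_sizes = np.array([len(test) for _, test in threefold])  # 67, 67, 66 test samples per fold
    >>> np.mean(scores)                         # macro-average: every fold counts equally
    >>> np.average(scores, weights=fold_sizes)  # micro-style average: folds weighted by their test-set size

    With folds of nearly equal size the two averages barely differ, which is why the distinction rarely mattered in practice.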
