Evaluate multiple scores on sklearn cross_val_score

野趣味 2020-12-12 18:08

I'm trying to evaluate multiple machine learning algorithms with sklearn on several metrics (accuracy, recall, precision and maybe more).

For what I understood

3 Answers
  • 2020-12-12 18:38

    Since this post was written, scikit-learn has been updated and made my original answer obsolete; see the much cleaner solution at the end of this answer.


    You can write your own scoring function to capture all three pieces of information. However, a scoring function passed to cross_val_score must return a single number in scikit-learn (likely for compatibility reasons). Below is an example where the scores for each cross-validation fold are printed to the console, and the returned value is just the sum of the three metrics. If you want to return all these values, you would have to make some changes to cross_val_score (line 1351 of cross_validation.py) and _score (line 1601 of the same file).

    import time
    
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    # sklearn.cross_validation was removed in 0.20; use model_selection instead
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    
    iris = load_iris()
    
    models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
    names = ["Naive Bayes", "Decision Tree", "SVM"]
    
    def getScores(estimator, x, y):
        # pos_label only applies to average='binary', so it is not passed here
        yPred = estimator.predict(x)
        return (accuracy_score(y, yPred),
                precision_score(y, yPred, average='macro'),
                recall_score(y, yPred, average='macro'))
    
    def my_scorer(estimator, x, y):
        # Print the three metrics for this fold, but return a single number
        a, p, r = getScores(estimator, x, y)
        print(a, p, r)
        return a + p + r
    
    for model, name in zip(models, names):
        print(name)
        start = time.time()
        m = cross_val_score(model, iris.data, iris.target, scoring=my_scorer, cv=10).mean()
        print('\nSum:', m, '\n\n')
        print('time', time.time() - start, '\n\n')
    

    Which gives (exact digits depend on the Python and scikit-learn versions):

    Naive Bayes
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    1.0 1.0 1.0
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    0.866666666667 0.904761904762 0.866666666667
    1.0 1.0 1.0
    1.0 1.0 1.0
    1.0 1.0 1.0
    
    Sum: 2.86936507937 
    
    
    time 0.0249638557434 
    
    
    Decision Tree
    1.0 1.0 1.0
    0.933333333333 0.944444444444 0.933333333333
    1.0 1.0 1.0
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    0.866666666667 0.866666666667 0.866666666667
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    1.0 1.0 1.0
    1.0 1.0 1.0
    
    Sum: 2.86555555556 
    
    
    time 0.0237860679626 
    
    
    SVM
    1.0 1.0 1.0
    0.933333333333 0.944444444444 0.933333333333
    1.0 1.0 1.0
    1.0 1.0 1.0
    1.0 1.0 1.0
    0.933333333333 0.944444444444 0.933333333333
    0.933333333333 0.944444444444 0.933333333333
    1.0 1.0 1.0
    1.0 1.0 1.0
    1.0 1.0 1.0
    
    Sum: 2.94333333333 
    
    
    time 0.043044090271 
    

    As of scikit-learn 0.19.0 the solution becomes much easier:

    from sklearn.model_selection import cross_validate
    from sklearn.datasets import  load_iris
    from sklearn.svm import SVC
    
    iris = load_iris()
    clf = SVC()
    scoring = {'acc': 'accuracy',
               'prec_macro': 'precision_macro',
               'rec_macro': 'recall_macro'}
    scores = cross_validate(clf, iris.data, iris.target, scoring=scoring,
                             cv=5, return_train_score=True)
    print(scores.keys())
    print(scores['test_acc'])  
    

    Which gives:

    ['test_acc', 'score_time', 'train_acc', 'fit_time', 'test_rec_macro', 'train_rec_macro', 'train_prec_macro', 'test_prec_macro']
    [ 0.96666667  1.          0.96666667  0.96666667  1.        ]
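
The values in the scoring dict do not have to be built-in names. The following is a minimal sketch, assuming a scikit-learn version that provides cross_validate, in which one entry is a scorer built with make_scorer so that extra keyword arguments reach the metric function. The metric names (`acc`, `rec_macro`, `prec_weighted`) and the `summary` dict are illustrative choices, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Values may be built-in scorer names or scorer objects.
scoring = {
    "acc": "accuracy",
    "rec_macro": "recall_macro",
    # make_scorer forwards extra keyword arguments to the metric function.
    "prec_weighted": make_scorer(precision_score, average="weighted"),
}

scores = cross_validate(SVC(), X, y, scoring=scoring, cv=5)

# Each "test_<name>" entry is an array holding one value per fold.
summary = {
    name: (scores["test_" + name].mean(), scores["test_" + name].std())
    for name in scoring
}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

Reporting the mean together with the standard deviation across folds gives a rough sense of how stable each metric is.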
    
  • 2020-12-12 18:50
    import pandas as pd
    from sklearn import model_selection
    
    def error_metrics(model, train_data, train_targ, kfold):
        # `model` is a list of estimators; each one is cross-validated once per metric.
        # Keep only the metrics that suit the task (classification or regression).
        scoring = ["accuracy", "roc_auc", "neg_log_loss", "r2",
                   "neg_mean_squared_error", "neg_mean_absolute_error"]
    
        results = pd.DataFrame()  # named so it does not shadow the function
        results["model"] = model
        for scor in scoring:
            score = []
            for mod in model:
                result = model_selection.cross_val_score(estimator=mod, X=train_data, y=train_targ, cv=kfold, scoring=scor)
                score.append(result.mean())
    
            results[scor] = pd.Series(score)
    
        return results
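
The function above runs a full cross-validation for every (model, metric) pair. As a possible alternative, here is a rough sketch that uses cross_validate so each model is fitted only once per fold while all metrics are collected together. The helper name `compare_models`, the choice of models and the metric list are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(models, X, y, scoring, cv=5):
    """Return one row per model with the mean test score for each metric."""
    rows = []
    for name, model in models.items():
        # One call evaluates every metric on the same fitted folds.
        cv_result = cross_validate(model, X, y, scoring=scoring, cv=cv)
        row = {"model": name}
        for metric in scoring:
            row[metric] = cv_result["test_" + metric].mean()
        rows.append(row)
    return pd.DataFrame(rows).set_index("model")


X, y = load_iris(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}
# Classification metrics only, since iris is a multiclass problem.
scoring = ["accuracy", "precision_macro", "recall_macro"]

table = compare_models(models, X, y, scoring)
print(table)
```

When scoring is a list of names, the result keys are the names prefixed with `test_`, which is what the inner loop relies on.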
    
  • 2020-12-12 18:58

    I ran into the same problem and created a module that supports multiple metrics in cross_val_score.
    To accomplish what you want with this module, you can write:

    import time
    
    import numpy as np
    from multiscorer import MultiScorer
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    from sklearn.model_selection import cross_val_score
    
    # `models`, `names` and `iris` are defined as in the first answer
    
    scorer = MultiScorer({
        'Accuracy'  : (accuracy_score , {}),
        'Precision' : (precision_score, {'average': 'macro'}),
        'Recall'    : (recall_score   , {'average': 'macro'})
    })
    
    for model, name in zip(models, names):
        print(name)
        start = time.time()
    
        # The result is assigned to `_` to show that the return value is not used
        _ = cross_val_score(model, iris.data, iris.target, scoring=scorer, cv=10)
        results = scorer.get_results()
    
        for metric_name in results.keys():
            average_score = np.average(results[metric_name])
            print('%s : %f' % (metric_name, average_score))
    
        print('time', time.time() - start, '\n\n')
    

    You can check and download this module from GitHub. Hope it helps.
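
If adding a dependency is not an option, something similar may be possible with scikit-learn alone. The sketch below assumes a reasonably recent scikit-learn (I believe 0.24 or later), where cross_validate accepts a single callable that returns a dict of metric values. Note that this applies to cross_validate rather than cross_val_score, which still expects one number per fold. The function name `multi_metric_scorer` and its dict keys are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB


def multi_metric_scorer(estimator, X, y):
    """Score one fold and return every metric in a single dict."""
    y_pred = estimator.predict(X)  # predict once, reuse for all metrics
    return {
        "accuracy": accuracy_score(y, y_pred),
        "precision": precision_score(y, y_pred, average="macro"),
        "recall": recall_score(y, y_pred, average="macro"),
    }


X, y = load_iris(return_X_y=True)
results = cross_validate(GaussianNB(), X, y, scoring=multi_metric_scorer, cv=10)

# The dict keys returned by the scorer come back prefixed with "test_".
for key in ("test_accuracy", "test_precision", "test_recall"):
    print(key, results[key].mean())
```

Because the callable predicts once per fold and derives all metrics from that prediction, it avoids repeating the prediction step for each metric.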
