Assignment:
1. Create a classification dataset (n_samples = 1000, n_features = 10)
2. Split the dataset using 10-fold cross validation
3. Train the algorithms
   - GaussianNB
   - SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   - RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance
   - Accuracy
   - F1-score
   - AUC ROC
5. Write a short report summarizing the methodology and the results
from sklearn import datasets, model_selection
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Step 1: synthetic binary classification dataset
X, Y = datasets.make_classification(n_samples=1000, n_features=10)

# Step 2: 10-fold cross-validation. sklearn.cross_validation was removed in
# scikit-learn 0.20; model_selection.KFold takes n_splits and yields the
# index pairs through split().
kf = model_selection.KFold(n_splits=10, shuffle=True)

acc_for_NB = []   # per-fold accuracy of the three algorithms
acc_for_SVC = []
acc_for_RFC = []

f1_for_NB = []    # per-fold F1-score of the three algorithms
f1_for_SVC = []
f1_for_RFC = []

auc_for_NB = []   # per-fold AUC ROC of the three algorithms
auc_for_SVC = []
auc_for_RFC = []

# Step 3 and 4: train each model on the training folds, score it on the held-out fold
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], Y[train_index]
    X_test, y_test = X[test_index], Y[test_index]

    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc_for_NB.append(metrics.accuracy_score(y_test, pred))
    f1_for_NB.append(metrics.f1_score(y_test, pred))
    auc_for_NB.append(metrics.roc_auc_score(y_test, pred))

    clf = SVC(C=1e00, kernel='rbf', gamma=0.1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc_for_SVC.append(metrics.accuracy_score(y_test, pred))
    f1_for_SVC.append(metrics.f1_score(y_test, pred))
    auc_for_SVC.append(metrics.roc_auc_score(y_test, pred))

    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc_for_RFC.append(metrics.accuracy_score(y_test, pred))
    f1_for_RFC.append(metrics.f1_score(y_test, pred))
    auc_for_RFC.append(metrics.roc_auc_score(y_test, pred))

# Report the per-fold scores and their average for each model and metric
print("Naive Bayes:")
print("Evaluated by accuracy score:")
print(acc_for_NB)
print("Average:", sum(acc_for_NB) / len(acc_for_NB))
print()
print("Evaluated by f1 score:")
print(f1_for_NB)
print("Average:", sum(f1_for_NB) / len(f1_for_NB))
print()
print("Evaluated by roc auc score:")
print(auc_for_NB)
print("Average:", sum(auc_for_NB) / len(auc_for_NB))
print()

print("SVC:")
print("Evaluated by accuracy score:")
print(acc_for_SVC)
print("Average:", sum(acc_for_SVC) / len(acc_for_SVC))
print()
print("Evaluated by f1 score:")
print(f1_for_SVC)
print("Average:", sum(f1_for_SVC) / len(f1_for_SVC))
print()
print("Evaluated by roc auc score:")
print(auc_for_SVC)
print("Average:", sum(auc_for_SVC) / len(auc_for_SVC))
print()

print("Random Forest:")
print("Evaluated by accuracy score:")
print(acc_for_RFC)
print("Average:", sum(acc_for_RFC) / len(acc_for_RFC))
print()
print("Evaluated by f1 score:")
print(f1_for_RFC)
print("Average:", sum(f1_for_RFC) / len(f1_for_RFC))
print()
print("Evaluated by roc auc score:")
print(auc_for_RFC)
print("Average:", sum(auc_for_RFC) / len(auc_for_RFC))
print()
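The script fixes C = 1 for SVC and n_estimators = 100 for the random forest, while step 3 of the assignment lists several candidate values. One possible way to choose among them is a nested setup: GridSearchCV picks the value on inner folds, and cross_validate scores the tuned model on the outer 10 folds. The following is only a sketch under assumptions. The helper name nested_cv_means, the inner 3-fold split, the random_state values and the default gamma are all illustrative and do not come from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.svm import SVC


def nested_cv_means(estimator, param_grid, X, y, outer_splits=10, inner_splits=3):
    """Mean outer-fold accuracy, F1 and ROC AUC of a grid-searched estimator.

    Hypothetical helper for illustration; not part of the original script.
    """
    inner = KFold(n_splits=inner_splits, shuffle=True, random_state=0)
    outer = KFold(n_splits=outer_splits, shuffle=True, random_state=1)
    # The inner search selects the parameter value by accuracy and refits
    # the best model on the whole outer training fold.
    search = GridSearchCV(estimator, param_grid, scoring="accuracy", cv=inner)
    # The outer loop scores the tuned model on data the search never saw.
    scores = cross_validate(search, X, y, cv=outer,
                            scoring=["accuracy", "f1", "roc_auc"])
    return {name: scores["test_" + name].mean()
            for name in ("accuracy", "f1", "roc_auc")}


X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
svc_means = nested_cv_means(SVC(kernel="rbf"),
                            {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]}, X, y)
print(svc_means)
```

The random forest can be passed the same way with RandomForestClassifier() and {"n_estimators": [10, 100, 1000]}, although the 1000-tree setting makes the search noticeably slower.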
Results:
Naive Bayes:
Evaluated by accuracy score:
[0.94, 0.92, 0.88, 0.86, 0.91, 0.91, 0.89, 0.9, 0.83, 0.94]
Average: 0.8979999999999999

Evaluated by f1 score:
[0.9491525423728813, 0.9245283018867925, 0.8604651162790697, 0.8653846153846154, 0.8988764044943819, 0.9108910891089109, 0.8952380952380952, 0.8913043478260869, 0.8089887640449439, 0.9387755102040817]
Average: 0.8943604786839858

Evaluated by roc auc score:
[0.9461958806221101, 0.9190705128205129, 0.8747474747474747, 0.8606985146527499, 0.9099025974025975, 0.91, 0.8917069243156199, 0.8993558776167472, 0.8263749498193497, 0.9407051282051283]
Average: 0.897875786020229

SVC:
Evaluated by accuracy score:
[0.96, 0.9, 0.88, 0.88, 0.89, 0.9, 0.89, 0.9, 0.83, 0.93]
Average: 0.8959999999999999

Evaluated by f1 score:
[0.9661016949152543, 0.9038461538461539, 0.8604651162790697, 0.8823529411764707, 0.8735632183908046, 0.9, 0.8932038834951458, 0.8913043478260869, 0.8045977011494252, 0.9278350515463918]
Average: 0.8903270108624802

Evaluated by roc auc score:
[0.9672131147540983, 0.8998397435897435, 0.8747474747474747, 0.8819751103974307, 0.8871753246753247, 0.9, 0.8933172302737521, 0.8993558776167472, 0.8251706142111602, 0.9302884615384616]
Average: 0.8959082951804194

Random Forest:
Evaluated by accuracy score:
[0.96, 0.93, 0.93, 0.92, 0.94, 0.9, 0.92, 0.94, 0.94, 0.95]
Average: 0.9329999999999998

Evaluated by f1 score:
[0.9661016949152543, 0.9345794392523366, 0.9195402298850575, 0.923076923076923, 0.9333333333333332, 0.9, 0.9259259259259259, 0.9333333333333332, 0.9318181818181819, 0.9484536082474228]
Average: 0.9316162669787769

Evaluated by roc auc score:
[0.9672131147540983, 0.9286858974358975, 0.9262626262626262, 0.920915295062224, 0.9415584415584416, 0.9, 0.9194847020933978, 0.9380032206119163, 0.9361702127659575, 0.9503205128205129]
Average: 0.9328614023365072
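For the short report in step 5, it can help to condense each list of fold scores into a mean and a spread. The sketch below does this for the accuracy lists printed above, copied verbatim. Using the sample standard deviation as the measure of spread, and the variable names, are assumptions made for illustration.

```python
from statistics import mean, stdev

# Per-fold accuracy scores copied from the printed results.
accuracy = {
    "Naive Bayes": [0.94, 0.92, 0.88, 0.86, 0.91, 0.91, 0.89, 0.9, 0.83, 0.94],
    "SVC": [0.96, 0.9, 0.88, 0.88, 0.89, 0.9, 0.89, 0.9, 0.83, 0.93],
    "Random Forest": [0.96, 0.93, 0.93, 0.92, 0.94, 0.9, 0.92, 0.94, 0.94, 0.95],
}

# (mean, sample standard deviation) across the 10 folds for each model
summary = {name: (mean(vals), stdev(vals)) for name, vals in accuracy.items()}

for name, (m, s) in summary.items():
    print(f"{name:<14} accuracy = {m:.3f} +/- {s:.3f}")
```

The same dictionary layout would work for the F1 and AUC lists.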
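Random Forest clearly gives the best results on all three metrics, while Naive Bayes and SVC perform about the same.
Accuracy yields the highest average score for every model, which suggests that F1 and AUC are slightly stricter measures here, although the gaps are small (at most about 0.006).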
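One caveat about the AUC figures: the script passes the output of clf.predict (hard 0/1 labels) to roc_auc_score. With hard labels the ROC curve has a single interior point, so the area reduces to the average of sensitivity and specificity, i.e. balanced accuracy. This may be why the AUC averages track accuracy so closely. The sketch below compares label-based and score-based AUC. The single 90/10 hold-out split and the fixed random_state values are assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
labels = clf.predict(X_test)              # hard 0/1 predictions
scores = clf.predict_proba(X_test)[:, 1]  # estimated probability of class 1

auc_from_labels = roc_auc_score(y_test, labels)
auc_from_scores = roc_auc_score(y_test, scores)
balanced_acc = balanced_accuracy_score(y_test, labels)

print("AUC from hard labels:", auc_from_labels)
print("Balanced accuracy:   ", balanced_acc)
print("AUC from scores:     ", auc_from_scores)
```

For SVC, which does not expose predict_proba unless probability=True is set, clf.decision_function(X_test) can serve as the score instead.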
Source: python作业之sklearn