cross-validation

KFold Cross Validation vs train_test_split

Submitted by 限于喜欢 on 2020-06-25 04:06:52
Question: I just built my first random forest classifier today and I am trying to improve its performance. I was reading about how cross-validation is important for avoiding overfitting and hence obtaining better results. I implemented StratifiedKFold using sklearn; however, surprisingly, this approach turned out to be less accurate. I have read numerous posts suggesting that cross-validation is much more effective than train_test_split. Estimator: rf = RandomForestClassifier(n_estimators=100, random…
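A minimal sketch of the comparison the question is making, on a synthetic dataset (make_classification and all parameters here are illustrative assumptions, not taken from the original post). The key point: cross-validation does not make the model more accurate, it makes the accuracy *estimate* more reliable than a single hold-out split.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Single hold-out estimate: one number, higher variance.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
rf.fit(X_train, y_train)
print("hold-out accuracy:", rf.score(X_test, y_test))

# Stratified k-fold: k numbers whose mean is a more stable estimate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(rf, X, y, cv=cv)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))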

Why should we call split() when passing StratifiedKFold() as a parameter to GridSearchCV?

Submitted by 醉酒当歌 on 2020-06-16 05:55:26
Question: What am I trying to do? I am trying to use StratifiedKFold() in GridSearchCV(). What confuses me? When we use k-fold cross-validation, we just pass the number of folds inside GridSearchCV(), like the following: grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=5, scoring='f1', return_train_score=True, n_jobs=2) Then, when I need to use StratifiedKFold(), …
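A minimal sketch of the answer, reusing the names from the question but with a placeholder dataset and grid: a StratifiedKFold instance can be passed directly as cv, and GridSearchCV calls split(X, y) on it internally. Calling split() yourself is only needed when you iterate over the folds manually.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)
rdm_forest_clf = RandomForestClassifier(random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# Pass the splitter object itself; no explicit split() call required.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=skf,
                             scoring="f1", return_train_score=True, n_jobs=2)
grid_search_m.fit(X, y)
print(grid_search_m.best_params_)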

KFold Cross Validation does not fix overfitting

Submitted by 和自甴很熟 on 2020-06-16 03:34:09
Question: I separate the features into X and y, then I preprocess my train and test data after splitting it with k-fold cross-validation. After that I fit the train data to my random forest regressor model and calculate the confidence score. Why do I preprocess after splitting? Because people tell me it's more correct to do it that way, and I'm keeping that principle for the sake of my model's performance. This is my first time using k-fold cross-validation, because my model score overfits and …
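A minimal sketch of the leakage-safe pattern the question describes: wrapping the preprocessing and the regressor in a Pipeline makes cross-validation re-fit the scaler on each training fold only. The dataset and parameters are placeholder assumptions, not from the original post.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
pipe = make_pipeline(StandardScaler(), RandomForestRegressor(random_state=0))

# In each fold, the scaler is fit on the training part only,
# then applied to the held-out part.
scores = cross_val_score(pipe, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("R^2 per fold:", scores)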

How to standardize data with sklearn's cross_val_score()

Submitted by 与世无争的帅哥 on 2020-06-13 20:06:49
Question: Let's say I want to use a LinearSVC to perform k-fold cross-validation on a dataset. How would I perform standardization on the data? The best practice I have read about is to build your standardization model on your training data and then apply it to the testing data. With a simple train_test_split(), this is easy, as we can just do: X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) clf = svm.LinearSVC() scalar = StandardScaler() X_train = scalar.fit_transform(X…
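A minimal sketch of the standard answer: put the scaler and the classifier in a Pipeline, and cross_val_score will repeat the fit-on-train / transform-on-test logic inside every fold. The data (iris) and max_iter value are illustrative assumptions.

from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline is one estimator, so the scaler is re-fit per fold.
pipe = make_pipeline(StandardScaler(), svm.LinearSVC(max_iter=10000))
print(cross_val_score(pipe, X, y, cv=5))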

How to plot a decision tree from GridSearchCV?

Submitted by 核能气质少年 on 2020-06-13 06:59:29
Question: I was trying to plot the decision tree produced by GridSearchCV, but it gives me an attribute error: AttributeError: 'GridSearchCV' object has no attribute 'n_features_' However, if I try to plot a normal decision tree without GridSearchCV, it prints successfully. Code [decision tree without GridSearchCV]: # dtc_entropy: decision tree classifier based on entropy/information gain # plotting: decision tree based on information/entropy from sklearn.tree import export_graphviz …
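A minimal sketch of the usual fix: GridSearchCV is a wrapper, not a tree, so pass the fitted tree it found via best_estimator_ to the export function. The dataset and grid here are placeholder assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(DecisionTreeClassifier(criterion="entropy", random_state=0),
                    {"max_depth": [2, 3, 4]}, cv=5)
grid.fit(X, y)

# Export the underlying fitted estimator, not the GridSearchCV wrapper.
export_graphviz(grid.best_estimator_, out_file="tree.dot", filled=True)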

Cross_val_score is not working with roc_auc and multiclass

Submitted by 那年仲夏 on 2020-05-28 06:53:08
Question: What I want to do: I wish to compute a cross_val_score using roc_auc on a multiclass problem. What I tried: Here is a reproducible example made with the iris data set. from sklearn.datasets import load_iris from sklearn.preprocessing import OneHotEncoder from sklearn.model_selection import cross_val_score iris = load_iris() X = pd.DataFrame(data=iris.data, columns=iris.feature_names) I one-hot encode my target: encoder = OneHotEncoder() y = encoder.fit_transform(pd.DataFrame(iris.target)) …
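A minimal sketch of one way this typically gets resolved (the classifier here is an assumption, not from the question): keep y as integer class labels rather than one-hot encoding it, and use the one-vs-rest AUC scorer, which handles the multiclass case.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
clf = LogisticRegression(max_iter=1000)

# 'roc_auc_ovr' averages one-vs-rest AUC over the classes.
scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring="roc_auc_ovr")
print(scores)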

How to get training & validation loss of Keras scikit-learn wrapper in cross validation?

Submitted by 孤人 on 2020-05-17 05:41:07
Question: I know that model.fit in Keras returns a callbacks.History object from which we can get the loss and other metrics, as follows. ... train_history = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_test, Y_test)) loss = train_history.history['loss'] val_loss = train_history.history['val_loss'] However, in my new experiment I am using cross-validation with a Keras model via KerasClassifier (full example code: https://chrisalbon.com/deep_learning
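A minimal sketch of one common workaround (the model architecture and the random data are placeholder assumptions): instead of going through cross_val_score, loop over the folds manually, so that each call to model.fit returns its own History object with per-epoch loss and val_loss.

import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(200, 8)
y = np.random.randint(0, 2, 200)

histories = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Build a fresh model per fold so weights do not leak across folds.
    model = Sequential([Dense(16, activation="relu", input_shape=(8,)),
                        Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    h = model.fit(X[train_idx], y[train_idx], epochs=5, verbose=0,
                  validation_data=(X[val_idx], y[val_idx]))
    histories.append(h.history)  # keys include 'loss' and 'val_loss'

print(histories[0]["val_loss"])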

Cross validation for MNIST dataset with pytorch and sklearn

Submitted by ◇◆丶佛笑我妖孽 on 2020-05-15 05:03:09
Question: I am new to PyTorch and am trying to implement a feed-forward neural network to classify the MNIST data set. I have some problems when trying to use cross-validation. My data has the following shapes: x_train: torch.Size([45000, 784]) and y_train: torch.Size([45000]). I tried to use KFold from sklearn: kfold = KFold(n_splits=10) Here is the first part of my train method, where I'm dividing the data into folds: for train_index, test_index in kfold.split(x_train, y_train): x_train_fold = x…
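A minimal sketch of indexing torch tensors with the index arrays that sklearn's KFold yields; the shapes mirror the question, but the random data is a placeholder assumption. KFold returns numpy integer arrays, and torch tensors accept them directly as indices.

import torch
from sklearn.model_selection import KFold

x_train = torch.rand(45000, 784)
y_train = torch.randint(0, 10, (45000,))

kfold = KFold(n_splits=10)
for train_index, test_index in kfold.split(x_train):
    # Select the fold's rows from both tensors with the numpy index arrays.
    x_train_fold, y_train_fold = x_train[train_index], y_train[train_index]
    x_test_fold, y_test_fold = x_train[test_index], y_train[test_index]
    print(x_train_fold.shape, x_test_fold.shape)
    break  # remove to iterate over all ten folds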