grid-search

Grid Search and Early Stopping Using Cross Validation with XGBoost in SciKit-Learn

Submitted by 微笑、不失礼 on 2019-12-03 06:55:17
I am fairly new to scikit-learn and have been trying to hyper-parameter tune XGBoost. My aim is to use grid search to tune the model parameters and early stopping to control the number of trees and avoid overfitting. Since I am using cross-validation for the grid search, I was hoping to also use cross-validation in the early stopping criterion. The code I have so far looks like this:

    import numpy as np
    import pandas as pd
    from sklearn import model_selection
    import xgboost as xgb

    # Import training and test data
    train = pd.read_csv("train.csv").fillna(value=-999.0)
    test = pd
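
One common workaround, shown here as a hedged sketch rather than the post's own code (the synthetic dataset and parameter values below are placeholders of mine), is to let GridSearchCV handle the cross-validated parameter search while early stopping watches a separate fixed validation set passed through fit. Note that recent xgboost (>= 1.6) takes early_stopping_rounds in the constructor rather than in fit:

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    # Hold out a fixed validation set for the early-stopping monitor.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.1,
                              early_stopping_rounds=20, eval_metric="logloss")
    param_grid = {"max_depth": [3, 5, 7], "subsample": [0.8, 1.0]}

    grid = GridSearchCV(model, param_grid, cv=5)
    # eval_set is forwarded by GridSearchCV to XGBClassifier.fit for each fold.
    grid.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    print(grid.best_params_, grid.best_estimator_.best_iteration)

The caveat is that the same validation set is reused across folds; using the fold's own held-out piece for early stopping is not something GridSearchCV supports directly.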

How to elegantly pass sklearn's GridSearchCV best parameters to another model?

Submitted by 感情迁移 on 2019-12-03 05:52:49
I have found a set of best hyperparameters for my KNN estimator with GridSearchCV:

    >>> knn_gridsearch_model.best_params_
    {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3}

So far, so good. I want to train my final estimator with these newly found parameters. Is there a way to feed the above hyperparameter dict to it directly? I tried this:

    >>> new_knn_model = KNeighborsClassifier(knn_gridsearch_model.best_params_)

but instead of the hoped-for result, new_knn_model just got the whole dict as the first parameter of the model and left the remaining ones as default:

    >>> knn_model
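
The usual idiom is to unpack the dict with **, so each entry becomes its own keyword argument. A minimal sketch, using the values quoted above:

    from sklearn.neighbors import KNeighborsClassifier

    best_params = {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3}
    # ** expands the dict into keyword arguments instead of one positional argument.
    new_knn_model = KNeighborsClassifier(**best_params)
    # new_knn_model.fit(X_train, y_train) on the full training data afterwards

Also worth remembering: with the default refit=True, GridSearchCV already refits the best configuration on the whole training set, so grid.best_estimator_ can often be used directly.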

How to implement SMOTE in cross validation and GridSearchCV

Submitted by 左心房为你撑大大i on 2019-12-03 03:18:39
I'm relatively new to Python. Can you help me improve my implementation of SMOTE into a proper pipeline? What I want is to apply the over- and under-sampling on the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that, I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My take on this:

    df = pd.read_csv("Imbalanced_data.csv")  # Load the data set
    X = df.iloc[:,0:64]
    X = X
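
A common way to keep the familiar interface is imbalanced-learn's Pipeline, which applies SMOTE only to the training portion of each fold. A sketch under that assumption (the estimator and parameter values are placeholders of mine, not from the post):

    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("clf", RandomForestClassifier(random_state=0))])
    param_dist = {"clf__n_estimators": [100, 300],
                  "clf__max_depth": [None, 10, 30]}

    search = RandomizedSearchCV(pipe, param_dist, n_iter=5, cv=5, scoring="f1")
    # search.fit(X, y)  # resampling never touches the held-out fold of each split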

Invalid parameter for sklearn estimator pipeline

Submitted by 被刻印的时光 ゝ on 2019-12-02 22:10:39
I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16. The code I am using:

    pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
    param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100],
                  "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
    grid = GridSearchCV(pipe, param_grid, cv=5)
    grid.fit(X_train, y_train)
    print("Best cross-validation score: {:.2f}".format(grid.best_score_))

The error being returned boils down to:

    ValueError: Invalid parameter logisticregression_C for estimator Pipeline
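
The usual fix is the step-name prefix with a double underscore: GridSearchCV addresses pipeline parameters as <step>__<parameter>, where make_pipeline names each step after its lowercased class. A sketch of the corrected grid, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection (in 0.16 it was still sklearn.grid_search):

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
    # Pipeline parameters need "<step>__<param>" with two underscores.
    param_grid = {"logisticregression__C": [0.001, 0.01, 0.1, 1, 10, 100],
                  "tfidfvectorizer__ngram_range": [(1, 1), (1, 2), (1, 3)]}
    grid = GridSearchCV(pipe, param_grid, cv=5)
    # grid.fit(X_train, y_train) with the book's text data, then grid.best_score_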

Early stopping with Keras and sklearn GridSearchCV cross-validation

Submitted by 房东的猫 on 2019-12-02 19:27:54
I wish to implement early stopping with Keras and sklearn's GridSearchCV. The working code example below is modified from How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras. The data set may be downloaded from here. The modification adds the Keras EarlyStopping callback class to prevent over-fitting. For this to be effective, it requires the monitor='val_acc' argument for monitoring validation accuracy. For val_acc to be available, KerasClassifier requires validation_split=0.1 to generate validation accuracy, else EarlyStopping raises RuntimeWarning: Early
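
A condensed sketch of that pattern follows. Assumptions of mine: the legacy KerasClassifier wrapper bundled with TensorFlow (up to about 2.11), a toy 8-feature binary dataset, and the newer metric name val_accuracy; the maintained route today is the scikeras package with a near-identical interface:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    def build_model(units=12):
        model = Sequential([Dense(units, activation="relu", input_dim=8),
                            Dense(1, activation="sigmoid")])
        model.compile(loss="binary_crossentropy", optimizer="adam",
                      metrics=["accuracy"])
        return model

    # Placeholder data standing in for the tutorial's downloaded data set.
    X = np.random.rand(200, 8)
    y = (X.sum(axis=1) > 4).astype(int)

    # validation_split gives EarlyStopping a val_accuracy signal to monitor.
    clf = KerasClassifier(build_fn=build_model, epochs=50, batch_size=10,
                          validation_split=0.1, verbose=0)
    early_stop = EarlyStopping(monitor="val_accuracy", patience=5)

    grid = GridSearchCV(clf, {"units": [8, 12]}, cv=3)
    grid.fit(X, y, callbacks=[early_stop])  # callbacks are forwarded to model.fit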

GridSearch on Model and Classifiers

Submitted by 谁说我不能喝 on 2019-12-02 08:32:11
Question: I just came across this example on model selection using grid search here: https://chrisalbon.com/machine_learning/model_selection/model_selection_using_grid_search/ The example reads:

    # Create a pipeline
    pipe = Pipeline([('classifier', RandomForestClassifier())])

    # Create space of candidate learning algorithms and their hyperparameters
    search_space = [{'classifier': [LogisticRegression()],
                     'classifier__penalty': ['l1', 'l2'],
                     'classifier__C': np.logspace(0, 4, 10)},
                    {'classifier':
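
For reference, a self-contained sketch of that pattern (the liblinear solver and the random-forest grid below are my own filler for the truncated part, not the linked post's exact values): the 'classifier' step is itself a searchable parameter, so a single GridSearchCV compares different estimators together with their hyperparameters.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([("classifier", RandomForestClassifier())])

    # Each dict is searched separately, so the 'classifier' object can be swapped.
    search_space = [{"classifier": [LogisticRegression(solver="liblinear")],
                     "classifier__penalty": ["l1", "l2"],
                     "classifier__C": np.logspace(0, 4, 10)},
                    {"classifier": [RandomForestClassifier()],
                     "classifier__n_estimators": [10, 100, 1000]}]

    grid = GridSearchCV(pipe, search_space, cv=5)
    # grid.fit(X, y); grid.best_estimator_.get_params()["classifier"] is the winner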

TypeError in grid search

Submitted by 不羁的心 on 2019-12-02 04:27:13
I used to write loops to find the best parameters for my model, which increased my coding errors, so I decided to use GridSearchCV. I am trying to find the best parameters for PCA for my model (the only parameter I want to grid search on). In this model, after normalization I want to combine the original features with the PCA-reduced features and then apply a linear SVM. Then I save the whole model to predict my input on. I get an error in the line where I try to fit the data, so that I can use the best_estimator_ and best_params_ attributes. The error says: TypeError: The score function
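
A hedged sketch of the structure described (the step names, component counts, and identity transformer below are my own choices, not from the post): scale, union the scaled features with their PCA projection via FeatureUnion, then fit a linear SVM, grid-searching only the PCA component count.

    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.preprocessing import StandardScaler, FunctionTransformer
    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("features", FeatureUnion([
            ("original", FunctionTransformer()),  # identity: keeps the scaled features
            ("pca", PCA()),
        ])),
        ("svm", LinearSVC()),
    ])

    # Only the number of PCA components is searched.
    param_grid = {"features__pca__n_components": [2, 5, 10]}
    grid = GridSearchCV(pipe, param_grid, cv=5)
    # grid.fit(X, y); then grid.best_params_ and grid.best_estimator_ are available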

How to access Scikit Learn nested cross-validation scores

Submitted by ぃ、小莉子 on 2019-12-01 22:27:05
I'm using Python and I would like to use nested cross-validation with scikit-learn. I have found a very good example:

    NUM_TRIALS = 30
    non_nested_scores = np.zeros(NUM_TRIALS)
    nested_scores = np.zeros(NUM_TRIALS)

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g. "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
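
The snippet appears to come from scikit-learn's nested cross-validation example. A condensed, runnable version of that recipe (the iris data, the SVC grid, and the fixed random_state standing in for the loop variable i are placeholders along the lines of that example):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
    svr = SVC(kernel="rbf")

    inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

    # Inner loop: hyperparameter search, refit on each outer training fold.
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

    # Outer loop: these per-fold scores are the nested CV scores you access.
    nested_scores = cross_val_score(clf, X=X, y=y, cv=outer_cv)
    print(nested_scores, nested_scores.mean())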

Are the k-fold cross-validation scores from scikit-learn's `cross_val_score` and `GridSearchCV` biased if we include transformers in the pipeline?

Submitted by 无人久伴 on 2019-12-01 18:44:52
Question: Data pre-processors such as StandardScaler should be used to fit_transform the train set and only transform (not fit) the test set. I expect the same fit/transform process to apply to cross-validation when tuning the model. However, I found that cross_val_score and GridSearchCV fit_transform the entire train set with the preprocessor (rather than fit_transform the inner_train set and transform the inner_validation set). I believe this artificially removes the variance from the inner_validation set
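
For reference, a minimal contrast of the two setups in question (my own toy data, not from the post): when the scaler sits inside the pipeline handed to cross_val_score, it is re-fit on each training fold; pre-scaling the whole training set beforehand is the variant that leaks fold statistics.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=300, random_state=0)

    # Scaler inside the pipeline: fit only on each fold's training portion.
    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    in_pipeline_scores = cross_val_score(pipe, X, y, cv=5)

    # Scaler fit on all of X up front: the validation folds have leaked into it.
    X_scaled = StandardScaler().fit_transform(X)
    prescaled_scores = cross_val_score(LogisticRegression(), X_scaled, y, cv=5)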