grid-search

Perform feature selection using pipeline and grid search

Submitted by 断了今生、忘了曾经 on 2020-12-12 11:46:15
Question: As part of a research project, I want to select the best combination of preprocessing techniques and textual features to optimize the results of a text classification task. For this I am using Python 3.6. There are a number of ways to combine features and algorithms, but I want to take full advantage of sklearn's pipelines and test all the different (valid) possibilities using grid search for the ultimate feature combo. My first step was to build a pipeline that looks like the following
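The original pipeline is truncated, but a common pattern for this is to chain the vectorizer, a feature selector, and the classifier in one Pipeline and let GridSearchCV sweep choices at every stage. A minimal sketch (the toy corpus, step names, and parameter values below are illustrative, not from the question):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each key is "<step name>__<parameter name>", so one grid search can vary
# preprocessing choices and model hyperparameters together.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__lowercase": [True, False],
    "select__k": [5, 10],
    "clf__C": [0.1, 1.0],
}

texts = ["good movie", "bad movie", "great film", "awful film",
         "nice plot", "terrible plot", "loved it", "hated it"] * 3
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 3

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(texts, labels)
print(search.best_params_)
```

Every combination in the cross-product of these options is a valid, fully fitted preprocessing-plus-model candidate, which is exactly the exhaustive comparison the question is after.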

ValueError: continuous is not supported

Submitted by 余生颓废 on 2020-11-29 03:37:05
Question: I am using GridSearchCV for cross-validation of a linear regression (not a classifier nor a logistic regression). I also use StandardScaler for normalization of X. My dataframe has 17 features (X) and 5 targets (y), with around 1150 rows. I keep getting the ValueError: continuous is not supported error message and have run out of options. Here is some code (assume all imports are done properly): soilM = pd.read_csv('C:/training.csv', index_col=0) soilM = getDummiedSoilDepth(soilM) #transform
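This error usually means a classification metric (or a classifier) is being applied to a continuous target. A sketch of a setup that avoids it, using synthetic data with the question's shapes since the soilM frame is not reproduced: a regressor, a regression scorer, and plain KFold rather than a stratified splitter:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))   # 17 features, as in the question
y = rng.normal(size=(100, 5))    # 5 continuous targets

pipe = Pipeline([("scale", StandardScaler()), ("reg", Ridge())])

# Use a regression scorer and KFold: classification scorers and
# StratifiedKFold raise "continuous is not supported" on continuous y.
search = GridSearchCV(
    pipe,
    {"reg__alpha": [0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```

If the original code passed scoring='accuracy' (or similar) or let a stratified splitter see the continuous y, either alone would produce exactly this ValueError.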

Random Forest with GridSearchCV - Error on param_grid

Submitted by 风流意气都作罢 on 2020-08-21 01:11:06
Question: I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`". I'm classifying documents, so I am also pushing a tf-idf vectorizer into the pipeline. Here is the code: from sklearn import metrics from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import classification_report, f1_score,
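The usual cause of this error is that param_grid uses bare estimator parameter names: once the estimator sits inside a Pipeline, every key must be prefixed with its step name. A sketch with a toy corpus (the question's full pipeline and data are truncated):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# "max_features" alone is invalid for the Pipeline; it must be addressed
# as "clf__max_features". sorted(pipe.get_params()) lists every legal key.
param_grid = {
    "clf__max_features": ["sqrt", "log2"],
    "clf__n_estimators": [10, 20],
}

docs = ["spam offer now", "meeting at noon", "win money fast",
        "lunch tomorrow", "free prize inside", "project update"] * 4
y = [1, 0, 1, 0, 1, 0] * 4

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(docs, y)
print(search.best_params_)
```

The error message itself points at the fix: estimator.get_params().keys() on the Pipeline shows the prefixed names that param_grid must use.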

Using GridSearchCV with AdaBoost and DecisionTreeClassifier

Submitted by 。_饼干妹妹 on 2020-08-20 18:33:29
Question: I am attempting to tune an AdaBoost classifier ("ABT") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABT and DTC parameters simultaneously, but am not sure how to accomplish this: a pipeline shouldn't work, as I am not "piping" the output of DTC to ABT. The idea is to iterate over hyperparameters for ABT and DTC in the GridSearchCV estimator. How can I specify the tuning parameters correctly? I tried the following, which generated the error below. [IN

Grid Search for Keras with multiple inputs

Submitted by 人盡茶涼 on 2020-08-17 04:35:35
Question: I am trying to do a grid search over my hyperparameters for tuning a deep learning architecture. I have multiple input options to the model and I am trying to use sklearn's grid search API. The problem is that the grid search API only takes a single array as input, and the code fails when it checks the data's size dimension. (My input dimension is 5 * number of data points, while according to the sklearn API it should be number of data points * feature dimension.) My code looks something like this: from keras
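Since the sklearn wrapper validates X as a single 2-D array, one workaround is to skip the wrapper and loop over ParameterGrid yourself, passing the list of input arrays straight to your own fit function. A sketch with a placeholder scoring function standing in for the real Keras build/fit/evaluate code (the parameter names and score are illustrative, not from the question):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

rng = np.random.default_rng(0)
# Five separate input arrays, as in a multi-input Keras model.
inputs = [rng.normal(size=(40, 3)) for _ in range(5)]
y = rng.integers(0, 2, size=40)

param_grid = {"units": [8, 16], "lr": [1e-2, 1e-3]}

def fit_and_score(params, inputs, y):
    """Stand-in for building and fitting a multi-input Keras model and
    returning a validation score; replace the body with real model code."""
    return -abs(params["units"] - 8) - params["lr"]

# Manual search: no sklearn wrapper, so the list of input arrays is
# passed intact and never shape-checked as one matrix.
results = [(p, fit_and_score(p, inputs, y)) for p in ParameterGrid(param_grid)]
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params)
```

You lose GridSearchCV's built-in cross-validation bookkeeping, but you gain full control over how the multiple inputs reach model.fit.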

Get individual models and customized score in GridSearchCV and RandomizedSearchCV [duplicate]

Submitted by 邮差的信 on 2020-07-20 04:33:46
Question: This question already has an answer here: Retrieving specific classifiers and data from GridSearchCV (1 answer). Closed 3 days ago. GridSearchCV and RandomizedSearchCV have best_estimator_, which: returns only the best estimator/model; finds the best estimator via one of the simple scoring methods (accuracy, recall, precision, etc.); evaluates based on training sets only. I would like to go beyond those limitations with: my own definition of scoring methods; evaluation on a test set rather than
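One way to get both things at once (a sketch with synthetic data) is cross_validate: it accepts any make_scorer-wrapped custom metric, and with return_estimator=True it hands back every fitted per-fold model rather than only the best, so each one can then be re-scored on a held-out test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate, train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def custom_score(y_true, y_pred):
    # Any scoring definition you like; plain agreement rate here.
    return np.mean(y_true == y_pred)

cv = cross_validate(
    LogisticRegression(max_iter=1000), X_tr, y_tr,
    scoring=make_scorer(custom_score),
    return_estimator=True,   # keep every per-fold model, not just the best
)
# Evaluate each fold's fitted model on the untouched test set.
test_scores = [est.score(X_te, y_te) for est in cv["estimator"]]
print(test_scores)
```

For the grid-search variant of the same idea, GridSearchCV's cv_results_ exposes every candidate's per-fold scores, and refit can be pointed at a custom metric.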

Python - LightGBM with GridSearchCV is running forever

Submitted by 人走茶凉 on 2020-07-17 11:15:42
Question: Recently I have been running multiple experiments to compare Python XGBoost and LightGBM. LightGBM is a newer algorithm that people say works better than XGBoost in both speed and accuracy. This is the LightGBM GitHub. These are the LightGBM Python API documents; there you will find the Python functions you can call. It can be called directly from the LightGBM model and also via the LightGBM scikit-learn wrapper. This is the XGBoost Python API I use. As you can see, it has very similar data

Why should we call the split() function when passing StratifiedKFold() as a parameter to GridSearchCV?

Submitted by 醉酒当歌 on 2020-06-16 05:55:26
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 12 days ago. What am I trying to do? I am trying to use StratifiedKFold() in GridSearchCV(). What confuses me? When we use K-fold cross-validation, we just pass the number of CV folds inside GridSearchCV(), like the following: grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=5, scoring='f1', return_train_score=True, n_jobs=2) Then, when I need to use StratifiedKFold(),
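In fact split() need not be called by hand at all: GridSearchCV's cv parameter accepts the StratifiedKFold object itself and invokes its split() internally, in exactly the same spot where cv=5 would go. A sketch on iris (rdm_forest_clf and param_grid here are stand-ins for the question's own, and f1_macro replaces the question's binary 'f1' since iris has three classes):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)
rdm_forest_clf = RandomForestClassifier(random_state=0)
param_grid = {"n_estimators": [10, 30]}

# Pass the splitter object itself; GridSearchCV calls its split() for you.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=skf,
                             scoring="f1_macro", return_train_score=True)
grid_search_m.fit(X, y)
print(grid_search_m.best_params_)
```

Note that for classifiers an integer cv already means stratified folds, so cv=skf is mainly useful when you want shuffle=True or a fixed random_state.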