grid-search

Perform feature selection using pipeline and grid search

Submitted by 断了今生、忘了曾经 on 2020-12-12 11:46:15
Question: As part of a research project, I want to select the best combination of preprocessing techniques and textual features to optimize the results of a text classification task. For this I am using Python 3.6. There are a number of ways to combine features and algorithms, but I want to take full advantage of sklearn's pipelines and test all the different (valid) possibilities using grid search for the ultimate feature combo. My first step was to build a pipeline that looks like the following
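The original pipeline is truncated, but a common pattern for this is to chain the vectorizer, a feature selector, and the classifier in one Pipeline and let GridSearchCV sweep choices at every stage. A minimal sketch (the toy corpus, step names, and parameter values below are illustrative, not from the question):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each key is "<step name>__<parameter name>", so one grid search can vary
# preprocessing choices and model hyperparameters together.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__lowercase": [True, False],
    "select__k": [5, 10],
    "clf__C": [0.1, 1.0],
}

texts = ["good movie", "bad movie", "great film", "awful film",
         "nice plot", "terrible plot", "loved it", "hated it"] * 3
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 3

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(texts, labels)
print(search.best_params_)
```

Every combination in the cross-product of these options is a valid, fully fitted preprocessing-plus-model candidate, which is exactly the exhaustive comparison the question is after.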

ValueError: continuous is not supported

Submitted by 余生颓废 on 2020-11-29 03:37:05
Question: I am using GridSearchCV for cross-validation of a linear regression (not a classifier nor a logistic regression). I also use StandardScaler for normalization of X. My dataframe has 17 features (X) and 5 targets (y), with around 1150 rows. I keep getting the ValueError: continuous is not supported error message and have run out of options. Here is some code (assume all imports are done properly): soilM = pd.read_csv('C:/training.csv', index_col=0) soilM = getDummiedSoilDepth(soilM) #transform
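This error usually means a classification metric (or a classifier) is being applied to a continuous target. A sketch of a setup that avoids it, using synthetic data with the question's shapes since the soilM frame is not reproduced: a regressor, a regression scorer, and plain KFold rather than a stratified splitter:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))   # 17 features, as in the question
y = rng.normal(size=(100, 5))    # 5 continuous targets

pipe = Pipeline([("scale", StandardScaler()), ("reg", Ridge())])

# Use a regression scorer and KFold: classification scorers and
# StratifiedKFold raise "continuous is not supported" on continuous y.
search = GridSearchCV(
    pipe,
    {"reg__alpha": [0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```

If the original code passed scoring='accuracy' (or similar) or let a stratified splitter see the continuous y, either alone would produce exactly this ValueError.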

Random Forest with GridSearchCV - Error on param_grid

Submitted by 风流意气都作罢 on 2020-08-21 01:11:06
Question: I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`". I'm classifying documents, so I am also pushing a tf-idf vectorizer into the pipeline. Here is the code: from sklearn import metrics from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import classification_report, f1_score,
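The usual cause of this error is that param_grid uses bare estimator parameter names: once the estimator sits inside a Pipeline, every key must be prefixed with its step name. A sketch with a toy corpus (the question's full pipeline and data are truncated):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# "max_features" alone is invalid for the Pipeline; it must be addressed
# as "clf__max_features". sorted(pipe.get_params()) lists every legal key.
param_grid = {
    "clf__max_features": ["sqrt", "log2"],
    "clf__n_estimators": [10, 20],
}

docs = ["spam offer now", "meeting at noon", "win money fast",
        "lunch tomorrow", "free prize inside", "project update"] * 4
y = [1, 0, 1, 0, 1, 0] * 4

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(docs, y)
print(search.best_params_)
```

The error message itself points at the fix: estimator.get_params().keys() on the Pipeline shows the prefixed names that param_grid must use.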

Using GridSearchCV with AdaBoost and DecisionTreeClassifier

Submitted by 。_饼干妹妹 on 2020-08-20 18:33:29
Question: I am attempting to tune an AdaBoost classifier ("ABT") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABT and DTC parameters simultaneously, but am not sure how to accomplish this: a pipeline shouldn't work, as I am not "piping" the output of DTC to ABT. The idea is to iterate over hyperparameters for ABT and DTC in the GridSearchCV estimator. How can I specify the tuning parameters correctly? I tried the following, which generated the error below. [IN

Grid Search for Keras with multiple inputs

Submitted by 人盡茶涼 on 2020-08-17 04:35:35
Question: I am trying to do a grid search over my hyperparameters for tuning a deep learning architecture. I have multiple input options to the model and I am trying to use sklearn's grid search API. The problem is that the grid search API only takes a single array as input, and the code fails when it checks the data's size dimension. (My input dimension is 5 * number of data points, while according to the sklearn API it should be number of data points * feature dimension.) My code looks something like this: from keras
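Since the sklearn wrapper validates X as a single 2-D array, one workaround is to skip the wrapper and loop over ParameterGrid yourself, passing the list of input arrays straight to your own fit function. A sketch with a placeholder scoring function standing in for the real Keras build/fit/evaluate code (the parameter names and score are illustrative, not from the question):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

rng = np.random.default_rng(0)
# Five separate input arrays, as in a multi-input Keras model.
inputs = [rng.normal(size=(40, 3)) for _ in range(5)]
y = rng.integers(0, 2, size=40)

param_grid = {"units": [8, 16], "lr": [1e-2, 1e-3]}

def fit_and_score(params, inputs, y):
    """Stand-in for building and fitting a multi-input Keras model and
    returning a validation score; replace the body with real model code."""
    return -abs(params["units"] - 8) - params["lr"]

# Manual search: no sklearn wrapper, so the list of input arrays is
# passed intact and never shape-checked as one matrix.
results = [(p, fit_and_score(p, inputs, y)) for p in ParameterGrid(param_grid)]
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params)
```

You lose GridSearchCV's built-in cross-validation bookkeeping, but you gain full control over how the multiple inputs reach model.fit.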

Get individual models and customized score in GridSearchCV and RandomizedSearchCV [duplicate]

Submitted by 邮差的信 on 2020-07-20 04:33:46
Question: This question already has an answer here: Retrieving specific classifiers and data from GridSearchCV (1 answer). Closed 3 days ago. GridSearchCV and RandomizedSearchCV have best_estimator_, which: returns only the best estimator/model; finds the best estimator via one of the simple scoring methods (accuracy, recall, precision, etc.); evaluates based on training sets only. I would like to go beyond those limitations with: my own definition of scoring methods; evaluation on a test set rather than
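One way to get both things at once (a sketch with synthetic data) is cross_validate: it accepts any make_scorer-wrapped custom metric, and with return_estimator=True it hands back every fitted per-fold model rather than only the best, so each one can then be re-scored on a held-out test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate, train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def custom_score(y_true, y_pred):
    # Any scoring definition you like; plain agreement rate here.
    return np.mean(y_true == y_pred)

cv = cross_validate(
    LogisticRegression(max_iter=1000), X_tr, y_tr,
    scoring=make_scorer(custom_score),
    return_estimator=True,   # keep every per-fold model, not just the best
)
# Evaluate each fold's fitted model on the untouched test set.
test_scores = [est.score(X_te, y_te) for est in cv["estimator"]]
print(test_scores)
```

For the grid-search variant of the same idea, GridSearchCV's cv_results_ exposes every candidate's per-fold scores, and refit can be pointed at a custom metric.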

Python - LightGBM with GridSearchCV is running forever

Submitted by 人走茶凉 on 2020-07-17 11:15:42
Question: Recently I have been running multiple experiments to compare Python XGBoost and LightGBM. LightGBM is a newer algorithm that people say works better than XGBoost in both speed and accuracy. This is the LightGBM GitHub. These are the LightGBM Python API documents; there you will find the Python functions you can call. It can be called directly from the LightGBM model and also via the LightGBM scikit-learn wrapper. This is the XGBoost Python API I use. As you can see, it has very similar data

Why should we call the split() function when passing StratifiedKFold() as a parameter to GridSearchCV?

Submitted by 醉酒当歌 on 2020-06-16 05:55:26
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 12 days ago. What am I trying to do? I am trying to use StratifiedKFold() in GridSearchCV(). What confuses me? When we use K-fold cross-validation, we just pass the number of CV folds inside GridSearchCV(), like the following: grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=5, scoring='f1', return_train_score=True, n_jobs=2) Then, when I need to use StratifiedKFold(),
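In fact split() need not be called by hand at all: GridSearchCV's cv parameter accepts the StratifiedKFold object itself and invokes its split() internally, in exactly the same spot where cv=5 would go. A sketch on iris (rdm_forest_clf and param_grid here are stand-ins for the question's own, and f1_macro replaces the question's binary 'f1' since iris has three classes):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)
rdm_forest_clf = RandomForestClassifier(random_state=0)
param_grid = {"n_estimators": [10, 30]}

# Pass the splitter object itself; GridSearchCV calls its split() for you.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=skf,
                             scoring="f1_macro", return_train_score=True)
grid_search_m.fit(X, y)
print(grid_search_m.best_params_)
```

Note that for classifiers an integer cv already means stratified folds, so cv=skf is mainly useful when you want shuffle=True or a fixed random_state.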