grid-search

Make grid search functions in sklearn ignore empty models

人盡茶涼 submitted on 2019-12-10 18:24:07
Question: Using Python and scikit-learn, I'd like to do a grid search, but some of my models end up being empty. How can I make the grid search function ignore those models? I guess I could use a scoring function that returns 0 if the model is empty, but I'm not sure how.
    predictor = sklearn.svm.LinearSVC(penalty='l1', dual=False, class_weight='auto')
    param_dist = {'C': pow(2.0, np.arange(-10, 11))}
    learner = sklearn.grid_search.GridSearchCV(estimator=predictor, param_grid=param_dist, n_jobs=self.n
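
One possible approach, sketched below on the assumption that an "empty" model is one whose L1-regularised coefficients are all zero: pass GridSearchCV a callable scorer that gives such models the worst possible score. The scorer name, the accuracy fallback, and the modern sklearn.model_selection import path (plus class_weight='balanced' in place of the deprecated 'auto') are illustrative choices, not from the original question.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import accuracy_score

    def score_nonempty(estimator, X, y):
        # Treat a model whose coefficients are all zero as "empty" and return
        # the worst score so the grid search never selects it.
        if not np.any(estimator.coef_):
            return 0.0
        return accuracy_score(y, estimator.predict(X))

    predictor = LinearSVC(penalty='l1', dual=False, class_weight='balanced')
    param_dist = {'C': pow(2.0, np.arange(-10, 11))}
    learner = GridSearchCV(estimator=predictor, param_grid=param_dist,
                           scoring=score_nonempty)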

Avoid certain parameter combinations in GridSearchCV

笑着哭i submitted on 2019-12-10 14:42:27
Question: I'm using scikit-learn's GridSearchCV to iterate over a parameter space to tune a model. Specifically, I'm using it to test different hyperparameters in a neural network. The grid is as follows:
    params = {'num_hidden_layers': [0,1,2],
              'hidden_layer_size': [64,128,256],
              'activation': ['sigmoid', 'relu', 'tanh']}
The problem is that I end up running redundant models when num_hidden_layers is set to 0: it will run a model with 0 hidden layers and 64 units, another with 128 units, and
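
A common way to prune such redundant combinations, sketched here, is to pass GridSearchCV a list of grids rather than a single dict: each dict is expanded independently, so hidden_layer_size is never varied when num_hidden_layers is 0 (the neural-network estimator itself is omitted because the question's wrapper isn't shown).

    # Each dict in the list becomes its own sub-grid, so the 0-hidden-layer
    # configuration is run only once per activation.
    params = [
        {'num_hidden_layers': [0],
         'activation': ['sigmoid', 'relu', 'tanh']},
        {'num_hidden_layers': [1, 2],
         'hidden_layer_size': [64, 128, 256],
         'activation': ['sigmoid', 'relu', 'tanh']},
    ]
    # GridSearchCV(estimator, param_grid=params, ...) accepts this list directly.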

pyspark: the best model's parameters after a grid search come back as a blank {}

拈花ヽ惹草 submitted on 2019-12-10 11:17:45
Question: Could someone help me extract the best-performing model's parameters from my grid search? It's a blank dictionary for some reason.
    from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit, CrossValidator
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    train, test = df.randomSplit([0.66, 0.34], seed=12345)
    paramGrid = (ParamGridBuilder()
                 .addGrid(lr.regParam, [0.01,0.1])
                 .addGrid(lr.elasticNetParam, [1.0,])
                 .addGrid(lr.maxIter, [3,])
                 .build())
    evaluator =
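
One workaround, sketched below, is to read the winning combination off the grid itself rather than from the fitted pipeline: pair each entry of paramGrid with its averaged metric. It assumes the fitted result is a CrossValidatorModel named cvModel and that the evaluator's metric is larger-is-better.

    # cvModel = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid,
    #                          evaluator=evaluator).fit(train)
    best_idx = max(range(len(cvModel.avgMetrics)),
                   key=lambda i: cvModel.avgMetrics[i])
    best_params = paramGrid[best_idx]
    print({param.name: value for param, value in best_params.items()})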

Scikit - Combining scale and grid search

走远了吗. submitted on 2019-12-09 13:34:42
Question: I am new to scikit and have two small issues with combining data scaling and grid search. Efficient scaler: considering cross-validation with K folds, I would like the data scaler (preprocessing.StandardScaler(), for instance) to be fit each time only on the K-1 training folds and then applied to the remaining fold. My impression is that the following code will fit the scaler on the entire dataset, and I would therefore like to modify it to behave as described
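
The usual fix, sketched below, is to make the scaler a Pipeline step: cross-validation then re-fits it on the K-1 training folds of every split and only transforms the held-out fold. The SVC estimator and the C grid are placeholders, since the question's model isn't shown.

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, KFold

    # The scaler lives inside the pipeline, so each CV split fits it on the
    # training folds only and merely transforms the validation fold.
    pipe = Pipeline([('scale', StandardScaler()), ('svm', SVC())])
    grid = GridSearchCV(pipe, param_grid={'svm__C': [0.1, 1, 10]},
                        cv=KFold(n_splits=5))
    # grid.fit(X, y)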

Parallel error with GridSearchCV, works fine with other methods

为君一笑 submitted on 2019-12-09 06:44:56
Question: I am encountering the following problem with GridSearchCV: it gives me a parallel error when using n_jobs > 1, while n_jobs > 1 works fine with single models like RandomForestClassifier. Below is a simple working example showing the error:
    train = np.random.rand(100,10)
    targ = np.random.randint(0,2,100)
    clf = ensemble.RandomForestClassifier(n_jobs = 2)
    clf.fit(train,targ)
    train = np.random.rand(100,10)
    targ = np.random.randint(0,2,100)
    clf = ensemble
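
A frequent cause on platforms that spawn worker processes (notably Windows) is running the parallel grid search outside an if __name__ == '__main__' guard; whether that is the poster's exact issue depends on their setup. A minimal guarded sketch, with parallelism left to the grid search rather than nested inside the forest as well:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    if __name__ == '__main__':
        train = np.random.rand(100, 10)
        targ = np.random.randint(0, 2, 100)
        clf = RandomForestClassifier()          # no inner n_jobs
        gs = GridSearchCV(clf, {'n_estimators': [10, 50]}, n_jobs=2)
        gs.fit(train, targ)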

Use a metric after a classifier in a Pipeline

不打扰是莪最后的温柔 submitted on 2019-12-08 15:28:43
Question: I am continuing to investigate pipelines. My aim is to execute every machine learning step with the pipeline alone, which will make it more flexible and easier to adapt to another use case. So this is what I do: Step 1: fill NaN values; Step 2: transform categorical values into numbers; Step 3: classifier; Step 4: GridSearch; Step 5: add a metric (failed). Here is my code:
    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.feature_selection import SelectKBest
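
A metric is not a transformer, so it cannot be appended as a pipeline step; it is normally supplied through GridSearchCV's scoring argument instead. A rough sketch along the lines of the question's steps follows; the imputer, encoder, classifier, and f1 metric are illustrative stand-ins, since the question's code is truncated.

    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([
        ('impute', SimpleImputer(strategy='most_frequent')),  # Step 1: fill NaN values
        ('encode', OrdinalEncoder()),                         # Step 2: categories -> numbers
        ('clf', RandomForestClassifier()),                    # Step 3: classifier
    ])
    grid = GridSearchCV(pipe, {'clf__n_estimators': [100, 300]},
                        scoring='f1', cv=5)                   # Steps 4-5: search + metric
    # grid.fit(X, y)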

Scikit learn GridSearchCV AUC performance

巧了我就是萌 submitted on 2019-12-08 05:15:56
Question: I'm using GridSearchCV to identify the best set of parameters for a random forest classifier.
    PARAMS = {'max_depth': [8,None], 'n_estimators': [500,1000]}
    rf = RandomForestClassifier()
    clf = grid_search.GridSearchCV(estimator=rf, param_grid=PARAMS, scoring='roc_auc', cv=5, n_jobs=4)
    clf.fit(data, labels)
where data and labels are, respectively, the full dataset and the corresponding labels. Now I compared the performance returned by GridSearchCV (from clf.grid_scores_) with a "manual"
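
One detail worth checking, sketched below: the grid search's reported score is the mean of per-fold AUCs, which usually differs from an AUC computed on predictions pooled across folds, so a fair "manual" comparison should also average per-fold scores. Here data and labels are the question's arrays, and cv_results_ is the modern replacement for grid_scores_.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Mean of per-fold AUCs: comparable to the grid search's mean test score
    # for the same parameter setting.
    rf = RandomForestClassifier(max_depth=8, n_estimators=500)
    fold_aucs = cross_val_score(rf, data, labels, scoring='roc_auc', cv=5)
    print(fold_aucs.mean())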

Alternate different models in Pipeline for GridSearchCV

懵懂的女人 submitted on 2019-12-07 09:28:29
Question: I want to build a Pipeline in sklearn and test different models using GridSearchCV. Just an example (please don't pay attention to which particular models are chosen):
    reg = LogisticRegression()
    proj1 = PCA(n_components=2)
    proj2 = MDS()
    proj3 = TSNE()
    pipe = [('proj', proj1), ('reg' , reg)]
    pipe = Pipeline(pipe)
    param_grid = {
        'reg__c': [0.01, 0.1, 1],
    }
    clf = GridSearchCV(pipe, param_grid = param_grid)
Here, if I want to try different models for dimensionality reduction, I need to code
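
One way to avoid duplicating the pipeline, sketched here, is to make the step itself a grid parameter: listing estimator instances under the step name lets GridSearchCV swap them in turn. KernelPCA stands in for MDS/TSNE because the latter lack a transform method usable at prediction time, and note that the LogisticRegression parameter is reg__C (capital C).

    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([('proj', PCA(n_components=2)), ('reg', LogisticRegression())])
    param_grid = [
        # The 'proj' step is itself a parameter: each listed estimator is tried.
        {'proj': [PCA(n_components=2), KernelPCA(n_components=2)],
         'reg__C': [0.01, 0.1, 1]},
    ]
    clf = GridSearchCV(pipe, param_grid=param_grid)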

Pipeline and GridSearch for Doc2Vec

六月ゝ 毕业季﹏ submitted on 2019-12-07 03:05:39
Question: I currently have the following script that helps to find the best doc2vec model. It works like this: first train a few models based on given parameters, then test them against a classifier. Finally, it outputs the best model and classifier (I hope). Data: example data (data.csv) can be downloaded here: https://pastebin.com/takYp6T8 Note that the data has a structure that should let an ideal classifier reach 1.0 accuracy. Script:
    import sys
    import os
    from time import time
    from operator
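
To make Doc2Vec searchable with GridSearchCV it can be wrapped as an sklearn transformer; below is a rough, hypothetical wrapper (Doc2VecTransformer is not part of gensim, and it assumes X is a list of token lists).

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.base import BaseEstimator, TransformerMixin

    class Doc2VecTransformer(BaseEstimator, TransformerMixin):
        """Wrap gensim's Doc2Vec so it can sit in a Pipeline and have its
        vector_size / epochs tuned by GridSearchCV."""
        def __init__(self, vector_size=100, epochs=20):
            self.vector_size = vector_size
            self.epochs = epochs

        def fit(self, X, y=None):
            docs = [TaggedDocument(words, [i]) for i, words in enumerate(X)]
            self.model_ = Doc2Vec(docs, vector_size=self.vector_size,
                                  epochs=self.epochs, min_count=1)
            return self

        def transform(self, X):
            return [self.model_.infer_vector(words) for words in X]

    # A pipeline such as Pipeline([('d2v', Doc2VecTransformer()), ('clf', LogisticRegression())])
    # can then be searched over d2v__vector_size, d2v__epochs, clf__C, ...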

Specify scoring metric in GridSearch function with hypopt package in Python

孤人 submitted on 2019-12-06 08:06:16
I'm using the GridSearch function from the hypopt package to do hyperparameter searching with a specified validation set. The default metric for classification seems to be accuracy (I'm not quite sure). Here I want to use the F1 score as the metric, but I do not know where I should specify it. I looked at the documentation but was rather confused. Does anyone familiar with the hypopt package know how I can do this? Thanks a lot in advance.
    from hypopt import GridSearch
    log_reg_params = {"penalty": ['l1'], 'C': [0.001, 0.01]}
    opt = GridSearch(model=LogisticRegression())
    opt.fit(X_train, y_train, log_reg
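
If the installed hypopt release accepts a scoring argument on fit (this is an assumption; check your version's docstring), the metric can be requested there with an sklearn-style scorer name. A sketch following the question's call shape, where X_val and y_val stand for the poster's held-out validation arrays:

    from hypopt import GridSearch
    from sklearn.linear_model import LogisticRegression

    log_reg_params = {"penalty": ['l1'], 'C': [0.001, 0.01]}
    opt = GridSearch(model=LogisticRegression())
    # Assumption: this hypopt version's fit() accepts a `scoring` keyword that
    # takes sklearn scorer names such as 'f1'.
    opt.fit(X_train, y_train, log_reg_params, X_val, y_val, scoring='f1')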