Question
Basically, I want to treat the column index as a hyperparameter and tune it along with the other model hyperparameters in the pipeline. In my example below, col_idx is that hyperparameter. I wrote a function called log_columns that applies a log transformation to selected columns and can be passed to FunctionTransformer. I then put the FunctionTransformer and the model into the pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import FunctionTransformer

def log_columns(X, col_idx=None):
    # Log-transform the columns listed in col_idx (modifies X in place)
    if col_idx is None:
        return X
    for idx in col_idx:
        X[:, idx] = np.log(X[:, idx])  # np.log is already vectorized
    return X

pipe = make_pipeline(FunctionTransformer(log_columns), PCA(), SVC())
param_grid = dict(functiontransformer__col_idx=[None, [1]],
                  pca__n_components=[2, 5, 10],
                  svc__C=[0.1, 10, 100],
                  )
grid_search = GridSearchCV(pipe, param_grid=param_grid)
digits = load_digits()
res = grid_search.fit(digits.data, digits.target)
Then, I received the following error message:
ValueError: Invalid parameter col_idx for estimator
FunctionTransformer(accept_sparse=False, check_inverse=True,
func=<function log_columns at 0x1764998c8>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=None). Check the list of available parameters with
`estimator.get_params().keys()`.
I am not sure whether FunctionTransformer allows me to do what I want. If not, I would like to know of other elegant approaches. Thanks!
Answer 1:
col_idx is not a valid parameter of the FunctionTransformer class, but kw_args is.
kw_args is a dictionary of additional keyword arguments passed to func. In your case, the only keyword argument is col_idx.
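A minimal sketch of that distinction, assuming a recent scikit-learn where FunctionTransformer defaults to validate=False: set_params rejects col_idx with the same ValueError as in the question, but accepts kw_args, whose dict is forwarded to func as keyword arguments at transform time. The log_columns below is a rewritten version of the question's function that works on a copy, and the toy array X is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def log_columns(X, col_idx=None):
    """Apply the natural log to the columns listed in col_idx, on a copy of X."""
    if col_idx is None:
        return X
    X = np.array(X, dtype=float, copy=True)
    X[:, col_idx] = np.log(X[:, col_idx])
    return X


ft = FunctionTransformer(log_columns)

# col_idx is an argument of log_columns, not of FunctionTransformer,
# so set_params rejects it.
try:
    ft.set_params(col_idx=[1])
    rejected = False
except ValueError:
    rejected = True

# kw_args is a FunctionTransformer parameter; its dict is passed on to func.
ft.set_params(kw_args={"col_idx": [1]})
X = np.array([[1.0, np.e], [2.0, np.e ** 2]])
out = ft.fit_transform(X)  # column 1 becomes log(e) and log(e**2)

print(rejected)  # True
print(out)
```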
Try this:
param_grid = dict(
    functiontransformer__kw_args=[
        {'col_idx': None},
        {'col_idx': [1]}
    ],
    pca__n_components=[2, 5, 10],
    svc__C=[0.1, 10, 100],
)
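An end-to-end sketch of this answer, assuming a recent scikit-learn. Two details differ from the question's code and are assumptions of this sketch: log_columns uses np.log1p instead of np.log, because digits pixel values include 0 and np.log(0) gives -inf, which PCA rejects; and the grid and cv=3 are shrunk so the search runs quickly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def log_columns(X, col_idx=None):
    # Assumption: log1p instead of log, since pixel values can be 0.
    if col_idx is None:
        return X
    X = np.array(X, dtype=float, copy=True)  # avoid mutating the caller's array
    X[:, col_idx] = np.log1p(X[:, col_idx])
    return X


pipe = make_pipeline(FunctionTransformer(log_columns), PCA(), SVC())

# Each candidate value of kw_args is a whole dict of keyword arguments for log_columns.
param_grid = dict(
    functiontransformer__kw_args=[{"col_idx": None}, {"col_idx": [1]}],
    pca__n_components=[5, 10],
    svc__C=[1, 10],
)

digits = load_digits()
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid_search.fit(digits.data, digits.target)
print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```

Because each candidate in functiontransformer__kw_args is a complete dict, a function with several keyword arguments would need every combination you want to try listed as its own dict.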
Answer 2:
First, check which parameters you can adjust: pipe.get_params().keys().
Then look at the documentation on how to organize param_grid.
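A small sketch of that check, assuming a recent scikit-learn. The exact set of FunctionTransformer parameters varies between versions, so the full key list is printed without a claimed output; log_columns here is a hypothetical stub standing in for the question's function.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def log_columns(X, col_idx=None):
    # Hypothetical stub: the body is irrelevant for listing parameters.
    return X


pipe = make_pipeline(FunctionTransformer(log_columns), PCA(), SVC())

# Tunable names have the form <step name>__<parameter>; make_pipeline
# names each step after its class name in lowercase.
keys = sorted(pipe.get_params().keys())
print([k for k in keys if k.startswith("functiontransformer__")])

# col_idx belongs to log_columns, so it never appears as a pipeline parameter.
print("functiontransformer__col_idx" in keys)  # False
```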
Source: https://stackoverflow.com/questions/57584012/how-can-i-make-the-functiontransformer-along-with-gridsearchcv-into-a-pipeline