How can I make the FunctionTransformer along with GridSearchCV into a pipeline?

Submitted by £可爱£侵袭症+ on 2021-01-29 05:56:35

Question


Basically, I want to treat the column index as a hyperparameter and tune it together with the other model hyperparameters in the pipeline. In my example below, col_idx is that hyperparameter. I defined a function called log_columns that applies a log transformation to the specified columns and can be passed to FunctionTransformer. I then put the FunctionTransformer and the model into a pipeline.

import numpy as np
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import FunctionTransformer


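def log_columns(X, col_idx=None):
    # Apply the natural log to the selected columns; with col_idx=None, return X unchanged.
    if col_idx is None:
        return X
    X = X.copy()  # avoid modifying the caller's array in place
    for idx in col_idx:
        X[:, idx] = np.log(X[:, idx])  # np.log is already vectorized
    return X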

pipe = make_pipeline(FunctionTransformer(log_columns), PCA(), SVC())
param_grid = dict(
    functiontransformer__col_idx=[None, [1]],
    pca__n_components=[2, 5, 10],
    svc__C=[0.1, 10, 100],
)

grid_search = GridSearchCV(pipe, param_grid=param_grid)
digits = load_digits()

res = grid_search.fit(digits.data, digits.target)

Then, I received the following error message:

ValueError: Invalid parameter col_idx for estimator 
FunctionTransformer(accept_sparse=False, check_inverse=True,
      func=<function log_columns at 0x1764998c8>, inv_kw_args=None,
      inverse_func=None, kw_args=None, pass_y='deprecated',
      validate=None). Check the list of available parameters with 
`estimator.get_params().keys()`.

I am not sure whether FunctionTransformer supports what I am trying to do. If not, I would like to know of other elegant approaches. Thanks!


Answer 1:


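col_idx is not a valid parameter of the FunctionTransformer class, but kw_args is. GridSearchCV sets each candidate value through set_params, which only accepts the names listed by get_params(), that is, the estimator's constructor arguments. kw_args is a dictionary of additional keyword arguments that are passed to func. In your case, the only keyword argument is col_idx.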

Try this:

param_grid = dict(
    functiontransformer__kw_args=[
        {'col_idx': None},
        {'col_idx': [1]}
    ],
    pca__n_components=[2, 5, 10],
    svc__C=[0.1, 10, 100],
)
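For reference, here is a minimal end-to-end sketch of the same idea. It makes two assumptions that are not in the question. First, it uses a hypothetical helper named log1p_columns that applies np.log1p instead of np.log, because the digits data contains zeros, np.log(0) is -inf, and PCA rejects non-finite input. Second, it sets cv=3 only to keep the run short.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def log1p_columns(X, col_idx=None):
    """Hypothetical variant of log_columns: apply log(1 + x) to the chosen columns."""
    if col_idx is None:
        return X
    X = X.copy()  # leave the caller's array untouched
    for idx in col_idx:
        X[:, idx] = np.log1p(X[:, idx])
    return X


pipe = make_pipeline(FunctionTransformer(log1p_columns), PCA(), SVC())

# col_idx is tuned indirectly: each candidate is a complete kw_args dictionary.
param_grid = dict(
    functiontransformer__kw_args=[{"col_idx": None}, {"col_idx": [1]}],
    pca__n_components=[2, 5, 10],
    svc__C=[0.1, 10, 100],
)

digits = load_digits()
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3)  # cv=3 is an assumption
grid_search.fit(digits.data, digits.target)

print(grid_search.best_params_)
```

The selected kw_args dictionary shows up in best_params_ under the key functiontransformer__kw_args, so the chosen col_idx can be read from there.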



Answer 2:


First, check which parameters you can tune: pipe.get_params().keys().
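A quick way to run that check might look like the sketch below. The log_columns body here is only a placeholder (an assumption for illustration), since this check only inspects the pipeline's parameter names and never calls the function.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def log_columns(X, col_idx=None):
    # Placeholder body; the function is never called in this check.
    return X


pipe = make_pipeline(FunctionTransformer(log_columns), PCA(), SVC())
param_names = sorted(pipe.get_params().keys())

# Names that belong to the FunctionTransformer step
ft_params = [name for name in param_names if name.startswith("functiontransformer__")]
print(ft_params)

print("functiontransformer__kw_args" in param_names)
print("functiontransformer__col_idx" in param_names)
```

The list contains functiontransformer__kw_args but not functiontransformer__col_idx, which is why the grid search rejects col_idx as an invalid parameter.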

Then have a look at the documentation on how to structure param_grid.
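As a rough illustration of that structure: param_grid can be a single dictionary or a list of dictionaries, each key has the form step name, double underscore, parameter name, and each value is a list of candidates. The sketch below uses ParameterGrid to enumerate the combinations without fitting anything. The particular split into two sub-grids is an assumption made for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# Two sub-grids: each dictionary is expanded on its own, then the results are concatenated.
param_grid = [
    {
        "functiontransformer__kw_args": [{"col_idx": None}],
        "pca__n_components": [2, 5, 10],
        "svc__C": [0.1, 10, 100],
    },
    {
        "functiontransformer__kw_args": [{"col_idx": [1]}],
        "pca__n_components": [5, 10],
        "svc__C": [10, 100],
    },
]

candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 3 * 3 + 2 * 2 = 13 combinations
print(candidates[0])
```

The same list can be passed directly as param_grid to GridSearchCV, which is handy when some settings only make sense together.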



Source: https://stackoverflow.com/questions/57584012/how-can-i-make-the-functiontransformer-along-with-gridsearchcv-into-a-pipeline
