Question
Let's consider a multivariate regression problem with two response variables, Latitude and Longitude. Some machine learning model implementations, such as Support Vector Regression (sklearn.svm.SVR), do not currently provide native support for multivariate regression. For this reason, sklearn.multioutput.MultiOutputRegressor can be used.
Example:
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
# Wrap SVR so that one regressor is fitted per target column
svr_multi = MultiOutputRegressor(SVR(), n_jobs=-1)
# Fit the algorithm on the data
svr_multi.fit(X_train, y_train)
y_pred = svr_multi.predict(X_test)
My goal is to tune the parameters of SVR with sklearn.model_selection.GridSearchCV. If the response were a single variable rather than several, I would do the following:
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', SVR())])
# Pipeline step parameters are addressed as <step name>__<parameter>
grid_param_svr = {
    'reg__C': [0.01, 0.1, 1, 10],
    'reg__epsilon': [0.1, 0.2, 0.3],
    'reg__degree': [2, 3, 4]  # only used by kernel='poly'; ignored by the default 'rbf'
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      cv=10,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)
However, as my response y_train is 2-dimensional, I need to use MultiOutputRegressor on top of SVR. How can I modify the above code to enable this GridSearchCV operation? If that is not possible, is there a better alternative?
Answer 1:
I just found a working solution. In the case of nested estimators, the parameters of the inner estimator can be accessed with the estimator__ prefix. Inside a pipeline step named reg, this gives names such as reg__estimator__C.
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', MultiOutputRegressor(SVR()))])
# <step name>__estimator__<parameter> reaches the SVR inside the wrapper
grid_param_svr = {
    'reg__estimator__C': [0.1, 1, 10]
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      cv=2,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)
gs_svr.best_estimator_
Pipeline(steps=[('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('reg', MultiOutputRegressor(estimator=SVR(C=10, cache_size=200,
                 coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1,
                 shrinking=True, tol=0.001, verbose=False), n_jobs=1))])
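If it is unclear which names are valid in param_grid, one way to check is to list them with get_params(). The following is only a minimal sketch: it rebuilds the same pipeline as above, and the variable names param_names and svr_param_names are introduced here for illustration.

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same pipeline as in the answer above
pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', MultiOutputRegressor(SVR()))])

# get_params() returns every tunable parameter, including nested ones;
# each key can be used as a key in param_grid.
param_names = sorted(pipe_svr.get_params().keys())

# Keep only the parameters that belong to the inner SVR
svr_param_names = [name for name in param_names
                   if name.startswith('reg__estimator__')]
print(svr_param_names)
```

The printed list should contain entries such as reg__estimator__C and reg__estimator__epsilon, while a plain reg__C is not among the pipeline's parameters, because C belongs to the SVR inside the wrapper and not to MultiOutputRegressor itself.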
Answer 2:
To use it without a pipeline, put estimator__ before the parameter names:
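from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
# estimator__<parameter> reaches the regressor wrapped by MultiOutputRegressor
param_grid = {'estimator__min_samples_split': [10, 50],
              'estimator__min_samples_leaf': [50, 150]}
gb = GradientBoostingRegressor()
gs = GridSearchCV(MultiOutputRegressor(gb), param_grid=param_grid)
gs.fit(X, y)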
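For the SVR case from the question, the same pattern without a pipeline might look like the sketch below. The data here is synthetic and made up purely so the snippet runs on its own (X_demo, y_demo and gs_demo are illustrative names, not from the question), and the grid values are arbitrary.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption): 60 samples, 3 features, 2 targets
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 3)
y_demo = np.column_stack([X_demo[:, 0] + X_demo[:, 1],
                          X_demo[:, 1] - X_demo[:, 2]])

# estimator__<parameter> addresses the SVR wrapped by MultiOutputRegressor
param_grid = {'estimator__C': [0.1, 1, 10],
              'estimator__epsilon': [0.01, 0.1]}
gs_demo = GridSearchCV(MultiOutputRegressor(SVR()),
                       param_grid=param_grid,
                       cv=3,
                       scoring='neg_mean_squared_error')
gs_demo.fit(X_demo, y_demo)

print(gs_demo.best_params_)     # best combination found on this toy data
y_pred_demo = gs_demo.predict(X_demo)
print(y_pred_demo.shape)        # (60, 2): one column per target
```

Note that a search set up this way picks one parameter combination shared by both targets, since MultiOutputRegressor fits a clone of the same estimator for each output column.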
Source: https://stackoverflow.com/questions/43532811/gridsearch-over-multioutputregressor