How to instantiate a Scikit-Learn linear model with known coefficients without fitting it

Question

Background

I am testing various saved models as part of an experiment, but one of the models comes from an algorithm I wrote, not from fitting a scikit-learn model.

However, my custom model is still a linear model, so I want to instantiate a scikit-learn linear model (here LinearRegression) and set its coef_ and intercept_ attributes to the values from my custom fitting algorithm, so that I can use it for predictions.

What I tried so far:

import numpy as np
from sklearn.linear_model import LinearRegression

my_intercepts = np.ones(2)
my_coefficients = np.random.randn(2, 3)

new_model = LinearRegression()
new_model.intercept_ = my_intercepts
new_model.coef_ = my_coefficients

It seems to work okay for prediction:

X_test = np.random.randn(5, 3)

new_model.predict(X_test)
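To see that predict() uses the manually set attributes directly, here is a minimal self-contained sketch. It assumes the shapes from the question (2 targets, 3 features); the seeded generator rng is an addition for reproducibility only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
my_intercepts = np.ones(2)                      # one intercept per target
my_coefficients = rng.standard_normal((2, 3))   # shape (n_targets, n_features)

new_model = LinearRegression()
new_model.intercept_ = my_intercepts
new_model.coef_ = my_coefficients

X_test = rng.standard_normal((5, 3))
predictions = new_model.predict(X_test)

# For a linear model, predict() computes X @ coef_.T + intercept_.
expected = X_test @ my_coefficients.T + my_intercepts
print(predictions.shape)   # (5, 2)
print(np.allclose(predictions, expected))   # True
```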

It passes this test:

from sklearn.utils.validation import check_is_fitted

check_is_fitted(new_model)
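Note that by default check_is_fitted only looks for at least one instance attribute whose name ends in an underscore; it does not check that the values are meaningful. A small sketch contrasting a fresh estimator with a patched one (the variable names are illustrative):

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted

fresh = LinearRegression()
try:
    check_is_fitted(fresh)
    fresh_passed = True
except NotFittedError:
    fresh_passed = False   # no attribute ending in "_" has been set yet

patched = LinearRegression()
patched.coef_ = np.zeros(3)
patched.intercept_ = 0.0
# Passes: coef_ and intercept_ now exist as instance attributes.
check_is_fitted(patched)
# The attributes to look for can also be named explicitly.
check_is_fitted(patched, ["coef_", "intercept_"])
print(fresh_passed)   # False
```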

Question

Is this method fine? It feels like a hack and I suspect there is a 'proper' way to do this.

Answer 1:

Although the simple technique in the question works, the danger is that you might later call the object's fit method and silently overwrite your coefficients.
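A short sketch of that failure mode, using made-up coefficient values and synthetic data: nothing warns you, fit() simply replaces coef_ and intercept_ with values learned from whatever data it was given.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

my_coefficients = np.array([0.5, -1.0, 2.0])

model = LinearRegression()
model.coef_ = my_coefficients.copy()
model.intercept_ = 1.0

# An accidental call to fit() on other data silently replaces both attributes.
rng = np.random.default_rng(0)
X_other = rng.standard_normal((20, 3))
y_other = X_other @ np.array([5.0, 5.0, 5.0])   # noiseless, different coefficients
model.fit(X_other, y_other)

print(np.allclose(model.coef_, my_coefficients))   # False
print(np.allclose(model.coef_, [5.0, 5.0, 5.0]))   # True
```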

A slightly more 'proper' way to do this, if the model is only going to be used for prediction, would be to inherit from the sklearn class and override the fit method as follows:

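import numpy as np
from sklearn.linear_model import LinearRegression


class LinearPredictionModel(LinearRegression):
    """
    This model is for prediction only. Calling fit raises an error.
    You can initialize it with fixed values for coefficients
    and intercepts.

    Parameters
    ----------
    coef, intercept : array-like
        See attribute descriptions below.

    Attributes
    ----------
    coef_ : array of shape (n_features,) or (n_targets, n_features)
        Coefficients of the linear model. If there are multiple targets
        (y 2D), this is a 2D array of shape (n_targets, n_features),
        whereas if there is only one target, this is a 1D array of
        length n_features.
    intercept_ : float or array of shape (n_targets,)
        Independent term in the linear model.
    """

    def __init__(self, coef=None, intercept=None):
        # Set the inherited LinearRegression hyperparameters (fit_intercept,
        # positive, ...) to their defaults; some scikit-learn versions read
        # them outside of fit.
        super().__init__()
        # Keep the raw arguments so that get_params(), clone() and repr() work.
        self.coef = coef
        self.intercept = intercept
        if coef is not None:
            coef = np.asarray(coef, dtype=float)
            if intercept is None:
                # One zero per target for 2D coef, a scalar zero for 1D coef.
                intercept = np.zeros(coef.shape[0]) if coef.ndim == 2 else 0.0
            else:
                intercept = np.asarray(intercept, dtype=float)
            if coef.ndim == 2 and np.shape(intercept) != (coef.shape[0],):
                raise ValueError("intercept needs one value per row of coef")
            # Only set the fitted attributes when coefficients were given,
            # so an empty model is still reported as not fitted.
            self.intercept_ = intercept
            self.coef_ = coef
        elif intercept is not None:
            raise ValueError("Provide coef only or both coef and intercept")

    def fit(self, X, y):
        """This model cannot be fitted."""
        raise NotImplementedError("model is only for prediction")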

Then, instantiate the model as follows:

new_model = LinearPredictionModel(coef=my_coefficients, intercept=my_intercepts)

I think the only 'proper' way to do this would be for me to fully implement a new class with my custom algorithm in the fit method. But for the simple needs of testing the coefficients in a scikit-learn environment, this method seems to work fine.
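The "fully implement a new class" route can be sketched with scikit-learn's BaseEstimator and RegressorMixin. The class name MyLinearModel, its alpha parameter and the ridge-regularized least-squares fit are hypothetical stand-ins for the custom algorithm; only the surrounding structure (hyperparameters stored unchanged in __init__, learned values in trailing-underscore attributes set by fit, fit returning self) follows scikit-learn's estimator conventions.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.utils.validation import check_is_fitted


class MyLinearModel(RegressorMixin, BaseEstimator):
    """Hypothetical estimator wrapping a custom linear fitting algorithm."""

    def __init__(self, alpha=0.0):
        # Hyperparameters are stored unchanged; nothing is computed here.
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Stand-in for "my custom algorithm": ridge-regularized least squares
        # on centered data. Replace this block with your own fitting code.
        X_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
        Xc, yc = X - X_mean, y - y_mean
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        coef = np.linalg.solve(A, Xc.T @ yc)  # (n_features,) or (n_features, n_targets)
        # Learned values go into trailing-underscore attributes.
        self.coef_ = coef.T
        self.intercept_ = y_mean - X_mean @ coef
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = np.asarray(X, dtype=float)
        return X @ self.coef_.T + self.intercept_


# Usage on noiseless synthetic data: alpha=0 recovers the true coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
true_coef = np.array([1.0, -2.0, 0.5])
y = X @ true_coef + 3.0

model = MyLinearModel().fit(X, y)
print(model.coef_, model.intercept_)
print(model.score(X, y))        # R^2, provided by RegressorMixin
unfitted_copy = clone(model)    # works because get_params() sees alpha
```

Because the algorithm lives inside fit, the coefficients can no longer be overwritten by an accidental refit with a different method, and the estimator works with clone, pipelines and cross-validation like any built-in model.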

Answer 2:

This approach works nicely for simple models (such as linear regression), but how can you adapt it for more complex models (such as Lasso, elastic net, etc.)? It appears the linear regressor can be modified like this, but a Lasso regressor still throws errors (it complains that it has not been fitted), as in this question, which is marked as a duplicate of the one above.
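One workaround that does not depend on which attributes a particular estimator's predict() checks: run a throwaway fit on placeholder data with the right number of columns, so that every fitted-state attribute exists, then overwrite coef_ and intercept_. This is a sketch assuming a single target; the dummy data and coefficient values are arbitrary. Lasso is a subclass of ElasticNet, so the same pattern applies there.

```python
import numpy as np
from sklearn.linear_model import Lasso

my_coefficients = np.array([0.5, -1.0, 2.0])   # single target, three features
my_intercept = 1.0

rng = np.random.default_rng(0)
model = Lasso(alpha=0.1)
# Throwaway fit: placeholder data with the right number of columns, used only
# so that all fitted-state attributes (n_features_in_, n_iter_, ...) exist.
model.fit(rng.standard_normal((10, 3)), rng.standard_normal(10))

# Overwrite the learned values with the externally computed ones.
model.coef_ = my_coefficients.copy()
model.intercept_ = my_intercept

X_test = rng.standard_normal((5, 3))
predictions = model.predict(X_test)
# For dense input, predict() now uses only the overwritten values.
print(np.allclose(predictions, X_test @ my_coefficients + my_intercept))   # True
```

Because the dummy fit records n_features_in_, predict() will still reject inputs with the wrong number of columns, which is usually what you want.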

Source: https://stackoverflow.com/questions/61748510/how-to-create-a-scikit-learn-model-using-user-selected-model-parameters

Tags

python

scikit-learn

initialization

linear-regression