How to fit several independent time series at the same time using the scikit-learn linear regression model

日久生厌 2021-01-03 07:14

I am trying to predict multiple independent time series simultaneously using the sklearn linear regression model, but I can't seem to get it right.

My data is organised as follows:
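(The example data is truncated here; a minimal, hypothetical sketch of such stacked data is below. The names X1, y1, X2, y2, X_stack, y_stack and half are the variables the answer below refers to; the shapes and random data are assumptions.)

    import numpy as np

    rng = np.random.RandomState(0)
    n_samples, n_features = 100, 3

    # two independent series, each with its own regressors and target
    X1 = rng.randn(n_samples, n_features)
    y1 = X1.dot(rng.randn(n_features)) + 0.1 * rng.randn(n_samples)
    X2 = rng.randn(n_samples, n_features)
    y2 = X2.dot(rng.randn(n_features)) + 0.5 * rng.randn(n_samples)

    # stack along a new leading "series" axis:
    # X_stack: (n_series, n_samples, n_features), y_stack: (n_series, n_samples)
    X_stack = np.stack([X1, X2])
    y_stack = np.stack([y1, y2])

    half = n_samples // 2   # fit on the first half, predict on the second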

2 Answers
  •  萌比男神i
    2021-01-03 07:31

    @ali_m I don't think this is a duplicate question, though the two are partly related. It is certainly possible to fit and predict multiple time series simultaneously with a linear regression model that mimics sklearn's interface:

    I created a new class LinearRegression_Multi:

    import numpy as np

    class LinearRegression_Multi:
        def stacked_lstsq(self, L, b, rcond=1e-10):
            """
            Solve L x = b via SVD least squares, cutting off small singular values.
            L is an array of shape (..., M, N) and b of shape (..., M).
            Returns x of shape (..., N).
            """
            u, s, v = np.linalg.svd(L, full_matrices=False)
            s_max = s.max(axis=-1, keepdims=True)
            s_min = rcond * s_max
            # invert singular values above the cutoff, zero out the rest
            inv_s = np.zeros_like(s)
            inv_s[s >= s_min] = 1 / s[s >= s_min]
            # x = V @ diag(inv_s) @ U^T @ b, batched over the leading axes
            x = np.einsum('...ji,...j->...i', v,
                          inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
            return np.conj(x, x)
    
        def center_data(self, X, y):
            """ Centers data to have mean zero along the samples axis (axis 1).
            """
            # center X
            X_mean = np.average(X, axis=1)
            X_std = np.ones(X.shape[0::2])  # placeholder: no feature scaling
            X = X - X_mean[:, None, :]
            # center y
            y_mean = np.average(y, axis=1)
            y = y - y_mean[:, None]
            return X, y, X_mean, y_mean, X_std
    
        def set_intercept(self, X_mean, y_mean, X_std):
            """ Calculate the intercept_.
            """
            self.coef_ = self.coef_ / X_std  # not strictly necessary, since X_std is all ones
            self.intercept_ = y_mean - np.einsum('ij,ij->i', X_mean, self.coef_)
    
        def scores(self, y_pred, y_true):
            """
            The coefficient R^2 is defined as (1 - u/v), where u is the residual
            sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
            sum of squares ((y_true - y_true.mean()) ** 2).sum().
            """
            u = ((y_true - y_pred) ** 2).sum(axis=-1)
            v = ((y_true - y_true.mean(axis=-1, keepdims=True)) ** 2).sum(axis=-1)
            r_2 = 1 - u / v
            return r_2
    
        def fit(self, X, y):
            """ Fit the linear model.
            """
            # get coefficients by applying linear regression on the stack
            X_, y, X_mean, y_mean, X_std = self.center_data(X, y)
            self.coef_ = self.stacked_lstsq(X_, y)
            self.set_intercept(X_mean, y_mean, X_std)
    
        def predict(self, X):
            """ Predict using the linear model.
            """
            # batched matrix-vector product: one dot product per series
            return np.einsum('ijx,ix->ij', X, self.coef_) + self.intercept_[None].T
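
    A note on what stacked_lstsq computes: rather than looping over the series in Python, the single vectorized SVD call solves every slice at once. For each slice, the two einsum calls evaluate the truncated pseudo-inverse solution

        x = V · diag(s⁺) · Uᵀ · b,   where s⁺_i = 1/s_i if s_i ≥ rcond · max(s), else 0,

    broadcast over the leading (series) axis.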
    

    It can be applied as follows, using the same variables as declared in the question:

    LR_Multi = LinearRegression_Multi()
    LR_Multi.fit(X_stack[:,:half], y_stack[:,:half])
    y_stack_pred = LR_Multi.predict(X_stack[:,half:])
    R2 = LR_Multi.scores(y_stack_pred, y_stack[:,half:])
    

    The resulting R^2 values for the two time series are:

    array([ 0.91262442,  0.67247516])
    

    These indeed match the scores from the standard sklearn LinearRegression, fitted one series at a time:

    from sklearn.linear_model import LinearRegression
    
    LR = LinearRegression()
    LR.fit(X1[:half], y1[:half])
    R2_1 = LR.score(X1[half:],y1[half:])
    
    LR.fit(X2[:half], y2[:half])
    R2_2 = LR.score(X2[half:],y2[half:])
    print(R2_1, R2_2)
    0.912624422097 0.67247516054
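
    As a quick cross-check (a hypothetical snippet, assuming the variables above are still in scope), the stacked solver should also reproduce sklearn's per-series coefficients:

    # LR was last fitted on the second series, i.e. index 1 of the stacked model
    print(np.allclose(LR.coef_, LR_Multi.coef_[1]))            # expected: True
    print(np.allclose(LR.intercept_, LR_Multi.intercept_[1]))  # expected: True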
    
