R: lm() result differs when using `weights` argument and when using manually reweighted data


To correct for heteroskedasticity in the error terms, I am running the following weighted least squares regression in R:

#Call:
#lm(formula = a ~ q + q2 +


                      
1 answer

广开言路, 2020-11-30 07:00

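Provided you do the manual weighting correctly, you won't see any discrepancy: scale the response and every column of the model matrix, intercept included, by the square root of the weights.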
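
The reason the square root appears can be sketched in one line. The notation is an assumption made here, not taken from the original answer: x_i^T is row i of the model matrix (intercept column included) and w_i is the i-th weight. Under that notation the weighted least squares objective is an ordinary least squares objective on rescaled data:

```latex
\sum_{i} w_i \left(y_i - x_i^\top \beta\right)^2
  = \sum_{i} \left(\sqrt{w_i}\, y_i - \sqrt{w_i}\, x_i^\top \beta\right)^2
  = \lVert \tilde{y} - \tilde{X}\beta \rVert^2,
\qquad
\tilde{y} = W^{1/2} y,\quad
\tilde{X} = W^{1/2} X,\quad
W = \operatorname{diag}(w_1, \dots, w_n).
```

Since the intercept is a column of ones in X, it becomes a column of root weights in the rescaled matrix, so the transformed fit must not add an intercept of its own.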



So the correct way to go is:

X <- model.matrix(~ q + q2 + b + c, mydata)  ## non-weighted model matrix (with intercept)
w <- mydata$weighting  ## weights
rw <- sqrt(w)    ## root weights
y <- mydata$a    ## non-weighted response
X_tilde <- rw * X    ## weighted model matrix (with intercept)
y_tilde <- rw * y    ## weighted response

## remember to drop intercept when using formula
fit_by_wls <- lm(y ~ X - 1, weights = w)
fit_by_ols <- lm(y_tilde ~ X_tilde - 1)


When passing matrices directly, though, it is generally recommended to use lm.fit and lm.wfit:

matfit_by_wls <- lm.wfit(X, y, w)
matfit_by_ols <- lm.fit(X_tilde, y_tilde)


Note, however, that the internal routines lm.fit and lm.wfit require all inputs to be complete cases with no NA; otherwise the underlying C routine stats:::C_Cdqrls will complain.
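
A minimal sketch of one way to meet that requirement: keep only the rows where the response, the predictors and the weights are all observed before building the matrices. The copy `d` of trees with an injected NA, and the names `ok`, `matfit` and `fit`, are assumptions made for illustration.

```r
## illustration only: copy of the built-in trees data with one missing response
d <- trees
d$Height[3] <- NA
set.seed(0); w <- runif(nrow(d), 1, 2)

## rows where response, predictors and weights are all observed
ok <- complete.cases(d[, c("Height", "Girth", "Volume")]) & !is.na(w)

X <- model.matrix(~ Girth + Volume, d[ok, ])  ## model matrix on complete rows
y <- d$Height[ok]
matfit <- lm.wfit(X, y, w[ok])

## lm() drops the incomplete row itself under the default na.action,
## so its coefficients should match the matrix fit
fit <- lm(Height ~ Girth + Volume, data = d, weights = w)
all.equal(matfit$coefficients, coef(fit))
```

The weights have to be subset with the same index as the data; otherwise they no longer line up with the rows of X.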

If you still want to use the formula interface rather than matrix, you can do the following:

## weight by square root of weights, not weights
mydata$root.weighting <- sqrt(mydata$weighting)
mydata$a.wls <- mydata$a * mydata$root.weighting
mydata$q.wls <- mydata$q * mydata$root.weighting
mydata$q2.wls <- mydata$q2 * mydata$root.weighting
mydata$b.wls <- mydata$b * mydata$root.weighting
mydata$c.wls <- mydata$c * mydata$root.weighting

fit_by_wls <- lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)

fit_by_ols <- lm(formula = a.wls ~ 0 + root.weighting + q.wls + q2.wls + b.wls + c.wls,
                 data = mydata)
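
The same formula-interface recipe can be tried on the built-in trees data as a quick check. This is a sketch: the data frame `d`, the `weighting` column and the `*.wls` column names are made up for illustration.

```r
## illustration on the built-in trees data with random weights between 1 and 2
d <- trees
set.seed(0); d$weighting <- runif(nrow(d), 1, 2)

## scale response and predictors by the square root of the weights
d$root.weighting <- sqrt(d$weighting)
d$Height.wls <- d$Height * d$root.weighting
d$Girth.wls  <- d$Girth  * d$root.weighting
d$Volume.wls <- d$Volume * d$root.weighting

fit_wls <- lm(Height ~ Girth + Volume, data = d, weights = weighting)

## root.weighting plays the role of the weighted intercept column,
## so the automatic intercept is switched off with `0 +`
fit_ols <- lm(Height.wls ~ 0 + root.weighting + Girth.wls + Volume.wls, data = d)

## coefficient names differ between the two fits, so compare the bare numbers
all.equal(unname(coef(fit_wls)), unname(coef(fit_ols)))
```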




Reproducible Example

Let's use R's built-in data set trees. Use head(trees) to inspect this dataset. There is no NA in this dataset. We aim to fit a model:

Height ~ Girth + Volume


with some random weights between 1 and 2:

set.seed(0); w <- runif(nrow(trees), 1, 2)


We fit this model via weighted regression, either by passing weights to lm, or by manually transforming the data and calling lm with no weights:

X <- model.matrix(~ Girth + Volume, trees)  ## non-weighted model matrix (with intercept)
rw <- sqrt(w)    ## root weights
y <- trees$Height    ## non-weighted response
X_tilde <- rw * X    ## weighted model matrix (with intercept)
y_tilde <- rw * y    ## weighted response

fit_by_wls <- lm(y ~ X - 1, weights = w)
#Call:
#lm(formula = y ~ X - 1, weights = w)

#Coefficients:
#X(Intercept)        XGirth       XVolume  
#     83.2127       -1.8639        0.5843

fit_by_ols <- lm(y_tilde ~ X_tilde - 1)
#Call:
#lm(formula = y_tilde ~ X_tilde - 1)

#Coefficients:
#X_tilde(Intercept)        X_tildeGirth       X_tildeVolume  
#           83.2127             -1.8639              0.5843


So indeed, we see identical results.
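
Rather than comparing printed output by eye, the agreement can be checked programmatically. The sketch below repeats the setup so that it runs on its own; `same_coef`, `same_se`, `se_wls` and `se_ols` are names introduced here for illustration.

```r
## same setup as above, repeated so the snippet is self-contained
set.seed(0); w <- runif(nrow(trees), 1, 2)
X <- model.matrix(~ Girth + Volume, trees)
y <- trees$Height
rw <- sqrt(w)
X_tilde <- rw * X
y_tilde <- rw * y

fit_by_wls <- lm(y ~ X - 1, weights = w)
fit_by_ols <- lm(y_tilde ~ X_tilde - 1)

## coefficient names differ (XGirth vs X_tildeGirth), so strip them first
same_coef <- all.equal(unname(coef(fit_by_wls)), unname(coef(fit_by_ols)))

## standard errors reported by summary() can be compared the same way
se_wls <- unname(summary(fit_by_wls)$coefficients[, "Std. Error"])
se_ols <- unname(summary(fit_by_ols)$coefficients[, "Std. Error"])
same_se <- all.equal(se_wls, se_ols)

same_coef
same_se
```

all.equal compares up to a small numerical tolerance, which is the appropriate notion of "identical" for two fits computed along different code paths.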

Alternatively, we can use lm.fit and lm.wfit:

matfit_by_wls <- lm.wfit(X, y, w)
matfit_by_ols <- lm.fit(X_tilde, y_tilde)


We can check coefficients by:

matfit_by_wls$coefficients
#(Intercept)       Girth      Volume 
# 83.2127455  -1.8639351   0.5843191 

matfit_by_ols$coefficients
#(Intercept)       Girth      Volume 
# 83.2127455  -1.8639351   0.5843191


Again, results are the same.
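
One thing that does differ between the two matrix fits is the scale of the residuals, which is worth keeping in mind when comparing anything beyond coefficients. A short sketch, with the setup repeated so it runs on its own:

```r
set.seed(0); w <- runif(nrow(trees), 1, 2)
X <- model.matrix(~ Girth + Volume, trees)
y <- trees$Height
rw <- sqrt(w)

matfit_by_wls <- lm.wfit(X, y, w)
matfit_by_ols <- lm.fit(rw * X, rw * y)

## coefficients agree up to floating-point tolerance
all.equal(matfit_by_wls$coefficients, matfit_by_ols$coefficients)

## lm.wfit reports residuals on the original scale, while the OLS fit on
## transformed data reports them multiplied by the root weights
all.equal(as.numeric(matfit_by_ols$residuals),
          as.numeric(rw * matfit_by_wls$residuals))
```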
                                                                        
                                                        
            
            
              
                