How do you remove an insignificant factor level from a regression using the lm() function in R?

Posted by 為{幸葍}努か on 2019-12-04 13:35:35

If you only want to drop the non-significant levels from the printed output, while still including them in the estimation, you can use the coeftest function (from the lmtest package, which is attached when you load AER) and index the result to keep only the rows you want.

 library(AER)
 coeftest(output)[-c(2,4), ]
                Estimate Std. Error    t value    Pr(>|t|)
(Intercept)    4.6180039  1.0397726  4.4413595 0.006756325
independent1c  5.5596699  2.0736190  2.6811434 0.043760158
independent2  -0.1335893  0.7880382 -0.1695214 0.872031752

If you'd rather not use the AER package, you can do the following instead:

summary(output)$coefficients[-c(2,4),]
                Estimate Std. Error    t value    Pr(>|t|)
(Intercept)    4.6180039  1.0397726  4.4413595 0.006756325
independent1c  5.5596699  2.0736190  2.6811434 0.043760158
independent2  -0.1335893  0.7880382 -0.1695214 0.872031752

I prefer the second approach, since it does not require installing an additional package.

I don't know if this is what you're looking for.
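One caveat with -c(2,4) is that it depends on the row order of the coefficient table. As a minimal sketch, the same rows can be hidden by name instead. The data below is made up for illustration (the question's data set is not shown here), and the names dat, hide and ct_shown are hypothetical:

```r
# Hypothetical data; only the variable names follow the question.
set.seed(1)
dat <- data.frame(
  independent1 = factor(rep(c("a", "b", "c", "d"), each = 5)),
  independent2 = rnorm(20)
)
dat$dependent <- 2 + 3 * (dat$independent1 == "c") + rnorm(20)

# The model is still estimated with every level of independent1.
output <- lm(dependent ~ independent1 + independent2, data = dat)
ct <- coef(summary(output))

# Hide rows by name rather than by position; drop = FALSE keeps a matrix
# even if only one row is left.
hide <- c("independent1b", "independent1d")
ct_shown <- ct[!rownames(ct) %in% hide, , drop = FALSE]
print(ct_shown)
```

This only changes what is printed; the fitted model in output is untouched.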

If you're willing to take just the coefficient table and not the whole summary, you can do this:

Extract the whole coefficient table:

ss <- coef(summary(output))

Take only the rows you want:

ss_sig <- ss[ss[,"Pr(>|t|)"]<0.05, , drop=FALSE]

printCoefmat pretty-prints coefficient tables with significance stars etc.

> printCoefmat(ss_sig)
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)     4.6180     1.0398  4.4414 0.006756 **
independent1c   5.5597     2.0736  2.6811 0.043760 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(This answer is similar to @Jilber's except that it automatically finds the non-significant rows for you rather than asking you to specify them manually.)

However, I have to agree with @Charlie's comment above that this is bad statistical practice: it artificially dichotomizes the predictors into significant and non-significant (predictors with p=0.049 and p=0.051 will be treated differently). It is especially bad with categorical predictors, where the particular set of parameters that are significant depends on the contrasts, i.e. on which level is used as the baseline.
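One common alternative to filtering individual levels, sketched here on made-up data, is to test the factor as a whole by comparing the model with and without it. The sketch also refits the model with a different baseline to show that the individual coefficients change while the overall fit does not. The data and the names full, reduced, cmp and full_c are hypothetical:

```r
# Hypothetical data; only the variable names follow the question.
set.seed(1)
dat <- data.frame(
  independent1 = factor(rep(c("a", "b", "c", "d"), each = 5)),
  independent2 = rnorm(20)
)
dat$dependent <- 2 + 3 * (dat$independent1 == "c") + rnorm(20)

full    <- lm(dependent ~ independent1 + independent2, data = dat)
reduced <- lm(dependent ~ independent2, data = dat)

# F-test for all levels of independent1 at once (3 degrees of freedom).
cmp <- anova(reduced, full)
print(cmp)

# Same model with "c" as the baseline: the per-level rows and their
# p-values differ, but the residual sum of squares is identical.
dat$independent1_c <- relevel(dat$independent1, ref = "c")
full_c <- lm(dependent ~ independent1_c + independent2, data = dat)
print(coef(summary(full)))
print(coef(summary(full_c)))
print(all.equal(deviance(full), deviance(full_c)))
```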

You can remove levels of a factor variable using the exclude argument of factor():

lm(dependent ~ factor(independent1, exclude=c('b','d')) + independent2)

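This way the levels b and d will not be included in the regression. Note that exclude turns these values into NA, so with the default na.action (na.omit) the corresponding observations are dropped from the fit rather than merged into the baseline.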
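To check how exclude affects the data actually used in the fit, here is a minimal sketch on made-up data (the question's data set is not shown here; dat, f, fit_all and fit_excl are hypothetical names). It assumes R's default na.action setting, na.omit:

```r
# Hypothetical data; only the variable names follow the question.
set.seed(1)
dat <- data.frame(
  independent1 = rep(c("a", "b", "c", "d"), each = 5),
  independent2 = rnorm(20)
)
dat$dependent <- 2 + 3 * (dat$independent1 == "c") + rnorm(20)

# Excluded levels are removed from the level set and their values become NA.
f <- factor(dat$independent1, exclude = c("b", "d"))
print(levels(f))
print(sum(is.na(f)))

fit_all  <- lm(dependent ~ independent1 + independent2, data = dat)
fit_excl <- lm(dependent ~ factor(independent1, exclude = c("b", "d")) +
                 independent2, data = dat)

# Compare the number of observations used by each fit.
print(nobs(fit_all))
print(nobs(fit_excl))
```

If the aim is to keep those observations in the model, this is a different operation from hiding rows of the coefficient table as in the other answers.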

Cheers
