Question
I have a huge database and I need to run different regressions with conditional statements. I see two options: 1) pass a subset of the data (industrycodes==12) to the regression through the data argument, or 2) first cut the data down to the rows where industrycodes==12 and run the regression on that. The two options do not give me the same results, and they should. Could somebody help me with the code? I think that is where my problem is. Here is a very basic example to explain it.
ID roa employees industrycodes
1 0,5 10 12
2 0,3 20 11
3 0,8 15 12
4 0,2 12 12
5 0,7 13 11
6 0,4 8 12
So first I create the sub-database to compare against (where the industry code is 12):
data2<-data1[data1$industrycodes==12,]
And here I run the regressions:
1) On the whole data, taking only industrycodes==12 --> here I get all 6 observations
summary(lm(data1$roa~data1$employees, data=subset(data1,industrycodes==12)))
2) Cutting the sample down to industrycodes==12 --> here, of course, I have 4 observations
summary(lm(data2$roa~data2$employees),data=data2)
Any ideas about what could be wrong? Thank you!
Answer 1:
The problem is that in the first call you pass a dataset through the data argument (subset(data1,industrycodes==12)), but the formula refers to data1$roa and data1$employees, so lm runs on another dataset (data1, the original one).
An extra comment: since you pass data=... to lm, you do not have to specify the variables with $. Inside lm, the data argument works much like an attach command, so the bare column names are enough.
Try this:
data3<- subset(data1,industrycodes==12)
summary(lm(roa~employees, data=data3) )
Hope it works
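To make the difference visible, here is a minimal sketch that rebuilds the example data from the question (with decimal points instead of commas) and counts the rows each fit actually uses. The object names fit_dollar and fit_subset are only illustrative, not part of the original code.

```r
# Rebuild the example data from the question (decimal points instead of commas)
data1 <- data.frame(
  ID            = 1:6,
  roa           = c(0.5, 0.3, 0.8, 0.2, 0.7, 0.4),
  employees     = c(10, 20, 15, 12, 13, 8),
  industrycodes = c(12, 11, 12, 12, 11, 12)
)

# Formula written with data1$...: the columns are taken straight from data1,
# so the data argument has no effect on which rows are used
fit_dollar <- lm(data1$roa ~ data1$employees,
                 data = subset(data1, industrycodes == 12))

# Formula written with bare names: the columns are looked up in the data argument
fit_subset <- lm(roa ~ employees,
                 data = subset(data1, industrycodes == 12))

nobs(fit_dollar)  # 6: all rows of data1
nobs(fit_subset)  # 4: only the rows with industrycodes == 12
```

The second fit is the one that matches a regression run on data2.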
Answer 2:
Welcome to StackOverflow. I get exactly the same results in both cases; the only thing I changed was to replace the commas "," with points "." so that the decimal places in roa are read correctly.
data1
ID roa employees industrycodes
1 1 0.5 10 12
2 2 0.3 20 11
3 3 0.8 15 12
4 4 0.2 12 12
5 5 0.7 13 11
6 6 0.4 8 12
summary(lm(data1$roa~data1$employees, data=subset(data1,industrycodes==12)))
summary(lm(data1$roa~data1$employees, data=data2))
First case results:
Call:
lm(formula = data1$roa ~ data1$employees, data = subset(data1,
industrycodes == 12))
Residuals:
1 2 3 4 5 6
0.01667 -0.18333 0.31667 -0.28333 0.21667 -0.08333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.833e-01 3.742e-01 1.292 0.266
data1$employees -5.918e-18 2.761e-02 0.000 1.000
Residual standard error: 0.259 on 4 degrees of freedom
Multiple R-squared: 8.039e-32, Adjusted R-squared: -0.25
F-statistic: 3.215e-31 on 1 and 4 DF, p-value: 1
data2 <- data1[data1$industrycodes==12,]
Second case results:
summary(lm(data1$roa~data1$employees, data=data2))
Call:
lm(formula = data1$roa ~ data1$employees, data = data2)
Residuals:
1 2 3 4 5 6
0.01667 -0.18333 0.31667 -0.28333 0.21667 -0.08333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.833e-01 3.742e-01 1.292 0.266
data1$employees -5.918e-18 2.761e-02 0.000 1.000
Residual standard error: 0.259 on 4 degrees of freedom
Multiple R-squared: 8.039e-32, Adjusted R-squared: -0.25
F-statistic: 3.215e-31 on 1 and 4 DF, p-value: 1
If you want to loop across all conditions, you could add new columns. For example, if you have two conditions:
data1$cond1 <- data1$industrycodes==12
data1$cond2 <- data1$industrycodes<=12
You can then use the loop:
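# Columns 5 and 6 are the logical columns cond1 and cond2 created above
for (i in 5:6) {
  # Bare column names in the formula, so lm uses the subset passed via data
  print(summary(lm(roa ~ employees, data = subset(data1, data1[, i]))))
}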
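As an alternative to adding one logical column per condition, a minimal sketch that runs one regression per industry code with split() and lapply() might look like this. It assumes the example data from the question (rebuilt here with decimal points); the name fits is only illustrative.

```r
# Rebuild the example data from the question (decimal points instead of commas)
data1 <- data.frame(
  ID            = 1:6,
  roa           = c(0.5, 0.3, 0.8, 0.2, 0.7, 0.4),
  employees     = c(10, 20, 15, 12, 13, 8),
  industrycodes = c(12, 11, 12, 12, 11, 12)
)

# split() gives one data frame per industry code;
# lapply() then fits the same model on each piece
fits <- lapply(split(data1, data1$industrycodes),
               function(d) lm(roa ~ employees, data = d))

# Number of rows used by each fit, named by industry code
sapply(fits, nobs)

# Full summary for a single industry code
summary(fits[["12"]])
```

In this toy data, industry code 11 has only two rows, so its fit passes exactly through both points and is not meaningful on its own; with a real dataset each group would need enough observations.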
Source: https://stackoverflow.com/questions/52856152/linear-regression-with-conditional-statement-in-r