Question
I have already asked a question about storing coefficients and standard errors of several regressions in a single dataset.
Let me just reiterate the objective of my initial question:
I would like to run several regressions and store their results in a DTA file that I could later use for analysis. My constraints are:
- I cannot install modules (I am writing code for other people and do not know which modules they have installed).
- Some of the regressors are factor variables.
- The regressions differ only in the dependent variable, so I would like to store it in the final dataset to keep track of which regression the coefficients/variances belong to.
The solution suggested by Roberto Ferrer worked well on my test data, but it turns out not to work so well on other kinds of data. The reason is that my sample changes slightly from one regression to the next, so some factor variables do not take the same number of values in every regression. As a result, the fixed effects (created on the fly by using i.myvar as a regressor) do not have the same cardinality across regressions.
Say I include year fixed effects (that is, year-specific intercepts) using i.year, but in one regression there are no observations for 2006. That regression then has one fewer regressor (the dummy for year==2006 is never created), and therefore a smaller coefficient matrix.
This causes a conformability error when I try to stack the matrices together.
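To make the failure concrete, here is a minimal sketch on Stata's bundled census data rather than the question's own data. It assumes region can stand in for year, with region 4 playing the role of the missing 2006: once region 4 drops out of the estimation sample, e(b) loses a column, and stacking the two coefficient rows fails.

```stata
// Sketch: region stands in for year; region 4 stands in for the missing 2006
sysuse census, clear
regress divorce popurban i.region
matrix b1 = e(b)                  // all four region levels in the sample
replace marriage = . if region == 4
regress marriage popurban i.region
matrix b2 = e(b)                  // region 4 absent, so one column fewer
display colsof(b1) " vs " colsof(b2)
capture matrix both = b1 \ b2     // vertical stacking needs equal column counts
local rc = _rc
display "return code: `rc'"       // 503 is Stata's conformability error
```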
Is there a way to make the initial solution robust to a varying number of regressors? (Perhaps by saving each regression as a .dta file and then merging?)
I am still subject to the constraint that I cannot rely on external packages.
Answer 1:
You can follow the strategy of appending datasets, making small changes to the code in the question you reference:
clear
set more off
save test.dta, emptyok replace   // start from an empty results file
foreach depvar in marriage divorce {
    // test data: marriage is missing for region 4, so its regression has one fewer regressor
    sysuse census, clear
    generate constant = 1
    replace marriage = . if region == 4
    // regression
    reg `depvar' popurban i.region constant, robust noconstant
    matrix result_matrix = e(b)\vecdiag(e(V))   // grab coeffs and their variances in a 2xK matrix
    matrix rownames result_matrix = `depvar'_b `depvar'_v   // add rownames to the two rows
    // get original column names of matrix
    local names : colfullnames result_matrix
    // get original row names of matrix (and row count)
    local rownames : rowfullnames result_matrix
    local c : word count `rownames'
    // make original names legal variable names
    local newnames
    foreach name of local names {
        local newnames `newnames' `=strtoname("`name'")'
    }
    // rename columns of matrix
    matrix colnames result_matrix = `newnames'
    // from matrix to dataset
    clear
    svmat result_matrix, names(col)
    // add matrix row names to dataset
    gen rownames = ""
    forvalues i = 1/`c' {
        replace rownames = "`:word `i' of `rownames''" in `i'
    }
    // append: variables missing from one regression are filled with missing values
    append using "test.dta"
    save "test.dta", replace
}
// list
order rownames
list, noobs
This gives the result you want. The drawback is that the data are reloaded on every pass through the loop, so data are loaded as many times as there are regressions to estimate.
You may want to take a look at post and check whether you can manage a more efficient solution. statsby could also work, but you would need to find a smart way of renaming the stored variables.
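One way to act on the post suggestion is sketched below. It is an illustration under assumptions, not part of the answer above: the file name results.dta is hypothetical, and it uses a long layout with one row per (dependent variable, term) pair instead of one column per regressor. The data are loaded once, and each coefficient and its variance are posted as a separate row. Because nothing is stacked side by side, a regression with fewer regressors simply contributes fewer rows, so the conformability problem does not arise.

```stata
// Sketch: long results file, one row per (depvar, term); results.dta is a hypothetical name
sysuse census, clear                    // data loaded only once
generate constant = 1
replace marriage = . if region == 4     // induce a different set of regressors

tempname h B V
postfile `h' str32 depvar str32 term double b double v using results.dta, replace

foreach depvar in marriage divorce {
    regress `depvar' popurban i.region constant, robust noconstant
    matrix `B' = e(b)
    matrix `V' = e(V)
    local terms : colfullnames `B'
    local k = colsof(`B')
    forvalues j = 1/`k' {
        local term : word `j' of `terms'
        // coefficient and its variance (diagonal element of e(V))
        post `h' ("`depvar'") ("`term'") (`B'[1,`j']) (`V'[`j',`j'])
    }
}
postclose `h'

use results.dta, clear
list, sepby(depvar) noobs
```

With this layout, comparing one term across regressions is a single command, for example list depvar b v if term == "popurban". Standard errors are sqrt(v).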
Source: https://stackoverflow.com/questions/32226194/stata-combining-coefficients-standard-errors-from-several-regressions-in-a-sing