How to retrieve data from multiple Stata files?

断了今生、忘了曾经 提交于 2019-11-29 17:21:16

What springs to mind here is the use of postfile.

Here is a simple example. First, I set up an example of several datasets. You already have this.

clear

forval i = 1/10 {
    set obs 100
    gen foo = `i' * runiform()
    save test`i'
    clear
}

Now I set up postfile. I need to set up a handle, what variables will be used, and what file will be used. Although I am using a numeric variable to hold file identifiers, it will perhaps be more typical to use a string variable. Also, looping over filenames may be a bit more challenging than this. fs from SSC is a convenience command that helps put a set of filenames into a local macro; its use is not illustrated here.

postfile mypost what mean using alltest.dta

forval i = 1/10 {
    use test`i', clear
    su foo, meanonly
    post mypost (`i')  (`r(mean)')
}

Now flush results

postclose mypost

and see what we have.

u alltest

list 

     +-----------------+
     | what       mean |
     |-----------------|
  1. |    1   .5110765 |
  2. |    2   1.016858 |
  3. |    3   1.425967 |
  4. |    4   2.144528 |
  5. |    5   2.438035 |
     |-----------------|
  6. |    6   3.030457 |
  7. |    7   3.356905 |
  8. |    8   4.449655 |
  9. |    9   4.381101 |
 10. |   10   5.017308 |
     +-----------------+

I didn't use any global macros (not global variables) here; you should not need to.

An alternative approach is to loop over files and use collapse to "condense" these files to the relevant means, and than append these condensed files. Here is an adaptation of Nick's example:

// create the example datasets
clear

forval i = 1/10 {
    set obs 100
    gen foo = `i' * runiform()
    gen year = `i'
    save test`i', replace
    clear
}

// use collapse and append
// to create the dataset you want
use test1, clear
collapse (mean) year foo 
save means, replace
forvalues i = 2/10 {
    use test`i', clear
    collapse (mean) year foo 
    append using means
    save means, replace
}

// admire the result
list

Note that if your data sets are not named sequentially like test1.dta, test2.dta, ..., test53.dta, but rather like results-alaska.dta, result_in_alabama.dta, ..., "wyoming data.dta" (note the space and hence the quotes), you would have to organize the cycle over these files somewhat differently:

local allfiles : dir . files "*.dta"
foreach f of local allfiles {
   use `"`f'"', clear
   * all other code from Maarten's or Nick's approach
}

This is a more advanced of local macros, see help extended macro functions. Note also that Stata will produce a list that will look like "results-alaska.dta" "result_in_alabama.dta" "wyoming data.dta" with quotes around file names, so when you invoke use, you will have to enclose the file name into compound quotes.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!