Stata: calculating growth rates for observations with same ID

Posted by 荒凉一梦 on 2019-12-11 11:47:54

Question


I want to calculate growth rates in Stata for observations that share the same ID. A simplified version of my data looks like this:

ID    year   a   b   c   d   e   f
10    2010   2   4   9   8   4   2
10    2011   3   5   4   6   5   4
220   2010   1   6   11  14  2   5
220   2011   6   2   12  10  5   4    
334   2010   4   5   4   6   1   4
334   2011   5   5   4   4   3   2

Now I want to calculate, for each ID, the growth rates of variables a-f from 2010 to 2011.

For example, for ID 10 and variable a the growth rate would be (3-2)/2, for variable b it would be (5-4)/4, and so on. The results should be stored in new variables (e.g. growth_a, growth_b, etc.).

Since I have over 120k observations and around 300 variables, is there an efficient way to do this, for example with a loop?

My code looks like the following (simplified):

local variables "a b c d e f"
foreach x in local variables { 
bys ID: g `x'_gr = (`x'[_n]-`x'[_n-1])/`x'[_n-1]
}

FYI: variables a-f are numeric.

But Stata says 'local not found', and I am not sure whether the rest of the code is correct. Do I also have to sort by year first?


Answer 1:


The specific error in

local variables "a b c d e f"
foreach x in local variables { 
    bys ID: g `x'_gr = (`x'[_n]-`x'[_n-1])/`x'[_n-1]
}

is an error in the syntax of foreach, which here expects syntax like foreach x of local variables, given your prior use of a local macro. With the keyword in, foreach takes the word local literally and here looks for a variable with that name: hence the error message. This is basic foreach syntax: see its help.
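The distinction can be seen in a minimal sketch that only displays the list elements. It reuses the local macro name variables from the question; the display commands are there purely for illustration.

```stata
local variables "a b c d e f"

* "of local" takes the NAME of a local macro and loops over its contents
foreach x of local variables {
    display "`x'"
}

* "in" takes a literal list, so the macro must be dereferenced explicitly
foreach x in `variables' {
    display "`x'"
}
```

Both loops visit a, b, c, d, e, f in turn. Writing foreach x in local variables instead loops over the two literal words local and variables.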

This code is problematic for further reasons.

  1. Sorting on ID does not guarantee the correct sort order, here time order by year, for each distinct ID. If observations are jumbled within ID, results will be garbage.

  2. The code assumes that all time values are present; otherwise the time gap between observations might be unequal.
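If the subscript approach is kept, both problems can be addressed by hand. The following is only a sketch, assuming yearly data in a variable named year as in the question: putting year in parentheses makes bysort order observations by year within each ID, and the if condition leaves the growth rate missing whenever the previous observation is not exactly one year earlier.

```stata
foreach x of varlist a b c d e f {
    * sort by ID and year, group by ID only;
    * the first observation of each ID gets a missing growth rate
    bysort ID (year): gen `x'_gr = (`x' - `x'[_n-1]) / `x'[_n-1] ///
        if year == year[_n-1] + 1
}
```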

A cleaner way to get growth rates is

tsset ID year 
foreach x in a b c d e f { 
    gen `x'_gr = D.`x'/L.`x' 
} 

Once you have tsset (or xtset) the time series operators can be used without fear: correct sorting is automatic and the operators are smart about gaps in the data (e.g. jumps from 1982 to 1984 in yearly data).
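The gap handling can be checked on a small made-up panel. The three observations below are hypothetical and are not taken from the question's data; year 2012 is deliberately absent.

```stata
clear
input ID year a
10 2010 2
10 2011 3
10 2013 6
end

tsset ID year
gen a_gr = D.a / L.a

* a_gr is missing for 2010 (no earlier year) and for 2013 (2012 is absent),
* and equals (3-2)/2 = .5 for 2011
list
```

A naive `a[_n-1]` subscript would instead have computed (6-3)/3 for 2013, silently treating a two-year change as a one-year growth rate.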

For more variables the loop could be

foreach x of var <whatever> { 
    gen `x'_gr = D.`x'/L.`x' 
} 

where <whatever> could be a general (numeric) varlist.
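With around 300 variables, one way to build that varlist without typing it out is sketched below. It assumes that every numeric variable other than ID and year should get a growth rate; the macro names candidates and exclude are illustrative.

```stata
* collect all numeric variables, then remove the panel identifiers
ds, has(type numeric)
local candidates `r(varlist)'
local exclude ID year
local candidates : list candidates - exclude

tsset ID year
foreach x of local candidates {
    gen `x'_gr = D.`x'/L.`x'
}
```

If the variables are adjacent in the dataset, a range such as foreach x of varlist a-f does the same job more directly. Note that Stata variable names are limited to 32 characters, so appending _gr fails for names longer than 29 characters.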

EDIT: The question has changed since first posting and interest is declared in calculating growth rates only from 2010 to 2011, with the implication in the example that only those years are present. The more general code above will naturally still work for calculating those growth rates.



Source: https://stackoverflow.com/questions/32199175/stata-calculating-growth-rates-for-observations-with-same-id
