Conditioning Stata dataset on past values of variables

Submitted by 偶尔善良 on 2020-01-25 10:58:08

Question


I have a problem conditioning my dataset in Stata. Within each group, I want to keep an observation in which a certain action is performed (as indicated by a variable) depending on the past values of another variable. Suppose I have the following:

obs | id | action1 | action2 | year
  1 |  1 |       1 |       0 | 2000
  2 |  1 |       0 |       1 | 2001
  3 |  1 |       0 |       1 | 2002
  4 |  1 |       0 |       1 | 2002
  5 |  1 |       0 |       1 | 2003
  6 |  2 |       1 |       0 | 2000
  7 |  2 |       1 |       0 | 2001
  8 |  2 |       0 |       1 | 2002
  9 |  2 |       0 |       1 | 2002
 10 |  2 |       0 |       1 | 2003

For each group identified by 'id', I want to keep an observation only if action1 is performed, or if action1 was performed no earlier than 2 years before action2 was performed. In this simplified example only observation 4 should be deleted. Please note that the 2 actions are not mutually exclusive and that they can be performed more than once within the same year, so looking at 2 observations in the past does not necessarily mean looking at 2 years in the past.

A solution that I am not able to implement in code would be: gen act1year = action1 * year, then by(id) store the values of act1year that are different from 0 somewhere (this is the part I cannot implement), and then by(id) keep if action1 == 1, or if action2[_n] == 1 and the range year[_n]-2 to year[_n] contains at least one of the previously stored values.

I know my suggestion is probably not the easiest way to go, and I am not able to implement it anyway. Unfortunately, I cannot find code that helps me do this. I hope you can help me. Thanks

Francesco


Answer 1:


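The following makes some simplifying assumptions, discussed below. In particular, it looks at the previous two observations within each id rather than at the previous two years.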

clear
set more off

input ///
obs  id  action1  action2  year 
1  1  1  0  2000 
2  1  0  1  2001 
3  1  0  1  2002 
4  1  0  1  2003 
5  2  1  0  2000 
6  2  0  1  2001 
7  2  1  0  2002 
8  2  0  1  2003
end

list, sepby(id)

*-----

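* Within each id (sorted by year), keep an observation if action1 is taken now
* or in either of the two preceding observations (a missing lag also keeps it)
bysort id (year) : keep if action1 | (action1[_n-1] + action1[_n-2] > 0)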

list, sepby(id)

The expression in parentheses evaluates to one or zero depending on whether the inequality is true or false, respectively. It indicates whether action 1 was taken in either of the previous two observations.

You need to decide what to do with the first two observations of each group, as they can't be compared with exactly two previous observations (these don't exist). In the example above they are always kept: referring to a nonexistent observation returns missing, adding a missing value gives missing, and Stata treats missing as larger than any number, so the comparison with zero is true.
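If observations without a usable history should instead be dropped unless action 1 itself is taken, one possible variant is to test each lag for equality with 1. This is only a sketch, not part of the original answer, and it assumes action1 is coded strictly 0/1. Because a comparison such as missing == 1 is false, a nonexistent lag no longer causes an observation to be kept. The example data are repeated so that the block runs on its own.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 1 0 2000
6 2 0 1 2001
7 2 1 0 2002
8 2 0 1 2003
end

* Assumption: action1 is coded 0/1.
* A lag that does not exist is missing, and (missing == 1) is false,
* so it never causes an observation to be kept.
bysort id (year) : keep if action1 == 1 | action1[_n-1] == 1 | action1[_n-2] == 1

list, sepby(id)
```

On these data the first observation of each id has action1 equal to 1, so the difference from the original command only shows up in data where a group starts without action 1.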

You can also work with time-series operators (help tsvarlist, help xtset) and really respect the time variable. Here, I work with the previous two observations, which may or may not coincide with the previous two time points.
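To respect the year variable without time-series operators (which need one observation per id and time point), one possible approach is to carry forward the most recent year in which action 1 was taken and compare it with the current year. The following is a minimal sketch under stated assumptions, not the original answer's method: the helper variables noact1 and lastact1 are hypothetical names, and an action 1 taken in the same year is assumed to count for the other observations of that year.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 1 0 2000
6 2 0 1 2001
7 2 1 0 2002
8 2 0 1 2003
end

* Hypothetical helper: 0 for action1 rows, so they sort first within a year
* (assumption: an action1 in the same year counts for that year's other rows).
gen byte noact1 = !action1

* Year of action1 where it is taken, missing otherwise.
bysort id (year noact1 obs) : gen lastact1 = year if action1 == 1

* Carry the most recent action1 year forward within each id.
by id : replace lastact1 = lastact1[_n-1] if missing(lastact1)

* Keep action1 rows, plus action2 rows at most 2 years after the latest action1.
* With no earlier action1, lastact1 is missing and the comparison is false.
keep if action1 == 1 | (action2 == 1 & year - lastact1 <= 2)

sort obs
drop noact1 lastact1
list, sepby(id)
```

Because the comparison uses year rather than observation position, several observations in the same year do not shorten the two-year window. Whether the window should be `<= 2` or `< 2` depends on how "2 years before" is meant to be read.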

I think your two actions are mutually exclusive, but you are not explicit about it.



Source: https://stackoverflow.com/questions/31436776/conditioning-stata-dataset-on-past-values-of-variables
