Question
Each observation in my data represents a player who follows some random pattern. The variables move1 onwards record which player was active on each move. I need to count the number of times each player was active.
The data look as follows (with _count being the variable that I would like to generate). The number of moves can also differ between simulations.
+------------+------------+-------+-------+-------+-------+-------+-------+--------+
| simulation | playerlist | move1 | move2 | move3 | move4 | move5 | move6 | _count |
+------------+------------+-------+-------+-------+-------+-------+-------+--------+
| 1          | 1          | 1     | 1     | 1     | 2     | .     | .     | 3      |
| 1          | 2          | 2     | 2     | 4     | 4     | .     | .     | 2      |
| 2          | 3          | 1     | 2     | 3     | 3     | 3     | 3     | 4      |
| 2          | 4          | 4     | 1     | 2     | 3     | 3     | 3     | 1      |
+------------+------------+-------+-------+-------+-------+-------+-------+--------+
egen combined with anycount() is not applicable in this case because the argument for the value() option is not a constant integer.
I have made an attempt to cycle through each observation and use egen rowwise (see below), but it leaves _count as missing (as initialised) and is not very efficient (I have 50,000 observations). Is there a way to do this in Stata?
gen _count = .
quietly forval i = 1/`=_N' {
    egen temp = anycount(move*), values( `=`playerlist'[`i']')
    replace _count = temp
    drop temp
}
Answer 1:
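You can easily cut out the loop over observations. In addition, egen is only to be used for convenience, never speed. A logical expression such as move1 == playerlist evaluates to 1 when true and 0 when false, so summing such expressions over the move variables gives the count: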
gen _count = 0
quietly forval j = 1/6 {
    replace _count = _count + (move`j' == playerlist)
}
or
gen _count = move1 == playerlist
quietly forval j = 2/6 {
    replace _count = _count + (move`j' == playerlist)
}
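Because the number of moves can differ between simulations, the upper limit of 6 need not be hard-coded. What follows is a minimal sketch, assuming that every move variable carries the prefix move and that playerlist is never missing (Stata treats missing == missing as true, so a missing playerlist would match missing moves). The input block merely recreates the example data from the question.

```stata
clear
input simulation playerlist move1 move2 move3 move4 move5 move6
1 1 1 1 1 2 . .
1 2 2 2 4 4 . .
2 3 1 2 3 3 3 3
2 4 4 1 2 3 3 3
end

* Loop over whichever move variables exist, rather than a fixed range.
gen _count = 0
quietly foreach v of varlist move* {
    replace _count = _count + (`v' == playerlist)
}

* Check the result against the counts given in the question.
assert _count == 3 if playerlist == 1
assert _count == 2 if playerlist == 2
assert _count == 4 if playerlist == 3
assert _count == 1 if playerlist == 4
list
```

A missing move compared with a nonmissing playerlist evaluates to 0, so the padding in shorter simulations does not add to the count.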
Even if you had been determined to use egen, the loop need only be over the distinct values of playerlist, not all the observations. Say the maximum is 42:
gen _count = 0
quietly forval k = 1/42 {
    egen temp = anycount(move*), value(`k')
    * keep the count of moves equal to k only where playerlist is k
    replace _count = temp if playerlist == `k'
    drop temp
}
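If the largest value of playerlist is not known in advance, levelsof can collect the distinct values so that the loop visits only those. This is a sketch under assumptions: playerlist holds integers (anycount() requires integer values), the local macro name players is made up for illustration, and the input block recreates the example data from the question.

```stata
clear
input simulation playerlist move1 move2 move3 move4 move5 move6
1 1 1 1 1 2 . .
1 2 2 2 4 4 . .
2 3 1 2 3 3 3 3
2 4 4 1 2 3 3 3
end

* Store the distinct values of playerlist in a local macro.
levelsof playerlist, local(players)

gen _count = 0
quietly foreach k of local players {
    egen temp = anycount(move*), values(`k')
    * Use the count for this value only in observations of that player.
    replace _count = temp if playerlist == `k'
    drop temp
}

assert _count == 3 if playerlist == 1
assert _count == 2 if playerlist == 2
assert _count == 4 if playerlist == 3
assert _count == 1 if playerlist == 4
```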
But that's still a lousy method for your problem. (I wrote the original of anycount(), so I can say why it was written.)
See also http://www.stata-journal.com/sjpdf.html?articlenum=pr0046 for a review of working rowwise.
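A different route to a rowwise count, not the one used in the code above, is to restructure the data to long form, count within each player, and restructure back. This is only a sketch: it assumes that simulation and playerlist jointly identify observations, the name moveno is made up for illustration, and on 50,000 observations reshape is likely to be slower than a loop over the move variables.

```stata
clear
input simulation playerlist move1 move2 move3 move4 move5 move6
1 1 1 1 1 2 . .
1 2 2 2 4 4 . .
2 3 1 2 3 3 3 3
2 4 4 1 2 3 3 3
end

* One row per player and move.
reshape long move, i(simulation playerlist) j(moveno)

* Within each player, count the moves equal to that player's identifier.
bysort simulation playerlist: egen _count = total(move == playerlist)

* Back to one row per player; _count is constant within player, so it is carried along.
reshape wide move, i(simulation playerlist) j(moveno)

assert _count == 3 if playerlist == 1
assert _count == 2 if playerlist == 2
assert _count == 4 if playerlist == 3
assert _count == 1 if playerlist == 4
```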
P.S. Your code contains two bugs.
1. You replace your count variable in all observations on every pass through the loop, so it ends up holding the value calculated on the last pass, for the last observation.
2. Values are compared with a local macro playerlist. You presumably have no local macro of that name, so the macro is evaluated as empty. The result is that you end by comparing each value of your move* variables with the observation numbers. You meant to use the variable name playerlist, but the single quotation marks force the macro interpretation.
For the record, this fixes both bugs:
gen _count = .
quietly forval i = 1/`=_N' {
    egen temp = anycount(move*), values(`= playerlist[`i']')
    replace _count = temp in `i'
    drop temp
}
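The difference between reading playerlist as a macro and as a variable can be checked directly. This is a small sketch, assuming no local macro named playerlist has been defined in the session; the four-observation dataset is made up for illustration.

```stata
clear
input playerlist
1
2
3
4
end

* Macro reading: no local macro of this name exists, so nothing is substituted.
display "[`playerlist']"

* Variable reading: a subscript picks out the value in one observation.
display playerlist[3]

* Inline evaluation inside macro quotes, as in the corrected loop.
local i 3
display `= playerlist[`i']'
assert `= playerlist[`i']' == 3
```

The first display shows only the brackets, because the undefined macro expands to an empty string.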
Source: https://stackoverflow.com/questions/18543912/stata-using-egen-anycount-when-values-vary-for-each-observation