Question
Say that I have data like this:
group value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
I want to be able to tell whether all values of value are the same within each group. Another way to see this would be to create a new variable that contains all distinct values of value within each group, like the following:
group value all_values
1 fox fox
1 fox fox
1 fox fox
2 dog dog cat
2 cat dog cat
3 frog frog
3 frog frog
4 dog dog
4 dog dog
As we see, all groups except group 2 have only one distinct entry for value.
One way I thought of to get something similar (though not as good) is the following:
bys group: egen tag = tag(value)
bys group: egen sum = sum(tag)
Then, based on the value of sum, I could determine whether a group has more than one distinct entry.
However, egen's tag() does not work with bysort. Is there another efficient way to get the information I need?
Answer 1:
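There are several ways to do this. One is to sort value within each group and compare the first and last observations, which are equal only if every value in the group is the same: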
clear
set more off
input ///
group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end
*-----
bysort group (value) : gen onevalue = value[1] == value[_N]
list, sepby(group)
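The question also asks for a variable listing every distinct value within each group. The following is only a sketch of one way that might be done, reusing the same sort-within-group idea; the variable name all_values is taken from the question, and the distinct values come out in sorted order (so group 2 reads "cat dog" rather than "dog cat").

```stata
clear
input group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end

* start each group's list with its first (sorted) value
bysort group (value) : gen all_values = value if _n == 1

* walk down the group, appending a value only when it differs from the previous one
by group : replace all_values = all_values[_n-1] + cond(value != value[_n-1], " " + value, "") if _n > 1

* the last observation holds the complete list; copy it to the whole group
by group : replace all_values = all_values[_N]

list, sepby(group)
```

This relies on replace working through the observations in order, so that each observation sees the list already built for the one before it.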
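Suppose you have missing values but want to ignore them (not drop them). Then the following works. Here encode turns empty strings into numeric missings, which sort last within each group and are then overwritten by the preceding value: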
clear
set more off
input ///
group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
5 ox
5 ox
5
6 cow
6 goat
6
end
*-----
encode value, gen(value2)
bysort group (value2) : replace value2 = value2[_n-1] if missing(value2)
by group: gen onevalue = value2[1] == value2[_N]
list, sepby(group)
See also this FAQ, which has a technique that resembles your original strategy.
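For illustration, the tag-and-count strategy from the question can be made to work without bysort by passing both variables to tag(). This is a minimal sketch and not necessarily what the FAQ does; the names ndistinct and onevalue are chosen here for the example. As far as I know, tag() skips observations with missing values unless its missing option is specified, so blank entries would be ignored by default.

```stata
clear
input group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end

* tag exactly one observation for each distinct (group, value) pair
egen tag = tag(group value)

* the number of tagged observations per group is the number of distinct values
egen ndistinct = total(tag), by(group)

* indicator: 1 if the group has a single distinct value, 0 otherwise
gen byte onevalue = ndistinct == 1

list, sepby(group)
```

Unlike the first-equals-last comparison, this also gives the count of distinct values in each group, which may be useful if more than a yes/no answer is needed.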
Source: https://stackoverflow.com/questions/29702566/how-to-see-if-all-values-within-group-are-unique-identify-those-that-arent