How to see if all values within group are unique/identify those that aren't

Question

Say that I have data like this:

group value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog

I want to be able to tell if all values of value are the same within group. Another way to see this is if I could create a new variable that contains all unique values of value within group like the following:

group value all_values
1     fox    fox
1     fox    fox
1     fox    fox
2     dog    dog cat
2     cat    dog cat
3     frog   frog
3     frog   frog
4     dog    dog
4     dog    dog

As we see, all groups except group 2 have only one distinct entry for value.

One approach I considered, which gives similar (though less complete) information, is the following:

bys group: egen tag = tag(value)
bys group: egen sum = sum(tag)

Then, based on the value of sum, I could determine whether a group has more than one distinct value.

However, the egen function tag() cannot be combined with bysort. Is there another efficient way to get the information I need?

Answer 1:

There are several ways to do this. One is:

clear
set more off

input ///
group str5 value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog
end

*-----

* Sort by value within group; first and last values match only if the group has one distinct value
bysort group (value) : gen onevalue = value[1] == value[_N]

list, sepby(group)
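
The question also asks for a variable holding every distinct value within the group. The following is a minimal sketch of one way to build it, not part of the original answer. It assumes there are no missing (empty) values, and the variable name all_values is taken from the question. Because the data are sorted by value within group, the distinct values appear in alphabetical order, so group 2 holds "cat dog" rather than "dog cat".

```stata
clear
input group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end

* Start each group's running list with its first (smallest) value
bysort group (value) : gen all_values = value if _n == 1

* Carry the list forward, appending a value only when it differs from the previous one
by group : replace all_values = all_values[_n-1] + cond(value != value[_n-1], " " + value, "") if _n > 1

* Copy the completed list (held in the last observation) to the whole group
by group : replace all_values = all_values[_N]

list, sepby(group)
```

A group has a single distinct value exactly when all_values equals value in every observation of that group.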

Suppose you have missings, but want to ignore them (not drop them); then the following works:

clear
set more off

input ///
group str5 value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog
5     ox
5     ox
5     
6     cow
6     goat
6      
end

*-----

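* Map the strings to integer codes; empty strings become numeric missing
encode value, gen(value2)

* Missing codes sort last within group, so each one copies the code just before it
bysort group (value2) : replace value2 = value2[_n-1] if missing(value2)
* After the fill, first and last codes match only if the group has one distinct value
by group: gen onevalue = value2[1] == value2[_N]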

list, sepby(group)

See also this FAQ, which has a technique that resembles your original strategy.
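
A tagging approach close to the one attempted in the question can be sketched as follows. This is an illustration rather than a quotation from the FAQ, and the variable name ndistinct is made up for the example. Instead of prefixing egen with bysort, the grouping variable goes inside tag(), and the tags are then summed within group using total() with the by() option. By default, tag() does not tag observations with missing values, so those are ignored in the count.

```stata
clear
input group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end

* Tag exactly one observation for each distinct (group, value) pair
egen tag = tag(group value)

* Sum the tags within group to count distinct values
egen ndistinct = total(tag), by(group)

list, sepby(group)
```

Groups with ndistinct greater than 1 contain more than one distinct value; here that is only group 2.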

Source: https://stackoverflow.com/questions/29702566/how-to-see-if-all-values-within-group-are-unique-identify-those-that-arent

Tags

unique

stata