Question
I'm having trouble producing statistics for a pandas DataFrame. It looks like this (index omitted):
id type
1 A
2 B
3 A
1 B
3 B
2 C
4 B
4 C
Importantly, each id has exactly two type values assigned, as the example above shows. I want to count the occurrences of every type combination (that is, the number of unique ids with a given combination), so that I get a DataFrame like this:
type count
A, B 2
A, C 0
B, C 2
I tried using groupby in many ways, but in vain. I can do this kind of count with a for loop and several lines of code, but I believe there must be an elegant, idiomatic Python solution to this problem.
Thanks in advance for any hints.
Answer 1:
Using GroupBy + apply with value_counts:
from itertools import combinations
import pandas as pd

def combs(types):
    # Sort first so that (A, B) and (B, A) count as the same pair.
    return pd.Series(list(combinations(sorted(types), 2)))

res = df.groupby('id')['type'].apply(combs).value_counts()
print(res)
(A, B) 2
(B, C) 2
Name: type, dtype: int64
Answer 2:
Using pd.value_counts with itertools.combinations:
from itertools import combinations
pd.value_counts(
[(x, y) for _, d in df.groupby('id') for x, y in combinations(d.type, 2)]
)
(A, B) 2
(B, C) 2
dtype: int64
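One thing to keep in mind: combinations preserves the order of its input, so a comprehension like this one treats (A, B) and (B, A) as different keys if an id happens to list its types in the opposite order. The following is a minimal sketch of that effect, using a small hypothetical frame (df_unordered is not part of the question's data) and a plain Counter rather than pd.value_counts:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical data: id 2 lists its types in the opposite order to id 1.
df_unordered = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "type": ["A", "B", "B", "A"],
})

# Without sorting, the pair order follows the row order within each id.
unsorted_pairs = Counter(
    pair
    for _, d in df_unordered.groupby("id")
    for pair in combinations(d["type"], 2)
)

# Sorting the types first collapses both orderings into one key.
sorted_pairs = Counter(
    pair
    for _, d in df_unordered.groupby("id")
    for pair in combinations(sorted(d["type"]), 2)
)

print(unsorted_pairs)
print(sorted_pairs)
```

With the question's sample data the rows already appear in alphabetical order within each id, so the difference does not show up there.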
Answer 3:
Using Counter, groupby, and the default DataFrame constructor:
from collections import Counter
pd.DataFrame(Counter([tuple(v.type.values) for _, v in df.groupby('id')]), index=['Count']).T
Count
A B 2
B C 2
Answer 4:
You could also use unique. Note that this only works when each id has exactly two unique type values:
df.groupby('id').type.unique().apply(tuple).value_counts()
Out[202]:
(A, B) 2
(B, C) 2
Name: type, dtype: int64
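The results above list only the pairs that actually occur, whereas the table requested in the question also contains A, C with a count of 0. One possible way to get that row, sketched here under the assumption that the set of possible types is exactly the set of values present in the type column, is to enumerate every pair up front and look each one up in a Counter. The names pair_counts, all_pairs, and result are illustrative only:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":   [1, 2, 3, 1, 3, 2, 4, 4],
    "type": ["A", "B", "A", "B", "B", "C", "B", "C"],
})

# Count each sorted pair of distinct types once per id.
pair_counts = Counter(
    pair
    for _, types in df.groupby("id")["type"]
    for pair in combinations(sorted(set(types)), 2)
)

# Every possible pair of types, including pairs that never occur together.
all_pairs = list(combinations(sorted(df["type"].unique()), 2))

# A Counter returns 0 for missing keys, so absent pairs get a zero count.
result = pd.DataFrame({
    "type": [", ".join(pair) for pair in all_pairs],
    "count": [pair_counts[pair] for pair in all_pairs],
})
print(result)
```

Because the pairs are built from the set of types per id, this sketch also copes with an id that has more than two types or a repeated type, which the unique-based one-liner does not.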
Source: https://stackoverflow.com/questions/53159144/number-of-unique-pairs-within-one-column-pandas