Question
I have a DataFrame like this:
A
1
1
2
3
3
4
4
4
4
I want to turn it into this:
A B
1 2
1 2
2 1
3 2
3 2
4 4
4 4
4 4
4 4
As you can see, B holds the number of times each value of A occurs, repeated on every row with that value, and the rows stay in their original order.
I know how to do this in R with data.table, but in pandas I only know how to use groupby to get the count of each unique key.
Does anyone have ideas?
Thank you!
Answer 1:
You can use this:
import pandas as pd
df = pd.DataFrame({
'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]
})
# transform broadcasts each group's count back onto every row of that group
df['B'] = df.groupby(['A'])['A'].transform('count')
print(df)
output:
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
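The sample column is already sorted, so it does not show whether row order survives. A minimal sketch on a hypothetical unsorted column: transform returns a result aligned to the original index, so every row keeps its place. It uses 'size' (rows per group) instead of 'count' (non-null values per group); the two agree whenever A has no missing values.

```python
import pandas as pd

# Hypothetical unsorted input, used only to check that row order is kept
df = pd.DataFrame({'A': [4, 1, 4, 2, 4, 1]})

# The transform result is aligned to df's index, so each row
# receives the size of its own group in its original position
df['B'] = df.groupby('A')['A'].transform('size')
print(df)
```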
Answer 2:
You could use a groupby and merge:
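import pandas as pd
df = pd.DataFrame({'A' : [1, 1, 2, 3, 3, 4, 4, 4, 4]})
# size() counts rows per key; name='B' labels the count column (otherwise it is named 0)
df = df.merge(df.groupby('A').size().reset_index(name='B'), on='A', how='left')
Which will give you:
A B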
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
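Because merge builds a new frame instead of writing into the existing one, it is worth checking row order on input that is not sorted. A minimal sketch on a hypothetical unsorted column; how='left' is chosen because the pandas documentation describes a left join as preserving the order of the left keys, and reset_index(name='B') gives the count column the name the question asks for.

```python
import pandas as pd

# Hypothetical unsorted input
df = pd.DataFrame({'A': [2, 3, 2, 1, 3, 2]})

# One row per distinct key, with its group size in a column named 'B'
counts = df.groupby('A').size().reset_index(name='B')

# Left join: every row of df is kept, in its original order
out = df.merge(counts, on='A', how='left')
print(out)
```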
Answer 3:
A fast way using pd.factorize and np.bincount:
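import numpy as np
f = df.A.factorize()[0]  # integer code per row: 0, 1, ... in order of first appearance
df.assign(B=np.bincount(f)[f])  # count per code, then index by f to repeat it per row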
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
Explanation
pd.factorize creates an array of integers in which each integer stands for one unique value of the factorized array. The integers start at zero and are assigned in order of first appearance.
f
array([0, 0, 1, 2, 2, 3, 3, 3, 3])
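In the question's data the values are already sorted, so the codes also happen to be sorted. A minimal sketch on a hypothetical unsorted Series shows that codes follow the order in which each value first appears, and that factorize also returns the distinct values themselves:

```python
import pandas as pd

# Hypothetical unsorted column
s = pd.Series([30, 10, 30, 20, 10])

codes, uniques = s.factorize()
print(codes)    # one integer code per row
print(uniques)  # the distinct values, in order of first appearance
```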
np.bincount takes an array of non-negative integers and counts how many times each integer occurs. If we think of these integers as bins, we are counting how many times each bin is referenced.
np.bincount(f)
array([2, 1, 2, 4])
Finally, we use f to index into these counts (NumPy integer-array indexing), which gives back each bin's count once for every row that referenced it.
np.bincount(f)[f]
array([2, 2, 1, 2, 2, 4, 4, 4, 4])
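The three steps above can be wrapped into one helper and checked against the groupby approach on unsorted data. A minimal sketch; count_per_row is a hypothetical name, not a pandas or NumPy function. It assumes the column has no missing values: factorize codes NaN as -1 by default, and np.bincount raises an error on negative input.

```python
import numpy as np
import pandas as pd

def count_per_row(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: size of each row's group, in row order.

    Assumes s contains no NaN (factorize would code it as -1).
    """
    codes = s.factorize()[0]          # step 1: integer code per row
    counts = np.bincount(codes)       # step 2: how often each code occurs
    return counts[codes]              # step 3: repeat each count per row

s = pd.Series([30, 10, 30, 20, 10, 30])
b = count_per_row(s)

# Cross-check against the groupby/transform approach
reference = s.groupby(s).transform('size').to_numpy()
print(b, (b == reference).all())
```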
Answer 4:
Using map with groupby and size:
# look up each value of A in the Series of per-key group sizes
df['B'] = df.A.map(df.groupby('A').size())
df
A B
0 1 2
1 1 2
2 2 1
3 3 2
4 3 2
5 4 4
6 4 4
7 4 4
8 4 4
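A closely related variant, swapped in here rather than taken from the answer: Series.value_counts already returns the number of rows per distinct value, indexed by that value, so it can be passed to map in place of groupby(...).size(). A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})

# value_counts: Series indexed by each distinct value of A, holding its count;
# map replaces every value of A with the count found under that index label
df['B'] = df['A'].map(df['A'].value_counts())
print(df)
```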
Source: https://stackoverflow.com/questions/49889312/pandas-count-elements-in-a-columns-and-show-in-duplicated-way