Assign Unique Numeric Group IDs to Groups in Pandas [duplicate]

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-20 03:52:07

Question


I keep running into the need to assign a unique ID to each group in a data set. I've needed this when zero-padding sequences for RNNs, when generating graphs, and on many other occasions.

This can usually be done by concatenating the values of the columns passed to groupby. However, the number of columns that define a group, their dtypes, or the size of their values often makes concatenation impractical, because it needlessly uses up memory.

Is there an easy way to assign a unique numeric ID to each group in pandas?


Answer 1:


You just need ngroup (or pd.factorize). Using the data from seeiespi's answer:

df.groupby('C').ngroup()
Out[322]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int64
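
As a self-contained sketch, the result of ngroup can be stored as a new column. The frame below is the one defined in seeiespi's answer; the column names gid and gid_seen are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same data as in seeiespi's answer
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 6, 3, 7, 3, 2],
    "B": [4, 3, 8, 2, 6, 3, 9, 1, 0],
    "C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"],
})

# Default: groups are numbered from 0 in sorted key order (a -> 0, b -> 1, c -> 2)
df["gid"] = df.groupby("C").ngroup()

# sort=False: groups are numbered in order of first appearance (a -> 0, c -> 1, b -> 2)
df["gid_seen"] = df.groupby("C", sort=False).ngroup()
```

The rows keep their original order in both cases; only the numbering of the groups differs.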

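More options (pd.factorize numbers the groups in order of first appearance, so its labels differ from the sorted numbering that ngroup and cat.codes produce):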

pd.factorize(df.C)[0]
Out[323]: array([0, 0, 1, 2, 2, 2, 2, 1, 1], dtype=int64)
df.C.astype('category').cat.codes
Out[324]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int8
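
The question mentions groups defined by several columns. A minimal sketch of that case, assuming the group is the pair of columns A and C from seeiespi's data (that choice of columns is an assumption made for illustration): passing a list of columns to groupby gives one integer per distinct combination without building a concatenated key.

```python
import pandas as pd

# Same data as in seeiespi's answer
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 6, 3, 7, 3, 2],
    "B": [4, 3, 8, 2, 6, 3, 9, 1, 0],
    "C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"],
})

# One integer per distinct (A, C) pair; no string key is constructed
df["gid"] = df.groupby(["A", "C"]).ngroup()

# Number of distinct groups found
n_groups = df["gid"].nunique()
```

Rows 2 and 7 both hold the pair (3, 'c'), so they receive the same ID, while every other row forms its own group.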



Answer 2:


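I came up with a simple solution that I keep coming back to and wanted to share:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,6,3,7,3,2],'B':[4,3,8,2,6,3,9,1,0], 'C':['a','a','c','b','b','b','b','c','c']})

# Sort so that rows of the same group are contiguous (mergesort is stable, so row order within a group is kept)
df = df.sort_values('C', kind='mergesort')

# Flag the first row of each group with 1 and every other row with 0
df['gid'] = (df.groupby(['C']).cumcount()==0).astype(int)

# The running total of the flags gives a 1-based group ID
df['gid'] = df['gid'].cumsum()

In [17]: df
Out[17]:
   A  B  C  gid
0  1  4  a    1
1  2  3  a    1
3  4  2  b    2
4  6  6  b    2
5  3  3  b    2
6  7  9  b    2
2  3  8  c    3
7  3  1  c    3
8  2  0  c    3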
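
For comparison, here is a hedged sketch showing that the sort-and-cumsum labels match ngroup() + 1, which avoids reordering the rows. The variable names by_cumsum, by_ngroup and same are illustrative, and a stable sort is assumed for the first approach.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 6, 3, 7, 3, 2],
    "B": [4, 3, 8, 2, 6, 3, 9, 1, 0],
    "C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"],
})

# Sort-then-cumsum approach on a sorted copy of the frame
by_cumsum = df.sort_values("C", kind="mergesort")
by_cumsum["gid"] = (by_cumsum.groupby("C").cumcount() == 0).astype(int).cumsum()

# ngroup() is zero-based, so adding 1 gives 1-based labels in the original row order
by_ngroup = df.groupby("C").ngroup() + 1

# Restore the original index order before comparing the two labelings
same = by_cumsum["gid"].sort_index().tolist() == by_ngroup.tolist()
```

Here same evaluates to True: both approaches label a, b and c as 1, 2 and 3.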


Source: https://stackoverflow.com/questions/50050617/assign-unique-numeric-group-ids-to-groups-in-pandas
