multi-column factorize in pandas

长发绾君心 2020-12-28 09:35

The pandas factorize function assigns each unique value in a series to a sequential, 0-based index, and calculates which index each series entry belongs to.
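For reference, the single-column behaviour described above can be sketched as follows. The sample values are made up for illustration and are not from the question; pd.factorize returns a pair of (codes, uniques), and the codes follow order of first appearance.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the behaviour.
s = pd.Series(["b", "a", "b", "c", "a"])

# codes: one 0-based integer per entry; uniques: the distinct values,
# listed in order of first appearance.
codes, uniques = pd.factorize(s)

print(codes.tolist())   # [0, 1, 0, 2, 1]
print(list(uniques))    # ['b', 'a', 'c']
```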

4 Answers

醉话见心 (original poster)

2020-12-28 10:08

You can use drop_duplicates to drop the duplicated rows:

In [23]: df.drop_duplicates()
Out[23]: 
   x  y
0  1  1
1  1  2
2  2  2

EDIT

To achieve your goal, you can join your original df to the de-duplicated one, so that each row picks up the index of its unique (x, y) combination:

In [46]: df.join(df.drop_duplicates().reset_index().set_index(['x', 'y']), on=['x', 'y'])
Out[46]: 
   x  y  index
0  1  1      0
1  1  2      1
2  2  2      2
3  2  2      2
4  1  2      1
5  1  1      0
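One caveat with the join above: the index column holds the row label of each combination's first occurrence, so the codes come out as 0, 1, 2 here only because the three unique rows happen to be the first three rows. As a possible alternative (not part of the original answer), groupby(...).ngroup() numbers the groups directly. The sketch below rebuilds df from the output shown above and uses a hypothetical column name, code, for the result.

```python
import pandas as pd

# Frame reconstructed from the x and y columns in the output above.
df = pd.DataFrame({"x": [1, 1, 2, 2, 1, 1],
                   "y": [1, 2, 2, 2, 2, 1]})

# ngroup() assigns each (x, y) group a 0-based integer; sort=False keeps
# the groups in order of first appearance, as factorize does.
# "code" is a hypothetical column name chosen for this example.
df["code"] = df.groupby(["x", "y"], sort=False).ngroup()

print(df)
```

For this data the code column matches the index column produced by the join: 0, 1, 2, 2, 1, 0.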
