multi-column factorize in pandas

长发绾君心 2020-12-28 09:35

The pandas factorize function assigns each unique value in a series to a sequential, 0-based index, and calculates which index each series entry belongs to.
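For reference, a minimal sketch of the single-column behaviour described above. The Series contents are made up for illustration; `pd.factorize` returns the integer codes together with the unique values in order of first appearance.

```python
import pandas as pd

# Hypothetical data, not from the original question
s = pd.Series(["b", "a", "b", "c"])

# codes holds one 0-based integer per entry; uniques holds the distinct values
codes, uniques = pd.factorize(s)

print(codes)    # integer code for each entry of s
print(uniques)  # distinct values, in order of first appearance
```

The multi-column case asks for the same kind of codes, but with each distinct combination of column values treated as one "unique value".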

4 Answers
  •  醉话见心
    2020-12-28 10:08

    You can use drop_duplicates to drop the duplicated rows:

    In [23]: df.drop_duplicates()
    Out[23]:
       x  y
    0  1  1
    1  1  2
    2  2  2
    

    EDIT

    To achieve your goal, you can join your original df to the de-duplicated one, so that each row picks up the index label of the first occurrence of its (x, y) pair:

    In [46]: df.join(df.drop_duplicates().reset_index().set_index(['x', 'y']), on=['x', 'y'])
    Out[46]: 
       x  y  index
    0  1  1      0
    1  1  2      1
    2  2  2      2
    3  2  2      2
    4  1  2      1
    5  1  1      0
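One caveat with the join above: the `index` column carries the original row label of each pair's first occurrence, so it is sequential only when the distinct pairs happen to sit in the first rows, as they do here. A possible variant, sketched under the assumption that 0-based sequential codes are wanted, renumbers the de-duplicated rows before joining. The frame and the column name `code` below are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical frame: the distinct (x, y) pairs first appear at rows 0, 1 and 3,
# so the original row labels would give 0, 1, 0, 3 rather than 0, 1, 0, 2
df = pd.DataFrame({"x": [1, 1, 1, 2], "y": [1, 2, 1, 2]})

lookup = (
    df.drop_duplicates()
      .reset_index(drop=True)              # renumber the distinct rows 0, 1, 2
      .reset_index()                       # move that numbering into a column
      .rename(columns={"index": "code"})   # "code" is an illustrative name
      .set_index(["x", "y"])
)

# Each row looks up the code of its (x, y) pair
coded = df.join(lookup, on=["x", "y"])
codes = coded["code"].tolist()
print(coded)
```

Codes are assigned in order of first appearance, which matches how factorize numbers the values of a single Series.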
    
