multi-column factorize in pandas

长发绾君心 2020-12-28 09:35

The pandas factorize function assigns each unique value in a Series a sequential, 0-based index, and returns, for every entry in the Series, the index of its value.
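As a quick illustration of the single-column behaviour described above, here is a minimal sketch. The sample values and the variable names `s`, `codes` and `uniques` are made up for this example.

```python
import pandas as pd

# Illustrative data only: three distinct values, two of them repeated.
s = pd.Series(["b", "a", "b", "c", "a"])

# factorize returns (codes, uniques); codes are 0-based and numbered
# in order of first appearance.
codes, uniques = pd.factorize(s)

print(codes)          # [0 1 0 2 1]
print(list(uniques))  # ['b', 'a', 'c']
```

Each entry of `codes` is the position of the corresponding value in `uniques`.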

4 answers
  •  不思量自难忘°
    2020-12-28 09:54

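    You first need to create an ndarray of tuples; pandas.lib.fast_zip can do this very quickly in a Cython loop. (In recent pandas versions pandas.lib is gone and the function lives in the private module pandas._libs.lib.)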

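    import pandas as pd
    from pandas._libs.lib import fast_zip  # private module; this was pd.lib.fast_zip in old pandas

    df = pd.DataFrame({'x': [1, 1, 2, 2, 1, 1], 'y': [1, 2, 2, 2, 2, 1]})
    # Zip the two columns into an object array of (x, y) tuples, then factorize it.
    print(pd.factorize(fast_zip([df.x.to_numpy(), df.y.to_numpy()]))[0])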
    

    The output is:

    [0 1 2 2 1 0]
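
If relying on a private helper is a concern, the same codes can probably be obtained with public API only. The sketch below is not the method from the answer above: it swaps in `groupby(...).ngroup()` with `sort=False`, on the assumption that group numbers should follow order of first appearance, as factorize does. The variable name `codes` is chosen just for this example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2, 2, 1, 1], "y": [1, 2, 2, 2, 2, 1]})

# Group on both columns; sort=False keeps groups in order of first
# appearance, so ngroup() numbers each distinct (x, y) pair the same
# way factorize would number a column of tuples.
codes = df.groupby(["x", "y"], sort=False).ngroup().to_numpy()

print(codes)  # [0 1 2 2 1 0]
```

This avoids building an intermediate array of tuples, at the cost of going through the groupby machinery.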
    
