The pandas factorize function assigns each unique value in a series to a sequential, 0-based index, and calculates which index each series entry belongs to.
factorize
You need to create a ndarray of tuple first, pandas.lib.fast_zip can do this very fast in cython loop.
pandas.lib.fast_zip
import pandas as pd df = pd.DataFrame({'x': [1, 1, 2, 2, 1, 1], 'y':[1, 2, 2, 2, 2, 1]}) print pd.factorize(pd.lib.fast_zip([df.x, df.y]))[0]
the output is:
[0 1 2 2 1 0]