definition
factorize: Map each unique object to a unique integer. Typically, the integers range from 0 to n - 1, where n is the number of unique objects.
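To make the definition concrete, here is a minimal pure-Python sketch of the idea. It is not pandas' implementation. It assumes codes are handed out in order of first appearance, and it returns the codes together with the list of unique values. The name `factorize` here is just for illustration.

```python
def factorize(values):
    """Map each distinct hashable value to an integer code.

    Codes are assigned in order of first appearance, so they run
    from 0 to n - 1, where n is the number of unique values.
    Returns (codes, uniques), with uniques[code] == original value.
    """
    index = {}      # value -> code
    uniques = []    # code -> value
    codes = []
    for v in values:
        code = index.get(v)
        if code is None:          # first time we see this value
            code = len(uniques)
            index[v] = code
            uniques.append(v)
        codes.append(code)
    return codes, uniques


codes, uniques = factorize([(1, 2), (3, 4), (1, 2), (5, 6)])
print(codes)    # [0, 1, 0, 2]
print(uniques)  # [(1, 2), (3, 4), (5, 6)]
```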
I was going to give this answer:
pd.factorize([str(x) for x in tups])
However, after running some tests, it did not turn out to be the fastest of them all. Since I had already done the work, I will show it here for comparison:
@AChampion
%timeit [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]
# 1000000 loops, best of 3: 1.66 µs per loop
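The one-liner above depends on two names it does not define. The sketch below assumes they were set up beforehand as `d = {}` and `c = itertools.count()`, which is how the comprehension reads, but that setup is my assumption. One caveat: if `d` is created once outside the `%timeit` statement, every loop after the first finds all tuples already cached, so the figure mostly measures dictionary lookups. Wrapping the code in a function, here the hypothetical `factorize_tuples`, gives each call a fresh mapping.

```python
import itertools


def factorize_tuples(tups):
    d = {}                   # tuple -> integer code
    c = itertools.count()    # yields 0, 1, 2, ...
    # The conditional expression only evaluates next(c) for unseen tuples,
    # so the counter advances exactly once per unique tuple.
    return [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]


print(factorize_tuples([(1, 2), (3, 4), (1, 2), (5, 6)]))  # [0, 1, 0, 2]
```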
@Divakar
%timeit np.unique(get_row_view(np.array(tups)), return_inverse=1)[1]
# 10000 loops, best of 3: 58.1 µs per loop
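`get_row_view` is Divakar's helper and its source is not shown here. The sketch below is a hypothetical re-creation of the common NumPy trick it presumably relies on: each row of a 2-D array is viewed as a single opaque `np.void` scalar, so `np.unique` can compare whole rows. Note that `np.unique` numbers the rows by their sorted (byte-wise) order, not by first appearance, so its codes can differ from the dictionary approach even when they group the rows identically.

```python
import numpy as np


def get_row_view(a):
    # Hypothetical re-creation: reinterpret each row of a 2-D array as one
    # np.void scalar whose size equals the row's size in bytes.
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    # The view has shape (n_rows, 1); ravel it to a 1-D array of rows.
    return a.view(void_dt).ravel()


tups = [(1, 2), (3, 4), (1, 2), (5, 6)]
codes = np.unique(get_row_view(np.array(tups)), return_inverse=True)[1]
print(codes)  # equal rows share a code; codes follow sorted byte order
```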
@self
%timeit pd.factorize([str(x) for x in tups])
# 10000 loops, best of 3: 65.6 µs per loop
@root
%timeit pd.Series(tups).factorize()[0]
# 1000 loops, best of 3: 199 µs per loop
EDIT
For large data with 100K entries, we have:
tups = [(np.random.randint(0, 10), np.random.randint(0, 10)) for i in range(100000)]
@root
%timeit pd.Series(tups).factorize()[0]
# 100 loops, best of 3: 10.9 ms per loop
@AChampion
%timeit [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]
# 10 loops, best of 3: 16.9 ms per loop
@Divakar
%timeit np.unique(get_row_view(np.array(tups)), return_inverse=1)[1]
# 10 loops, best of 3: 81 ms per loop
@self
%timeit pd.factorize([str(x) for x in tups])
# 10 loops, best of 3: 87.5 ms per loop
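To rerun the dictionary-based timing outside IPython, a minimal stdlib harness could look like the one below. It is a sketch under a few assumptions: the data comes from `random.randrange` as a stand-in for the `np.random.randint(0, 10)` input above, `factorize_tuples` is a hypothetical wrapper around the comprehension, and the printed number will of course depend on the machine. Because each call builds its own dict and counter, no timed run benefits from a cache left over by the previous one.

```python
import itertools
import random
import timeit


def factorize_tuples(tups):
    d = {}                   # tuple -> integer code, fresh on every call
    c = itertools.count()    # yields 0, 1, 2, ...
    return [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]


# Stdlib stand-in for the data above: 100K pairs of ints in [0, 10).
random.seed(0)
tups = [(random.randrange(10), random.randrange(10)) for _ in range(100_000)]

# 3 repeats of 10 calls each; report the best per-call time, like %timeit.
times = timeit.repeat(lambda: factorize_tuples(tups), number=10, repeat=3)
print(f"best of 3: {min(times) / 10 * 1e3:.1f} ms per loop")
```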