Definition
factorize: Map each unique object to a unique integer. Typically, the integers range from 0 to n - 1, where n is the number of unique objects.
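To make the definition concrete, here is a minimal pure-Python sketch. The name `factorize` and the choice to assign IDs in order of first appearance are assumptions made for illustration; the NumPy approaches in this answer assign IDs in sorted order instead, so their numbering differs while the grouping stays the same.

```python
def factorize(objs):
    """Map each distinct hashable object to an integer ID in 0..n-1."""
    ids = {}      # object -> ID, assigned in order of first appearance
    codes = []
    for obj in objs:
        # setdefault stores len(ids) only when obj has not been seen yet
        codes.append(ids.setdefault(obj, len(ids)))
    return codes

tups = [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
print(factorize(tups))  # [0, 1, 2, 3, 4, 1, 2]
```

This version needs only hashable items, but it loops in Python, which is the cost the vectorized approaches try to avoid.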
Approach #1
Convert each tuple to a row of a 2D array, view each row as a single scalar using NumPy ndarray views, and finally use np.unique(..., return_inverse=True) to factorize -
np.unique(get_row_view(np.array(tups)), return_inverse=1)[1]
get_row_view is taken from here.
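For readers without access to the linked helper, the following is one possible way to write such a row view. It is a reconstruction under assumptions, not necessarily the linked code: it takes a 2D array, makes it C-contiguous, and reinterprets each row as one `np.void` scalar covering all the bytes of that row.

```python
import numpy as np

def get_row_view(a):
    # Hypothetical reconstruction; the helper linked in the answer may differ.
    a = np.ascontiguousarray(a)
    # One void scalar spanning all the bytes of a single row
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel()

tups = [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
# np.array coerces the mixed int/str tuples to a 2D array of strings
codes = np.unique(get_row_view(np.array(tups)), return_inverse=True)[1]
print(codes)
```

Because rows are compared as raw bytes, equal rows always receive the same ID, but the numbering of the IDs follows byte order rather than any natural ordering of the values.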
Sample run -
In [23]: tups
Out[23]: [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
In [24]: np.unique(get_row_view(np.array(tups)), return_inverse=1)[1]
Out[24]: array([0, 3, 1, 4, 2, 3, 1])
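Approach #2
Sort the rows with np.lexsort, flag each position where a sorted row differs from the one before it, take the cumulative sum of those flags to get the IDs, and finally map the IDs back to the original order -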
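import numpy as np

def argsort_unique(idx):
    # Original idea : https://stackoverflow.com/a/41242285/3293881
    # Inverse of a permutation: same result as np.argsort(idx), without sorting
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

def unique_return_inverse_tuples(tups):
    a = np.array(tups)
    # Sort rows lexicographically (np.lexsort uses the last column as primary key)
    sidx = np.lexsort(a.T)
    b = a[sidx]
    # True where a sorted row differs from the previous one (assumes 2-element tuples)
    mask0 = ~((b[1:,0] == b[:-1,0]) & (b[1:,1] == b[:-1,1]))
    # Running count of group starts gives each sorted row its ID
    ids = np.concatenate(([0], mask0 ))
    np.cumsum(ids, out=ids)
    # Map the IDs back to the original row order
    return ids[argsort_unique(sidx)]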
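The role of argsort_unique can be checked in isolation. The sketch below repeats the helper and verifies, on an arbitrary permutation generated only for this example, that it returns the inverse permutation, which is also what np.argsort would return for such input.

```python
import numpy as np

def argsort_unique(idx):
    # Same helper as in the answer: scatter positions to invert a permutation
    n = idx.size
    sidx = np.empty(n, dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

rng = np.random.default_rng(0)
perm = rng.permutation(10)   # arbitrary permutation of 0..9
inv = argsort_unique(perm)

# Composing a permutation with its inverse gives the identity
print(np.array_equal(perm[inv], np.arange(10)))  # True
# For a permutation, argsort yields the same inverse
print(np.array_equal(inv, np.argsort(perm)))     # True
```

This only holds when idx contains each of 0..n-1 exactly once, which is the case for the output of np.lexsort.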
Sample run -
In [69]: tups
Out[69]: [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
In [70]: unique_return_inverse_tuples(tups)
Out[70]: array([0, 3, 1, 2, 4, 3, 1])
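Note that the two sample runs return different IDs ([0, 3, 1, 4, 2, 3, 1] vs [0, 3, 1, 2, 4, 3, 1]). Both are valid factorizations, since only the numbering differs: np.lexsort treats the last column as the primary sort key, so its ordering need not match the one used in Approach #1. A small check, with the hypothetical helper name `same_grouping`, can confirm that two label sequences group the positions identically:

```python
def same_grouping(codes_a, codes_b):
    """True if two label sequences induce the same partition of positions."""
    if len(codes_a) != len(codes_b):
        return False
    a_to_b, b_to_a = {}, {}
    for x, y in zip(codes_a, codes_b):
        # Each label on one side must pair with exactly one label on the other
        if a_to_b.setdefault(x, y) != y or b_to_a.setdefault(y, x) != x:
            return False
    return True

approach1 = [0, 3, 1, 4, 2, 3, 1]
approach2 = [0, 3, 1, 2, 4, 3, 1]
print(same_grouping(approach1, approach2))  # True
```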