Problem:
Given an array of string data
dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21'),
you can use np.unique with the return_inverse argument, which also returns,
for each element, its index in the sorted array of unique values:
>>> lookupTable, indexed_dataSet = np.unique(dataSet, return_inverse=True)
>>> lookupTable
array(['george', 'greg', 'kevin'],
      dtype='<U21')
>>> indexed_dataSet
array([2, 1, 0, 2])
If you like, you can reconstruct your original array from these two arrays:
>>> lookupTable[indexed_dataSet]
array(['kevin', 'greg', 'george', 'kevin'],
      dtype='<U21')
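The same steps as a self-contained script, for anyone who wants to run them outside the interpreter. This is only a minimal sketch: the helper name build_lookup is hypothetical (it is not part of NumPy) and simply wraps np.unique.

```python
import numpy as np


def build_lookup(data):
    """Return (lookup_table, codes) for a 1-D array of strings.

    Hypothetical helper for illustration; it only wraps np.unique.
    """
    # np.unique returns the sorted unique values; return_inverse=True also
    # returns, for every input element, its index into that sorted array.
    lookup_table, codes = np.unique(data, return_inverse=True)
    return lookup_table, codes


dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')
lookupTable, indexed_dataSet = build_lookup(dataSet)

# Indexing the lookup table with the integer codes rebuilds the original array.
restored = lookupTable[indexed_dataSet]
print(np.array_equal(restored, dataSet))  # True
```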
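If you use pandas, indexed_dataSet, lookupTable = pd.factorize(dataSet, sort=True)
will achieve the same thing (and potentially be more efficient for large arrays).
Note that pd.factorize returns the integer codes first and the unique values second,
and that without sort=True the unique values come back in order of first appearance
rather than sorted.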
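A short sketch of the pandas route, assuming pandas is installed and imported as pd. The variable names follow the ones used above; sort=True is passed so that the lookup table comes out in the same sorted order as with np.unique.

```python
import numpy as np
import pandas as pd

dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')

# pd.factorize returns (codes, uniques), in that order.
# sort=True sorts the unique values so the result lines up with np.unique;
# the default (sort=False) keeps them in order of first appearance.
indexed_dataSet, lookupTable = pd.factorize(dataSet, sort=True)

print(lookupTable.tolist())      # ['george', 'greg', 'kevin']
print(indexed_dataSet.tolist())  # [2, 1, 0, 2]

# As with np.unique, indexing the uniques with the codes restores the input.
restored = lookupTable[indexed_dataSet]
print(restored.tolist())         # ['kevin', 'greg', 'george', 'kevin']
```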