Map a NumPy array of strings to integers

耶瑟儿~ 2020-12-07 01:19

Problem:

Given an array of string data

dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')


        
2 Answers
  • 2020-12-07 01:42

    np.searchsorted does the trick:

    import numpy as np

    dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')
    lut = np.unique(dataSet)             # already sorted: ['george', 'greg', 'kevin']
    ind = np.searchsorted(lut, dataSet)  # array([2, 1, 0, 2])
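
    One caveat with this approach: np.searchsorted returns insertion positions, not matches, so a value that is missing from the lookup table still gets an index. The sketch below is one possible way to guard against that. The `query` array and the `-1` sentinel are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')
lut = np.unique(dataSet)  # sorted unique values

# Hypothetical query that contains a name absent from the lookup table.
query = np.array(['greg', 'harry', 'kevin'], dtype='U21')

# Insertion positions; 'harry' gets a position even though it is not in lut.
pos = np.searchsorted(lut, query)

# Clip so that an insertion point equal to len(lut) is still a valid index.
pos_clipped = np.clip(pos, 0, len(lut) - 1)

# A position is a real match only if the table entry equals the query value.
found = lut[pos_clipped] == query

# Use -1 (an arbitrary sentinel chosen here) for values that were not found.
codes = np.where(found, pos_clipped, -1)
print(codes)
```

    Comparing `lut[pos_clipped]` with the query is what separates a real hit from a plain insertion point.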
    
  • 2020-12-07 01:46

    You can use np.unique with the return_inverse argument:

    >>> lookupTable, indexed_dataSet = np.unique(dataSet, return_inverse=True)
    >>> lookupTable
    array(['george', 'greg', 'kevin'], 
          dtype='<U21')
    >>> indexed_dataSet
    array([2, 1, 0, 2])
    

    If you like, you can reconstruct your original array from these two arrays:

    >>> lookupTable[indexed_dataSet]
    array(['kevin', 'greg', 'george', 'kevin'], 
          dtype='<U21')
    

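    If you use pandas, indexed_dataSet, lookupTable = pd.factorize(dataSet) gives an equivalent encoding (and is potentially more efficient for large arrays). Note that pd.factorize returns the integer codes first and the unique values second, and by default it numbers values in order of first appearance; pass sort=True to get the same sorted lookup table as np.unique.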
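
    To make the pandas variant concrete, here is a minimal sketch comparing the default call with sort=True. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

dataSet = np.array(['kevin', 'greg', 'george', 'kevin'], dtype='U21')

# Default behaviour: codes follow the order in which values first appear.
codes, uniques = pd.factorize(dataSet)

# sort=True: the unique values are sorted, matching np.unique's ordering.
codes_sorted, uniques_sorted = pd.factorize(dataSet, sort=True)

# Either pair reconstructs the original data by indexing the unique values.
roundtrip = np.asarray(uniques_sorted)[codes_sorted]
print(codes, codes_sorted)
```

    With sort=True the codes line up with the output of np.unique(dataSet, return_inverse=True); without it, the same strings are encoded but numbered differently.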
