How to one-hot-encode sentences at the character level?

-上瘾入骨i 2021-01-15 15:16

I would like to convert a sentence to an array of one-hot vectors. Each vector would be the one-hot representation of one character over the alphabet. It would look like the following:

5 answers
  •  春和景丽
    2021-01-15 15:44

    With pandas, you can use pd.get_dummies by passing a categorical Series:

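    import pandas as pd
    import string
    low = string.ascii_lowercase
    s = 'hello'  # example input, assumed here; it is consistent with the output shown

    # astype('category', categories=...) was removed from newer pandas releases;
    # a CategoricalDtype keeps all 26 letters as columns, even unused ones
    cat_type = pd.CategoricalDtype(categories=list(low))
    pd.get_dummies(pd.Series(list(s)).astype(cat_type), dtype=int)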
    Out: 
       a  b  c  d  e  f  g  h  i  j ...  q  r  s  t  u  v  w  x  y  z
    0  0  0  0  0  0  0  0  1  0  0 ...  0  0  0  0  0  0  0  0  0  0
    1  0  0  0  0  1  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    2  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    3  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    4  0  0  0  0  0  0  0  0  0  0 ...  0  0  0  0  0  0  0  0  0  0
    
    [5 rows x 26 columns]
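
If a plain array is preferred over a DataFrame, the same encoding can be sketched with NumPy. This is only an illustration under stated assumptions: the helper name `one_hot_chars` is hypothetical, the alphabet is assumed to be the 26 lowercase ASCII letters, and characters outside it (spaces, punctuation, uppercase) are assumed to map to all-zero rows.

```python
import string

import numpy as np


def one_hot_chars(sentence, alphabet=string.ascii_lowercase):
    """Return a (len(sentence), len(alphabet)) matrix of 0/1 integers.

    Hypothetical helper: characters not found in `alphabet` are left
    as all-zero rows (an assumption, not something the question specifies).
    """
    # Map each alphabet character to its column index.
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = np.zeros((len(sentence), len(alphabet)), dtype=int)
    for row, ch in enumerate(sentence):
        col = index.get(ch)
        if col is not None:
            encoded[row, col] = 1
    return encoded


encoded = one_hot_chars("hello")
print(encoded.shape)           # (5, 26)
print(encoded.argmax(axis=1))  # [ 7  4 11 11 14]
```

Each row has a single 1 in the column of its letter, so `argmax` along axis 1 recovers the letter positions (h=7, e=4, l=11, o=14). Lowercasing the sentence first, or extending `alphabet` with a space, would be a reasonable adjustment for full sentences.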
    
