One Hot Encoding giving same number for different words in keras

Submitted by 大憨熊 on 2021-01-28 18:15:31

Question


Why am I getting the same result for different words?

>>> import keras
>>> keras.__version__
'1.0.0'
>>> import theano
>>> theano.__version__
'0.8.1'

>>> from keras.preprocessing.text import one_hot
>>> one_hot('START', 43)
[26]
>>> one_hot('children', 43)
[26]

Answer 1:


Uniqueness is not guaranteed by Keras's one_hot encoding: two different words can be assigned the same index.

See the Keras documentation for one_hot.




Answer 2:


From the Keras source code, you can see that each word is hashed, reduced modulo n - 1, and shifted by 1, so with n = 43 every word maps to an index between 1 and 42:

def one_hot(text, n,
        filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
        lower=True,
        split=' '):
    seq = text_to_word_sequence(text,
                            filters=filters,
                            lower=lower,
                            split=split)
    return [(abs(hash(w)) % (n - 1) + 1) for w in seq]

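With only 42 possible indices, a collision is very likely: the mapping is a hash, not a lookup in a vocabulary, so two different words such as 'START' and 'children' can both land on 26. A larger n makes collisions rarer but never rules them out.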
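To get a feel for how likely a collision is, here is a minimal sketch that does not need Keras at all. It assumes the hash spreads words uniformly over the n - 1 indices, and it swaps Python's built-in hash() for an MD5-based one so the result is the same on every run (the built-in string hash is randomized per process in Python 3). The names stable_index, find_collisions and collision_probability are made up for this illustration and are not part of Keras.

```python
import hashlib


def stable_index(word, n):
    """Map a word to an index in 1..n-1, mirroring abs(hash(w)) % (n - 1) + 1.

    Assumption: MD5 stands in for Python's built-in hash() so that the
    mapping is reproducible across interpreter runs.
    """
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1


def find_collisions(words, n):
    """Group words by index and keep only the indices shared by 2+ words."""
    buckets = {}
    for w in words:
        buckets.setdefault(stable_index(w, n), []).append(w)
    return {i: ws for i, ws in buckets.items() if len(ws) > 1}


def collision_probability(k, m):
    """Chance that at least two of k words share one of m indices,
    assuming each word lands on a uniformly random index."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct


# 50 distinct words but only 42 indices: a collision is unavoidable.
words = ["word%d" % i for i in range(50)]
collisions = find_collisions(words, 43)
print(collisions)

# Estimated collision chance for small vocabularies with n = 43.
for k in (2, 5, 10, 20):
    print(k, round(collision_probability(k, 42), 3))
```

For two words the estimate is 1/42, so the result in the question is unlucky but not surprising. The probability grows quickly with vocabulary size, and once there are more than 42 distinct words a collision is certain.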



Source: https://stackoverflow.com/questions/36591078/one-hot-encoding-giving-same-number-for-different-words-in-keras
