Using keras tokenizer for new words not in training set

Backend · Unresolved · 3 answers · 1424 views
没有蜡笔的小新 · 2020-12-28 08:45

I'm currently using the Keras Tokenizer to create a word index and then matching that word index to the imported GloVe dictionary to create an embedding matrix. However …

3 Answers
  •  夕颜 (OP)
     2020-12-28 09:30

    The Keras Tokenizer has an oov_token parameter. Choose a token for it, and any word not seen during fitting will be mapped to that token's index.

    from tensorflow.keras.preprocessing.text import Tokenizer

    # oov_token should be a string; it is added to the word index (at index 1)
    tokenizer_a = Tokenizer(oov_token="<OOV>")
    tokenizer_b = Tokenizer()  # no OOV token: unknown words are dropped
    tokenizer_a.fit_on_texts(["Hello world"])
    tokenizer_b.fit_on_texts(["Hello world"])
    

    Outputs

    In [26]: tokenizer_a.texts_to_sequences(["Hello cruel world"])
    Out[26]: [[2, 1, 3]]
    
    In [27]: tokenizer_b.texts_to_sequences(["Hello cruel world"])
    Out[27]: [[1, 2]]
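    To make the two outputs concrete: when an oov_token is set, the Tokenizer reserves index 1 for it, so the unseen word "cruel" becomes 1; without one, unseen words are silently dropped. A minimal pure-Python sketch of that lookup logic (an illustration of the behavior, not the actual Keras implementation):

    ```python
    def build_word_index(texts, oov_token=None):
        """Sketch of Tokenizer-style index building: OOV token first, then
        remaining words by descending frequency, indices starting at 1."""
        counts = {}
        for text in texts:
            for word in text.lower().split():
                counts[word] = counts.get(word, 0) + 1
        vocab = [oov_token] if oov_token is not None else []
        vocab += sorted(counts, key=counts.get, reverse=True)
        return {word: i + 1 for i, word in enumerate(vocab)}

    def to_sequences(texts, word_index, oov_token=None):
        """Map words to indices; unknown words get the OOV index or are dropped."""
        oov_id = word_index.get(oov_token)
        seqs = []
        for text in texts:
            seq = []
            for word in text.lower().split():
                idx = word_index.get(word, oov_id)
                if idx is not None:  # without an OOV token, unknowns vanish
                    seq.append(idx)
            seqs.append(seq)
        return seqs
    ```

    Running this sketch on the same inputs reproduces both outputs above: with the OOV token, ["Hello cruel world"] maps to [[2, 1, 3]]; without it, to [[1, 2]].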
    
