Using keras tokenizer for new words not in training set

Backend · Unresolved · 3 answers · 1424 views
没有蜡笔的小新 · 2020-12-28 08:45

I'm currently using the Keras Tokenizer to create a word index and then matching that word index to the imported GloVe dictionary to create an embedding matrix. However …

3 Answers
  •  夕颜 (OP)
     2020-12-28 09:30

    The Keras Tokenizer has an oov_token parameter. Choose a token for it, and any word not seen during fitting will be mapped to that token's index.

    from tensorflow.keras.preprocessing.text import Tokenizer

    # oov_token should be a string; it is added to the word index (at index 1)
    tokenizer_a = Tokenizer(oov_token="<OOV>")
    tokenizer_b = Tokenizer()  # no OOV token: unknown words are dropped
    tokenizer_a.fit_on_texts(["Hello world"])
    tokenizer_b.fit_on_texts(["Hello world"])
    

    Outputs

    In [26]: tokenizer_a.texts_to_sequences(["Hello cruel world"])
    Out[26]: [[2, 1, 3]]
    
    In [27]: tokenizer_b.texts_to_sequences(["Hello cruel world"])
    Out[27]: [[1, 2]]
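    To make the two outputs concrete: when an oov_token is set, the Tokenizer reserves index 1 for it, so the unseen word "cruel" becomes 1; without one, unseen words are silently dropped. A minimal pure-Python sketch of that lookup logic (an illustration of the behavior, not the actual Keras implementation):

    ```python
    def build_word_index(texts, oov_token=None):
        """Sketch of Tokenizer-style index building: OOV token first, then
        remaining words by descending frequency, indices starting at 1."""
        counts = {}
        for text in texts:
            for word in text.lower().split():
                counts[word] = counts.get(word, 0) + 1
        vocab = [oov_token] if oov_token is not None else []
        vocab += sorted(counts, key=counts.get, reverse=True)
        return {word: i + 1 for i, word in enumerate(vocab)}

    def to_sequences(texts, word_index, oov_token=None):
        """Map words to indices; unknown words get the OOV index or are dropped."""
        oov_id = word_index.get(oov_token)
        seqs = []
        for text in texts:
            seq = []
            for word in text.lower().split():
                idx = word_index.get(word, oov_id)
                if idx is not None:  # without an OOV token, unknowns vanish
                    seq.append(idx)
            seqs.append(seq)
        return seqs
    ```

    Running this sketch on the same inputs reproduces both outputs above: with the OOV token, ["Hello cruel world"] maps to [[2, 1, 3]]; without it, to [[1, 2]].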
    
