I'm currently using the Keras Tokenizer to create a word index and then matching that word index to the imported GloVe dictionary to create an embedding matrix. However, when the trained model encounters a word that isn't in the tokenizer's word index, that word is simply dropped from the sequence. How can I handle these unknown words?
The Keras Tokenizer has an oov_token parameter. Pick a token for it, and every unknown word will be mapped to that token's index instead of being dropped.
from tensorflow.keras.preprocessing.text import Tokenizer

# tokenizer_a reserves index 1 for the OOV token; tokenizer_b silently drops unknown words
tokenizer_a = Tokenizer(oov_token="<OOV>")
tokenizer_b = Tokenizer()

tokenizer_a.fit_on_texts(["Hello world"])
tokenizer_b.fit_on_texts(["Hello world"])
Output:
In [26]: tokenizer_a.texts_to_sequences(["Hello cruel world"])
Out[26]: [[2, 1, 3]]
In [27]: tokenizer_b.texts_to_sequences(["Hello cruel world"])
Out[27]: [[1, 2]]
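
Since you're building an embedding matrix from GloVe, remember that the OOV token occupies an index in word_index too. Here is a minimal sketch, assuming embeddings_index is your loaded GloVe dictionary (word -> vector); the OOV row, and any word GloVe doesn't cover, is simply left as zeros here, though you could also initialize it randomly:

import numpy as np

embedding_dim = 100  # must match the dimension of the GloVe file you loaded
vocab_size = len(tokenizer_a.word_index) + 1  # +1 because index 0 is reserved for padding

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer_a.word_index.items():
    vector = embeddings_index.get(word)  # None for the OOV token and any word missing from GloVe
    if vector is not None:
        embedding_matrix[i] = vector

With this setup, unknown words at prediction time map to the OOV index and hit the all-zeros (or randomly initialized) row, rather than vanishing from the sequence.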