Using a pretrained gensim Word2Vec embedding in Keras

盖世英雄少女心 2021-01-05 00:41

I have trained a Word2Vec model in gensim. In Keras, I want to use it to build a matrix for each sentence from that word embedding. Since storing the matrices of all the sentences is very space-inefficient, how can I load the trained embedding into a Keras Embedding layer instead?

3 Answers
  •  长发绾君心
    2021-01-05 01:27

    Here is my code for a gensim-trained w2v model. Assume all the words in the trained w2v model are stored in a list variable called all_words.
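
    If such a list is not already at hand, it can be derived from the model's own vocabulary. A minimal sketch, assuming gensim 4.x naming (older releases expose w2v.wv.vocab instead of key_to_index):

    import gensim

    w2v = gensim.models.Word2Vec.load("models/w2v.model")
    all_words = list(w2v.wv.key_to_index)  # gensim 4.x; use list(w2v.wv.vocab) on 3.x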

    from keras.preprocessing.text import Tokenizer
    from keras.layers import Embedding
    import gensim
    import numpy as np
    
    w2v = gensim.models.Word2Vec.load("models/w2v.model")
    
    # fit a Tokenizer on the vocabulary so each word gets an integer index (starting at 1)
    t = Tokenizer()
    t.fit_on_texts(all_words)
    vocab_size = len(t.word_index) + 1  # +1 because index 0 is reserved for padding
    
    FIXED_LENGTH = 100  # example value: pad/truncate every sentence to this length
    
    def get_weight_matrix():
        # define weight matrix dimensions with all 0; row 0 stays zero for the padding index
        weight_matrix = np.zeros((vocab_size, w2v.vector_size))
        # step over the vocab, storing vectors at the Tokenizer's integer mapping
        for word, i in t.word_index.items():
            if word in w2v.wv:  # skip words the Tokenizer kept but the model never saw
                weight_matrix[i] = w2v.wv[word]
        return weight_matrix
    
    embedding_vectors = get_weight_matrix()
    emb_layer = Embedding(vocab_size,
                          output_dim=w2v.vector_size,
                          weights=[embedding_vectors],
                          input_length=FIXED_LENGTH,
                          trainable=False)
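
    To actually feed sentences through this layer, convert them to padded index sequences with the same Tokenizer. A minimal usage sketch (the sentences list, the pooling layer, and the sigmoid head are illustrative assumptions, not part of the answer above):

    from keras.preprocessing.sequence import pad_sequences
    from keras.models import Sequential
    from keras.layers import Dense, GlobalAveragePooling1D
    
    # hypothetical raw corpus; replace with your own sentences
    sentences = ["first example sentence", "another one"]
    
    # map words to the Tokenizer's indices and pad to FIXED_LENGTH so every
    # row matches the Embedding layer's expected input shape
    seqs = t.texts_to_sequences(sentences)
    X = pad_sequences(seqs, maxlen=FIXED_LENGTH, padding="post")
    
    model = Sequential([
        emb_layer,
        GlobalAveragePooling1D(),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    Note that trainable=False freezes the pretrained vectors; set it to True if you want to fine-tune them on your task.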
    
