PyTorch / Gensim - How to load pre-trained word embeddings

甜味超标 2020-12-01 02:44

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is: how do I get the embedding weights loaded by gensim into the PyTorch embedding layer?

6 Answers
  •  萌比男神i
    2020-12-01 03:31

    I had quite some problems understanding the documentation myself, and there aren't many good examples around, so hopefully this example helps other people. It is a simple classifier that takes the pre-trained embeddings in matrix_embeddings. By freezing the embedding layer (freeze=True, which is also the default for from_pretrained) we make sure the weights are not changed during training.

    import torch.nn as nn

    class InferClassifier(nn.Module):
      def __init__(self, input_dim, n_classes, matrix_embeddings):
        """Initializes a 2-layer MLP for classification.
        There are no non-linearities in the original code; Katia instructed us
        to use tanh instead."""

        super(InferClassifier, self).__init__()

        # dimensionalities
        self.input_dim = input_dim
        self.n_classes = n_classes
        self.hidden_dim = 512

        # embedding layer; freeze=True keeps the pre-trained weights fixed
        self.embeddings = nn.Embedding.from_pretrained(matrix_embeddings, freeze=True)

        # creates an MLP
        self.classifier = nn.Sequential(
                nn.Linear(self.input_dim, self.hidden_dim),
                nn.Tanh(),  # not present in the original code
                nn.Linear(self.hidden_dim, self.n_classes))

      def forward(self, sentence):
        """Forward pass of the classifier."""

        # look up the embeddings for the input indices
        u = self.embeddings(sentence)

        # forward to the classifier
        return self.classifier(u)
    

    sentence is a tensor of row indices into matrix_embeddings, rather than the words themselves.
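
    As a usage sketch (assuming gensim 4.x; the file path, class count, and tokens below are placeholders, not part of the original answer), matrix_embeddings can be built straight from gensim's KeyedVectors:

    from gensim.models import KeyedVectors
    import torch

    # load pre-trained word2vec vectors (hypothetical path)
    kv = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)

    # gensim stores the embedding matrix as a numpy array;
    # row i corresponds to the word kv.index_to_key[i]
    matrix_embeddings = torch.FloatTensor(kv.vectors)

    model = InferClassifier(input_dim=kv.vector_size,
                            n_classes=3,  # placeholder
                            matrix_embeddings=matrix_embeddings)

    # map words to their row indices before calling the model
    tokens = ['hello', 'world']  # placeholder input
    sentence = torch.tensor([kv.key_to_index[t] for t in tokens])
    logits = model(sentence)

    Note that, as written, this produces one output per token; a sentence-level classifier would typically pool the token embeddings (e.g., by averaging) before the MLP.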
