PyTorch / Gensim - How to load pre-trained word embeddings

后端未结

关注

 6  770

甜味超标 2020-12-01 02:44

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is, how do I get the embedding weights loaded by gensim in

6条回答

萌比男神i (楼主)

2020-12-01 03:31

I had quite some problems in understanding the documentation myself and there aren't that many good examples around. Hopefully this example helps other people. It is a simple classifier, that takes the pretrained embeddings in the matrix_embeddings. By setting requires_grad to false we make sure that we are not changing them.

class InferClassifier(nn.Module):
  def __init__(self, input_dim, n_classes, matrix_embeddings):
    """initializes a 2 layer MLP for classification.
    There are no non-linearities in the original code, Katia instructed us 
    to use tanh instead"""

    super(InferClassifier, self).__init__()

    #dimensionalities
    self.input_dim = input_dim
    self.n_classes = n_classes
    self.hidden_dim = 512

    #embedding
    self.embeddings = nn.Embedding.from_pretrained(matrix_embeddings)
    self.embeddings.requires_grad = False

    #creates a MLP
    self.classifier = nn.Sequential(
            nn.Linear(self.input_dim, self.hidden_dim),
            nn.Tanh(), #not present in the original code.
            nn.Linear(self.hidden_dim, self.n_classes))

  def forward(self, sentence):
    """forward pass of the classifier
    I am not sure it is necessary to make this explicit."""

    #get the embeddings for the inputs
    u = self.embeddings(sentence)

    #forward to the classifier
    return self.classifier(x)

sentence is a vector with the indexes of matrix_embeddings instead of words.

0 讨论(0)

查看其它6个回答