I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.
So my question is, how do I get the embedding weights loaded by gensim in
I had quite some problems in understanding the documentation myself and there aren't that many good examples around. Hopefully this example helps other people. It is a simple classifier, that takes the pretrained embeddings in the matrix_embeddings. By setting requires_grad to false we make sure that we are not changing them.
class InferClassifier(nn.Module):
def __init__(self, input_dim, n_classes, matrix_embeddings):
"""initializes a 2 layer MLP for classification.
There are no non-linearities in the original code, Katia instructed us
to use tanh instead"""
super(InferClassifier, self).__init__()
#dimensionalities
self.input_dim = input_dim
self.n_classes = n_classes
self.hidden_dim = 512
#embedding
self.embeddings = nn.Embedding.from_pretrained(matrix_embeddings)
self.embeddings.requires_grad = False
#creates a MLP
self.classifier = nn.Sequential(
nn.Linear(self.input_dim, self.hidden_dim),
nn.Tanh(), #not present in the original code.
nn.Linear(self.hidden_dim, self.n_classes))
def forward(self, sentence):
"""forward pass of the classifier
I am not sure it is necessary to make this explicit."""
#get the embeddings for the inputs
u = self.embeddings(sentence)
#forward to the classifier
return self.classifier(x)
sentence is a vector with the indexes of matrix_embeddings instead of words.