PyTorch / Gensim - How to load pre-trained word embeddings

甜味超标 2020-12-01 02:44

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is, how do I get the embedding weights loaded by gensim into the PyTorch embedding layer?
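For reference, the core mechanic is `nn.Embedding.from_pretrained`, which wraps an existing weight matrix in an embedding layer. A minimal sketch, with a hand-made tensor standing in for the matrix exported from gensim:

```python
import torch
import torch.nn as nn

# stand-in for a (vocab_size, dim) weight matrix exported from gensim
weights = torch.FloatTensor([[0.1, 0.2, 0.3],
                             [0.4, 0.5, 0.6],
                             [0.7, 0.8, 0.9]])

# wrap the matrix in an embedding layer; freeze=True (the default)
# keeps the pre-trained vectors fixed during training
embedding = nn.Embedding.from_pretrained(weights)

# looking up index 1 returns the second row of the matrix
vec = embedding(torch.tensor([1]))
```

Pass `freeze=False` instead if you want the embeddings fine-tuned along with the rest of the network.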

6 Answers
  •  自闭症患者
    2020-12-01 03:19

    I had the same question except that I use torchtext library with pytorch as it helps with padding, batching, and other things. This is what I've done to load pre-trained embeddings with torchtext 0.3.0 and to pass them to pytorch 0.4.1 (the pytorch part uses the method mentioned by blue-phoenox):

    import torch
    import torch.nn as nn
    import torchtext.data as data
    import torchtext.vocab as vocab
    
    # use torchtext to define the dataset field containing text
    text_field = data.Field(sequential=True)
    
    # load your dataset using torchtext, e.g.
    dataset = data.Dataset(examples=..., fields=[('text', text_field), ...])
    
    # build vocabulary
    text_field.build_vocab(dataset)
    
    # I use embeddings created with
    # model = gensim.models.Word2Vec(...)
    # model.wv.save_word2vec_format(path_to_embeddings_file)
    
    # load embeddings using torchtext
    vectors = vocab.Vectors(path_to_embeddings_file) # file created by gensim
    text_field.vocab.set_vectors(vectors.stoi, vectors.vectors, vectors.dim)
    
    # when defining your network you can then use the method mentioned by blue-phoenox
    embedding = nn.Embedding.from_pretrained(torch.FloatTensor(text_field.vocab.vectors))
    
    # pass data to the layer
    dataset_iter = data.Iterator(dataset, ...)
    for batch in dataset_iter:
        ...
        embedding(batch.text)
    
