I have generated the vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to get the vector of the sentence from the vector of
A deep averaging network (DAN) can provide sentence embeddings: the embeddings of the words and bi-grams are averaged and passed through a feedforward deep neural network (DNN).
Transfer learning using sentence embeddings tends to outperform word-level transfer, as it preserves semantic relationships.
You don't need to start training from scratch; pretrained DAN models are available (check the Universal Sentence Encoder module on TensorFlow Hub).
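The averaging-then-feedforward idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the weights are random and untrained, the dimensions and the function name `dan_sentence_embedding` are hypothetical, and a ReLU is applied after every layer for simplicity. A real DAN, such as the pretrained one mentioned above, learns its weights from data.

```python
import numpy as np

def dan_sentence_embedding(token_vectors, weights, biases):
    """Average the input embeddings, then apply feedforward layers with ReLU."""
    hidden = np.mean(token_vectors, axis=0)
    for W, b in zip(weights, biases):
        hidden = np.maximum(0.0, hidden @ W + b)
    return hidden

rng = np.random.default_rng(0)
embedding_dim, hidden_dim, output_dim = 8, 16, 4  # hypothetical sizes

# Hypothetical inputs: 5 word (or bi-gram) embeddings for one sentence.
token_vectors = rng.normal(size=(5, embedding_dim))

# Untrained, randomly initialised layers (assumption for illustration only).
weights = [
    rng.normal(size=(embedding_dim, hidden_dim)),
    rng.normal(size=(hidden_dim, output_dim)),
]
biases = [np.zeros(hidden_dim), np.zeros(output_dim)]

sentence_vector = dan_sentence_embedding(token_vectors, weights, biases)
print(sentence_vector.shape)  # (4,)
```

Because the first step is a plain average, the result does not depend on the order of the tokens.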
Let's suppose this is the current sentence:
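import numpy as np
from gensim.models import KeyedVectors

# Load pretrained vectors stored in the binary word2vec format.
model = KeyedVectors.load_word2vec_format('path of your training dataset', binary=True)

sentence = 'i am'
tokens = sentence.split()
print(tokens)

# Indexing with a list of tokens returns one vector per token
# (a KeyError is raised for out-of-vocabulary tokens).
word_vectors = model[tokens]
# Averaging the word vectors gives a simple sentence embedding.
sentence_vector = np.mean(word_vectors, axis=0)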
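Looking up a list of tokens gives one vector per token, so some reduction is still needed to obtain a single sentence vector, and tokens missing from the vocabulary have to be handled. Below is a minimal sketch of an averaging helper that skips unknown tokens. The helper name `average_sentence_vector` and the toy lookup table are hypothetical stand-ins for a loaded model; the membership test `t in vectors` is assumed to behave for the real model as it does for a dict.

```python
import numpy as np

def average_sentence_vector(tokens, vectors, dim):
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Hypothetical toy lookup table standing in for a loaded word2vec model.
toy_vectors = {
    "i": np.array([1.0, 0.0, 2.0]),
    "am": np.array([3.0, 2.0, 0.0]),
}

# "here" is not in the table, so it is skipped.
sentence_vector = average_sentence_vector("i am here".split(), toy_vectors, dim=3)
print(sentence_vector)  # [2. 1. 1.]
```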
There are different methods to get sentence vectors:
I've had good results from:
You can get vector representations of sentences during the training phase (join the test and train sentences in a single file and run the word2vec code obtained from the following link).
Code for sentence2vec has been shared by Tomas Mikolov here. It assumes the first word of each line is the sentence ID. Compile the code using
gcc word2vec.c -o word2vec -lm -pthread -O3 -march=native -funroll-loops
and run it using
./word2vec -train alldata-id.txt -output vectors.txt -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1
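Since the tool expects the first word of each line to be a sentence ID, the input file has to be prepared accordingly. Here is a minimal sketch of that preprocessing step. The `_*` prefix and the helper name `add_sentence_ids` are assumptions chosen for illustration; any ID scheme should work as long as each ID is unique and cannot collide with a real word.

```python
def add_sentence_ids(sentences, prefix="_*"):
    """Return lines whose first token is a unique sentence ID."""
    return [f"{prefix}{i} {s.strip()}" for i, s in enumerate(sentences)]

# Hypothetical sentences; in practice these are the joined train and test sentences.
sentences = ["i am here", "this is another sentence"]
lines = add_sentence_ids(sentences)
print("\n".join(lines))
# _*0 i am here
# _*1 this is another sentence
```

Writing these lines to alldata-id.txt gives a file in the shape the command above expects. After training, the sentence vectors are the rows of vectors.txt whose first field is one of the IDs.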
EDIT
Gensim (development version) seems to have a method to infer vectors of new sentences. Check out the model.infer_vector(NewDocument) method in https://github.com/gojomo/gensim/blob/develop/gensim/models/doc2vec.py
Google's Universal Sentence Encoder embeddings are a more recent solution to this problem. The encoder doesn't use word2vec, but it is a competitive alternative.
Here is a walk-through with TFHub and Keras.