How to get vector for a sentence from the word2vec of tokens in sentence

I have generated the vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to get the vector of the sentence from the vector of

9 Answers

深忆病人

2020-12-02 04:30

A deep averaging network (DAN) can provide sentence embeddings: the embeddings of words and bi-grams are averaged and passed through a feedforward deep neural network (DNN).

It has been found that transfer learning using sentence embeddings tends to outperform word-level transfer, as it preserves semantic relationships.

You don't need to start training from scratch; pretrained DAN models are available (check the Universal Sentence Encoder module on TensorFlow Hub).
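To make the "average, then feed forward" idea concrete, here is a minimal sketch in plain Python. It is not the Universal Sentence Encoder: the `TinyDAN` class, its layer sizes, the `_`-joined bigram keys, and the random untrained weights are all assumptions made for illustration, so the output vectors carry no meaning.

```python
import math
import random


def unigrams_and_bigrams(tokens):
    """Return the tokens followed by their adjacent pairs joined with '_'."""
    return list(tokens) + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


class TinyDAN:
    """Toy deep averaging network with random, untrained weights (illustration only)."""

    def __init__(self, vocabulary, embed_dim=8, hidden_dim=6, out_dim=4, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        # One (hypothetical) embedding per known unigram or bigram.
        self.embeddings = {item: matrix(1, embed_dim)[0] for item in vocabulary}
        self.hidden_weights = matrix(hidden_dim, embed_dim)
        self.output_weights = matrix(out_dim, hidden_dim)

    def encode(self, tokens):
        items = [i for i in unigrams_and_bigrams(tokens) if i in self.embeddings]
        if not items:
            raise ValueError("no known unigram or bigram in the sentence")
        # Step 1: average the embeddings of all known unigrams and bigrams.
        columns = zip(*(self.embeddings[i] for i in items))
        averaged = [sum(column) / len(items) for column in columns]
        # Step 2: feed the average through a small feedforward network.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, averaged)))
                  for row in self.hidden_weights]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.output_weights]


vocabulary = ["dog", "bites", "man", "dog_bites", "bites_man", "man_bites", "bites_dog"]
dan = TinyDAN(vocabulary)
sentence_embedding = dan.encode("dog bites man".split())
```

Because bigrams take part in the average, "dog bites man" and "man bites dog" get different embeddings here, which a plain average of single-word vectors cannot achieve.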


南方客

2020-12-02 04:32

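Let's suppose this is the current sentence:

```
import numpy as np
from gensim.models import KeyedVectors

# Load pretrained vectors stored in the binary word2vec format
# (the path below is a placeholder).
model = KeyedVectors.load_word2vec_format('path of your training dataset', binary=True)

strr = 'i am'
strr2 = strr.split()
print(strr2)

# model[strr2] is a matrix with one row per token (a KeyError is raised for
# out-of-vocabulary tokens); averaging the rows gives one vector for the sentence.
sentence_embedding = np.mean(model[strr2], axis=0)
```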

我在风中等你

2020-12-02 04:33
There are different methods to get sentence vectors:
1. Doc2Vec: you can train on your dataset using Doc2Vec and then use the resulting sentence vectors.
2. Average of Word2Vec vectors: take the average of all the word vectors in a sentence. This average vector represents your sentence.
3. Average of Word2Vec vectors with TF-IDF: this is one of the best approaches, and the one I recommend. Multiply each word vector by that word's TF-IDF score, then take the average; the result represents your sentence.
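Methods 2 and 3 can be sketched in plain Python as follows. The toy three-dimensional vectors, the tiny corpus, and the helper names are hypothetical stand-ins for a trained Word2Vec model. The smoothed IDF formula and the choice to divide by the sum of the weights are assumptions too; other TF-IDF variants and normalizations are equally reasonable.

```python
import math
from collections import Counter

# Toy word vectors (hypothetical values standing in for a trained model).
WORD_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.2, 0.0],
    "sat": [0.0, 0.8, 0.3],
    "dog": [0.8, 0.3, 0.1],
}


def average_vector(tokens, vectors):
    """Plain mean of the vectors of the in-vocabulary tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of the sentence is in the vocabulary")
    return [sum(column) / len(known) for column in zip(*known)]


def inverse_document_frequency(corpus):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, for a corpus of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter(token for doc in corpus for token in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_weighted_vector(tokens, vectors, idf):
    """Weighted mean; each occurrence of a token contributes with weight idf[token].

    Repeated tokens are counted once per occurrence, which supplies the TF part.
    """
    known = [t for t in tokens if t in vectors and t in idf]
    if not known:
        raise ValueError("no token of the sentence has both a vector and an IDF")
    total_weight = sum(idf[t] for t in known)
    dim = len(vectors[known[0]])
    return [sum(idf[t] * vectors[t][i] for t in known) / total_weight
            for i in range(dim)]


CORPUS = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]
idf = inverse_document_frequency(CORPUS)

plain = average_vector(["the", "dog"], WORD_VECTORS)
weighted = tfidf_weighted_vector(["the", "dog"], WORD_VECTORS, idf)
```

In this toy corpus "the" occurs in every document and so gets the lowest IDF, which pulls the weighted vector toward "dog" compared with the plain average.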

耶瑟儿～

2020-12-02 04:35
I've had good results from:
1. Summing the word vectors (with tf-idf weighting). This ignores word order, but for many applications is sufficient (especially for short documents)
2. FastSent
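A small sketch of the first option, showing what "ignores word order" means in practice. The two-dimensional vectors and the tf-idf weights below are made-up values, and `weighted_sum` is a hypothetical helper, not part of any library.

```python
def weighted_sum(tokens, vectors, weights):
    """Sum of weight * vector over the tokens; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        if token in vectors:
            weight = weights.get(token, 1.0)  # default weight for unweighted tokens
            for i, value in enumerate(vectors[token]):
                total[i] += weight * value
    return total


vectors = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}
weights = {"dog": 2.0, "bites": 1.0, "man": 2.0}  # hypothetical tf-idf scores

a = weighted_sum("dog bites man".split(), vectors, weights)
b = weighted_sum("man bites dog".split(), vectors, weights)
# a and b are both [3.0, 2.0]: the two sentences are indistinguishable.
```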

渐次进展

2020-12-02 04:36
You can get vector representations of sentences during the training phase (join the test and train sentences in a single file and run the word2vec code obtained from the following link).

Code for sentence2vec has been shared by Tomas Mikolov here. It assumes the first word of each line is the sentence ID. Compile the code using
```
gcc word2vec.c -o word2vec -lm -pthread -O3 -march=native -funroll-loops
```
and run it using
```
./word2vec -train alldata-id.txt -output vectors.txt -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1
```
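Since the tool expects the first word of each line to be a sentence ID, the input file has to be prepared accordingly. The helpers below are a hedged sketch of that preparation and of picking the sentence vectors back out of the output. The `SENT_` prefix and the function names are hypothetical, and the reader assumes the usual text output layout (one header line, then one `token v1 v2 ...` line per entry).

```python
def with_sentence_ids(sentences, prefix="SENT_"):
    """Return one line per sentence, with a unique ID token as its first word."""
    return [f"{prefix}{i} {' '.join(s.split())}" for i, s in enumerate(sentences)]


def write_training_file(sentences, path, prefix="SENT_"):
    """Write the ID-prefixed sentences, one per line (e.g. to alldata-id.txt)."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in with_sentence_ids(sentences, prefix):
            handle.write(line + "\n")


def read_sentence_vectors(path, prefix="SENT_"):
    """Read a text-format vectors file and keep only entries keyed by a sentence ID."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        next(handle, None)  # skip the "<vocabulary size> <dimensions>" header line
        for line in handle:
            parts = line.split()
            if parts and parts[0].startswith(prefix):
                vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

The prefix only needs to be a string that cannot collide with a real word in the corpus, so that sentence entries can be told apart from word entries in `vectors.txt`.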
EDIT

Gensim (development version) seems to have a method to infer vectors for new sentences. Check out the model.infer_vector(NewDocument) method in https://github.com/gojomo/gensim/blob/develop/gensim/models/doc2vec.py

遥遥无期

2020-12-02 04:37

Google's Universal Sentence Encoder embeddings are an updated solution to this problem. It doesn't use Word2vec but results in a competing solution.

Here is a walk-through with TFHub and Keras.
