How to get vector for a sentence from the word2vec of tokens in sentence

鱼传尺愫 2020-12-02 04:18

I have generated the vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to get the vector of the sentence from the vector of

9 answers
  •  渐次进展
    2020-12-02 04:36

    You can get vector representations of sentences during the training phase: join the test and train sentences in a single file and run the word2vec code obtained from the following link on it.

    Code for sentence2vec has been shared by Tomas Mikolov here. It assumes the first word of each line is the sentence ID. Compile the code using

    gcc word2vec.c -o word2vec -lm -pthread -O3 -march=native -funroll-loops
    

    and run it using

    ./word2vec -train alldata-id.txt -output vectors.txt -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1
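
The input file for the command above needs one sentence per line, with the sentence ID as the first token. A minimal sketch of preparing such a file and reading the sentence vectors back is given here. The `_*` ID prefix, the helper names, and the sample sentences are assumptions for illustration. The reader assumes the usual word2vec text output (`-binary 0`): a header line followed by one `token v1 v2 ...` line per vocabulary entry, with sentence IDs appearing as tokens.

```python
def write_id_prefixed(sentences, out_path, prefix="_*"):
    """Write one sentence per line, each preceded by a unique ID token."""
    with open(out_path, "w", encoding="utf-8") as f:
        for i, sentence in enumerate(sentences):
            # Collapse whitespace so each sentence stays on a single line.
            f.write(f"{prefix}{i} {' '.join(sentence.split())}\n")


def read_sentence_vectors(vectors_path, prefix="_*"):
    """Return {sentence_id: vector} for tokens that start with the ID prefix."""
    vectors = {}
    with open(vectors_path, encoding="utf-8") as f:
        next(f)  # skip the header line: "<vocab size> <dimension>"
        for line in f:
            parts = line.split()
            if parts and parts[0].startswith(prefix):
                vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Hypothetical data: train and test sentences are joined into one file.
train_sentences = ["the cat sat on the mat", "dogs chase cats"]
test_sentences = ["a cat chased a dog"]
write_id_prefixed(train_sentences + test_sentences, "alldata-id.txt")
```

After training, `read_sentence_vectors("vectors.txt")` would return the vectors keyed by `_*0`, `_*1`, and so on, in the order the sentences were written.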
    

    EDIT

    Gensim (development version) seems to have a method to infer vectors for new sentences. Check out the model.infer_vector(NewDocument) method in https://github.com/gojomo/gensim/blob/develop/gensim/models/doc2vec.py
