Understanding the output of Doc2Vec from Gensim package

暖寄归人 2021-01-05 09:43

I have some sample sentences that I want to run through a Doc2Vec model. My end goal is a matrix of size (num_sentences, num_features).

I'm using the Gensim packag

2 answers
  •  夕颜 (original poster)
    2021-01-05 10:05

    TaggedDocument expects tags to be a list of tags associated with the document, not a single string.

    In your case,

    sentence = TaggedDocument(words=['a', 'b'], tags='400')
    

    is interpreted as a sentence with three tags, ['4', '0', '0'], because a string is iterated one character at a time. As a result, model.docvecs returns vectors for only 10 tags: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'].
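
    The effect can be reproduced without Gensim. The sketch below is only an illustration: TaggedDoc is a hypothetical stand-in for TaggedDocument, distinct_tags is a helper invented here, and the second tag '123' is made-up sample data. It assumes tags are collected by looping over the tags field, which is how a plain string ends up split into characters.

```python
from collections import namedtuple

# Hypothetical stand-in for gensim's TaggedDocument, so the sketch
# runs without gensim installed.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def distinct_tags(documents):
    """Collect every distinct tag by iterating over each document's tags."""
    seen = set()
    for doc in documents:
        for tag in doc.tags:  # iterating a str yields its characters
            seen.add(tag)
    return sorted(seen)


# Tags passed as plain strings: each character becomes its own tag.
string_tagged = [TaggedDoc(["a", "b"], "400"), TaggedDoc(["c", "d"], "123")]

# Tags passed as one-element lists: each string stays a single tag.
list_tagged = [TaggedDoc(["a", "b"], ["400"]), TaggedDoc(["c", "d"], ["123"])]

print(distinct_tags(string_tagged))  # ['0', '1', '2', '3', '4']
print(distinct_tags(list_tagged))    # ['123', '400']
```

    With enough numeric string tags, the distinct characters can never exceed the ten digits, which matches the 10 vectors described above.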

    Try changing this to

    sentence = TaggedDocument(words=['a', 'b'], tags=['400'])
    
