Get bigrams and trigrams in word2vec Gensim

余生分开走 2021-02-01 08:18

I am currently using uni-grams in my word2vec model as follows.

def review_to_sentences( review, tokenizer, remove_stopwords=False ):
    #Returns a list of sent         


        
3 Answers
  •  刺人心 (original poster)
    2021-02-01 08:45

    Phrases and Phraser (from gensim.models.phrases) are what you are looking for.

    import gensim

    # data_words: an iterable of tokenized sentences (each a list of str)
    bigram = gensim.models.Phrases(data_words, min_count=1, threshold=10)  # higher threshold, fewer phrases
    trigram = gensim.models.Phrases(bigram[data_words], threshold=100)
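
To see why a higher threshold yields fewer phrases, it may help to look at the default scoring rule that the gensim documentation describes for Phrases: a word pair is merged when (bigram_count - min_count) * vocab_size / (count_a * count_b) exceeds the threshold. The following is a plain-Python sketch of that rule, not gensim's own code; the function name and the counts are made up for illustration.

```python
def phrase_score(count_a, count_b, count_ab, vocab_size, min_count):
    """Sketch of the default Phrases score as described in the gensim docs.

    count_a, count_b: how often each word occurs on its own
    count_ab:         how often the two words occur next to each other
    vocab_size:       number of entries in the Phrases vocabulary
    """
    return (count_ab - min_count) * vocab_size / (count_a * count_b)


# Hypothetical counts: both words occur 4 times, always together.
score = phrase_score(count_a=4, count_b=4, count_ab=4, vocab_size=20, min_count=1)

# The pair is merged only when the score is strictly above the threshold,
# so the same pair can pass a low threshold and fail a high one.
for threshold in (1, 10):
    print(threshold, score > threshold)
```

On a small corpus the counts are low, so scores tend to be low as well; thresholds such as 10 or 100 may then filter out every candidate.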
    

    Once you have finished adding vocabulary to the Phrases models, wrap them in Phraser for faster access and lower memory usage. This step is optional but useful.

    bigram_mod = gensim.models.phrases.Phraser(bigram)
    trigram_mod = gensim.models.phrases.Phraser(trigram)
    

    Thanks,
