gensim | 易学教程

How to speed up Gensim Word2vec model load time?

Question: I'm building a chatbot, so I need to vectorize the user's input using Word2Vec. I'm using Google's pre-trained model with 3 million words (GoogleNews-vectors-negative300), which I load using Gensim:
    import gensim
    model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
The problem is that the model takes about 2 minutes to load, and I can't let the user wait that long. What can I do to speed up the load time? I thought about putting
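
One commonly used way to cut the load time, sketched here under assumptions rather than taken from the truncated question: parse the word2vec-format file once, save it in gensim's native format, and reopen it memory-mapped on later starts. `load_word2vec_format`, `save`, and `KeyedVectors.load(..., mmap='r')` are gensim calls; the helper name `load_vectors` and the cache path are hypothetical.

```python
import os


def load_vectors(bin_path, cache_path, limit=None):
    """Return KeyedVectors, parsing the word2vec file only on the first call."""
    # Imported here so that this module can be imported without gensim installed.
    from gensim.models import KeyedVectors

    if not os.path.exists(cache_path):
        # Slow path, run once: parse the word2vec binary file.
        # If limit is given, only the first `limit` vectors in the file are read.
        vectors = KeyedVectors.load_word2vec_format(bin_path, binary=True, limit=limit)
        # Save in gensim's native format (large arrays go to side .npy files).
        vectors.save(cache_path)

    # Fast path: memory-map the saved arrays read-only instead of parsing them.
    return KeyedVectors.load(cache_path, mmap="r")
```

A call might look like `load_vectors('GoogleNews-vectors-negative300.bin', 'GoogleNews-vectors.kv', limit=500000)`, where the cache file name and the limit of 500000 are illustrative values. Passing `limit` trades vocabulary coverage for speed and memory, so it only fits cases where rare words can be ignored.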

Topic distribution: How do we see which document belong to which topic after doing LDA in python

Question: I was able to run the LDA code from gensim and got the top 10 topics with their respective keywords. Now I would like to go a step further and check how accurate the LDA algorithm is by seeing which documents it assigns to each topic. Is this possible with gensim's LDA? Basically, I would like to do something like this, but in Python and using gensim: "LDA with topicmodels, how can I see which topics different documents belong to?"
Answer 1: Using the probabilities of the topics, you can try to set some
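
The idea of using the topic probabilities can be sketched as follows, assuming a simple probability cutoff. In gensim, a per-document list of `(topic_id, probability)` pairs can be obtained with `LdaModel.get_document_topics(bow)`; the distributions below are made up, and the function name and the cutoff of 0.3 are hypothetical.

```python
from collections import defaultdict


def group_documents_by_topic(doc_topics, threshold=0.3):
    """Map each topic id to the documents whose probability for it meets the threshold."""
    groups = defaultdict(list)
    for doc_id, topics in enumerate(doc_topics):
        for topic_id, prob in topics:
            if prob >= threshold:
                groups[topic_id].append(doc_id)
    return dict(groups)


# Made-up topic distributions for three documents, in the
# [(topic_id, probability), ...] shape that gensim returns per document.
doc_topics = [
    [(0, 0.85), (1, 0.15)],
    [(0, 0.40), (1, 0.35), (2, 0.25)],
    [(2, 0.90), (0, 0.10)],
]

print(group_documents_by_topic(doc_topics))
# → {0: [0, 1], 1: [1], 2: [2]}
```

With a cutoff, a document can land in several topics (document 1 above) or in none; taking only the highest-probability topic per document is the alternative when each document should get exactly one label.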

Update gensim word2vec model

Question: I have a word2vec model in gensim trained on 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set on which I trained the model), I need to update the model with that sentence so that querying it next time returns some results. I am doing it like this:
    new_sentence = ['moscow', 'weather', 'cold']
    model.train(new_sentence)
and it prints this in the logs:
    2014-03-01 16:46:58,061 : INFO : training model with 1 workers on 98892 vocabulary and 100
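
A minimal sketch of an incremental update, assuming a recent gensim release in which `build_vocab` accepts `update=True` and `train` takes `total_examples` and `epochs` (the release that produced the 2014 log above may not support this). The helper name `update_word2vec` is hypothetical. Note that gensim expects an iterable of sentences, each a list of tokens, so a single sentence has to be wrapped in an outer list.

```python
def update_word2vec(model, new_sentences):
    """Continue training a gensim Word2Vec model on extra tokenized sentences."""
    # Each sentence must be a list of tokens; a bare list of strings would be
    # read as several sentences whose "words" are single characters.
    model.build_vocab(new_sentences, update=True)  # add unseen words to the vocabulary
    model.train(
        new_sentences,
        total_examples=len(new_sentences),
        epochs=model.epochs,
    )
    return model
```

For the sentence in the question, the call would be `update_word2vec(model, [['moscow', 'weather', 'cold']])`. New words that occur fewer than the model's `min_count` times in the added sentences may still be left out of the vocabulary.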

Convert word2vec bin file to text

Question: From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is in a binary format that is not useful to me. Tomas Mikolov assures us that "It should be fairly straightforward to convert the binary format to text format (though that will take more disk space). Check the code in the distance tool, it's rather trivial to read the binary file." Unfortunately, I don't know enough C to understand http://word2vec.googlecode.com/svn/trunk/distance.c. Supposedly
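
A minimal pure-Python sketch of such a converter, assuming the usual word2vec binary layout: an ASCII header line with the vocabulary size and vector size, then for each word its bytes terminated by a space, followed by the vector as little-endian 32-bit floats. The function name and the six-decimal output format are choices made for this example. gensim can also do the conversion with `KeyedVectors.load_word2vec_format(..., binary=True)` followed by `save_word2vec_format(..., binary=False)`.

```python
import struct


def convert_bin_to_text(bin_path, txt_path):
    """Convert a word2vec binary vector file to the word2vec text format."""
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <vector_size>\n" in ASCII.
        vocab_size, vector_size = map(int, src.readline().split())
        dst.write(f"{vocab_size} {vector_size}\n")

        fmt = f"<{vector_size}f"  # assumed: little-endian float32 values
        n_bytes = struct.calcsize(fmt)

        for _ in range(vocab_size):
            # The word is every byte up to the next space.
            chars = bytearray()
            while True:
                ch = src.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":  # skip the newline that may follow the previous vector
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")

            vector = struct.unpack(fmt, src.read(n_bytes))
            dst.write(word + " " + " ".join(f"{x:.6f}" for x in vector) + "\n")
```

The file has to be decompressed first (for example with `gunzip`), since the sketch reads the raw .bin file rather than the .gz archive.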

How to calculate the sentence similarity using word2vec model of gensim with python

Question: According to the Gensim Word2Vec, I can use the word2vec model in the gensim package to calculate the similarity between two words, e.g.
    trained_model.similarity('woman', 'man')
    0.73723527
However, the word2vec model cannot predict sentence similarity. I found the LSI model with sentence similarity in gensim, but it doesn't seem that it can be combined with the word2vec model. Each sentence in my corpus is short (fewer than 10 words). So, are there any simple
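
One simple baseline for short sentences, offered as a sketch and not necessarily what the answers to this question propose: average the word vectors of each sentence and compare the averages with cosine similarity. The 3-dimensional vectors below are made up; with a trained gensim model the lookup would go through `model.wv[word]` instead of the toy dictionary. The function names are hypothetical.

```python
import math


def sentence_vector(tokens, vectors):
    """Average the vectors of the tokens that are in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("none of the tokens are in the vocabulary")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_similarity(tokens_a, tokens_b, vectors):
    return cosine(sentence_vector(tokens_a, vectors), sentence_vector(tokens_b, vectors))


# Made-up toy vectors standing in for a trained word2vec vocabulary.
toy_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
    "truck": [0.0, 0.1, 0.9],
}

print(sentence_similarity(["cat", "dog"], ["dog"], toy_vectors))
print(sentence_similarity(["cat", "dog"], ["car", "truck"], toy_vectors))
```

Averaging ignores word order and weights every word equally, so it is only a starting point.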