I've derived an LDA topic model using a toy corpus as follows:
documents = ['Human machine interface for lab abc computer applications',
'A s
I read the source, and it turns out that topics with probabilities below a threshold are dropped from the output. The threshold has a default value of 0.01.
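As a rough illustration of that behaviour, here is a minimal sketch in plain Python. It is not gensim's actual code: the function name filter_topics, the use of >= for the comparison, and the example numbers are all assumptions made for this answer.

```python
def filter_topics(topic_dist, minimum_probability=0.01):
    """Keep only (topic_id, probability) pairs at or above the threshold.

    Hypothetical helper for illustration; not part of gensim.
    """
    return [
        (topic_id, prob)
        for topic_id, prob in enumerate(topic_dist)
        if prob >= minimum_probability
    ]


# Made-up distribution over 4 topics for a single document.
topic_dist = [0.985, 0.005, 0.004, 0.006]

print(filter_topics(topic_dist))       # only topic 0 passes the default threshold
print(filter_topics(topic_dist, 0.0))  # a threshold of 0 keeps all four topics
```

This is why a document can appear to have fewer topics than the model was trained with: the small ones are still there, they are just not reported.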
I realise this is an old question, but in case someone stumbles upon it, here is a solution. (The issue has actually been fixed in the current development branch with a minimum_probability parameter to LdaModel, but maybe you're running an older version of gensim.)
Define a new function (this is just copied from the source):
def get_doc_topics(lda, bow):
    # Run inference on a single document; gamma has one row per document.
    gamma, _ = lda.inference([bow])
    topic_dist = gamma[0] / sum(gamma[0])  # normalize distribution
    return [(topicid, topicvalue) for topicid, topicvalue in enumerate(topic_dist)]
The function above does not filter the output topics by probability; it returns all of them. If you don't need the (topic_id, value) tuples but just the values, return topic_dist instead of the list comprehension (it'll be much faster as well).
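To see what the normalisation step does without a trained model, here is a plain-Python sketch with a made-up gamma row. The helper name normalize and the numbers are assumptions for illustration only. In gensim, gamma comes from lda.inference and is a NumPy array, so the division there is vectorised rather than done in a loop.

```python
def normalize(gamma_row):
    """Scale non-negative weights so that they sum to 1.

    Hypothetical stand-in for `gamma[0] / sum(gamma[0])` on a NumPy array.
    """
    total = sum(gamma_row)
    return [value / total for value in gamma_row]


# Made-up variational parameters for one document over 3 topics.
gamma_row = [8.0, 1.5, 0.5]
topic_dist = normalize(gamma_row)

print(topic_dist)                   # values only
print(list(enumerate(topic_dist)))  # (topic_id, value) tuples, nothing filtered out
```

Either form keeps every topic, including the ones that would fall below the 0.01 threshold.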