Document topical distribution in Gensim LDA

情深已故 2020-12-15 20:40

I've derived an LDA topic model using a toy corpus as follows:

documents = ['Human machine interface for lab abc computer applications',
             'A s


        
2 Answers
  • 2020-12-15 20:59

    Reading the source shows that topics whose probability falls below a threshold are dropped from the output. The threshold defaults to 0.01.
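To make the effect of that threshold concrete, here is a minimal pure-Python sketch of this kind of filtering. It is not gensim's code: `filter_topics` and the sample distribution are hypothetical, and the `>=` comparison is an assumption about how the cutoff is applied.

```python
def filter_topics(topic_dist, minimum_probability=0.01):
    """Keep only (topic_id, probability) pairs at or above the threshold.

    Hypothetical helper for illustration; not part of gensim.
    """
    return [(topic_id, prob)
            for topic_id, prob in enumerate(topic_dist)
            if prob >= minimum_probability]


# Made-up distribution over three topics; the last one is below 0.01.
dist = [0.7, 0.295, 0.005]

filtered = filter_topics(dist)                             # topic 2 is dropped
unfiltered = filter_topics(dist, minimum_probability=0.0)  # all topics kept
```

With the default cutoff the returned list is shorter than the number of topics, which is why a document can appear to be missing topics.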

  • 2020-12-15 21:18

    I realise this is an old question, but in case someone stumbles upon it, here is a solution. (The issue has been fixed in the current development branch with a minimum_probability parameter to LdaModel, but you may be running an older version of gensim.)

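    Define a new function (this is just copied from the source):

    def get_doc_topics(lda, bow):
        # inference() takes a chunk of documents; gamma has one row per document
        gamma, _ = lda.inference([bow])
        topic_dist = gamma[0] / sum(gamma[0])  # normalize distribution
        # pair every topic id with its probability, with no threshold applied
        return [(topicid, topicvalue) for topicid, topicvalue in enumerate(topic_dist)]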
    

    The function above does not filter topics by probability; it returns all of them. If you only need the values rather than (topic_id, value) tuples, return topic_dist directly instead of building the list comprehension (it will be much faster as well).
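The normalization step in that function can be tried without a trained model. The sketch below is an illustration only: `normalize_gamma` is a hypothetical helper and the gamma values are made up, standing in for one row of what `lda.inference` returns.

```python
def normalize_gamma(gamma_row):
    """Scale one document's gamma row so its entries sum to 1."""
    total = sum(gamma_row)
    return [g / total for g in gamma_row]


# Made-up gamma values for a single document and three topics.
gamma_row = [8.0, 1.5, 0.5]

topic_dist = normalize_gamma(gamma_row)  # values only
pairs = list(enumerate(topic_dist))      # (topic_id, value) tuples
```

Here `topic_dist` is the values-only form mentioned above, and `pairs` is the tuple form; every topic appears in both, however small its share.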
