Hierarchical Dirichlet Process Gensim topic number independent of corpus size

余生分开走 2021-02-04 07:20

I am using the Gensim HDP module on a set of documents.

>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(         


        
7 answers
  •  半阙折子戏
    2021-02-04 07:47

    There is apparently a bug in Gensim (version 3.8.3): passing -1 to show_topics returns nothing at all. So I have tweaked the answers by Roko Mijic and aaron.

    import pandas as pd

    def topic_prob_extractor(gensim_hdp):
        # Request all m_T topics explicitly, since num_topics=-1 returns nothing in 3.8.3.
        shown_topics = gensim_hdp.show_topics(num_topics=gensim_hdp.m_T, formatted=False)
        topics_nos = [topic_id for topic_id, _ in shown_topics]
        # Sum the weights of the words listed for each topic.
        weights = [sum(weight for _, weight in words) for _, words in shown_topics]
        return pd.DataFrame({'topic_id': topics_nos, 'weight': weights})
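To see what the function above computes without training a model, here is a minimal sketch in plain Python. It assumes that show_topics(formatted=False) returns a list of (topic_id, [(word, weight), ...]) pairs. The shown_topics list and the summed_topic_weights helper are hypothetical, and the numbers are made up for illustration.

```python
# Hypothetical stand-in for hdp.show_topics(num_topics=hdp.m_T, formatted=False):
# a list of (topic_id, [(word, weight), ...]) pairs. The values are made up.
shown_topics = [
    (0, [("model", 0.5), ("topic", 0.25)]),
    (1, [("corpus", 0.125), ("word", 0.125)]),
]


def summed_topic_weights(shown_topics):
    """Return {topic_id: sum of the listed word weights} for each topic."""
    return {
        topic_id: sum(weight for _, weight in words)
        for topic_id, words in shown_topics
    }


weights = summed_topic_weights(shown_topics)
print(weights)  # {0: 0.75, 1: 0.25}
```

Note that each sum covers only the words that show_topics lists for a topic, so it is a relative score based on the top words rather than a full topic probability.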
    
    
