Hierarchical Dirichlet Process Gensim topic number independent of corpus size

余生分开走 2021-02-04 07:20

I am using the Gensim HDP module on a set of documents.

>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(         


        
7 Answers
  •  自闭症患者
    2021-02-04 08:08

    @Aaron's code above no longer works because of changes to the gensim API, so I rewrote and simplified it as follows. It works as of June 2017 with gensim v2.1.0.

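    import pandas as pd
    
    def topic_prob_extractor(gensim_hdp):
        # With num_topics=-1 and formatted=False, show_topics returns every topic
        # as a (topic_id, [(word, prob), ...]) pair
        shown_topics = gensim_hdp.show_topics(num_topics=-1, formatted=False)
        topics_nos = [topic_id for topic_id, _ in shown_topics]
        # Weight of a topic = sum of the probabilities of its listed top words
        # (iterate over the pairs directly instead of indexing the list by topic id)
        weights = [sum(prob for _, prob in terms) for _, terms in shown_topics]
    
        return pd.DataFrame({'topic_id': topics_nos, 'weight': weights})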
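To check what the weighting does without fitting a model, here is a minimal pure-Python sketch of the same computation applied to a hand-made list shaped like the `show_topics(num_topics=-1, formatted=False)` output, i.e. `(topic_id, [(word, prob), ...])` pairs. The helper names `topic_weights` and `rank_topics` and the toy data are hypothetical, for illustration only; they are not part of gensim and not real model output.

```python
from typing import List, Tuple

# Assumed shape of HdpModel.show_topics(num_topics=-1, formatted=False):
# a list of (topic_id, [(word, prob), ...]) pairs.
TopicList = List[Tuple[int, List[Tuple[str, float]]]]


def topic_weights(shown_topics: TopicList) -> List[Tuple[int, float]]:
    """Pair each topic id with the summed probability of its listed words."""
    return [(topic_id, sum(prob for _, prob in terms)) for topic_id, terms in shown_topics]


def rank_topics(shown_topics: TopicList) -> List[Tuple[int, float]]:
    """Sort (topic_id, weight) pairs from heaviest to lightest."""
    return sorted(topic_weights(shown_topics), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy input (not real model output), shaped like show_topics(formatted=False).
toy_topics = [
    (0, [("apple", 0.25), ("banana", 0.25)]),
    (1, [("car", 0.5), ("road", 0.25)]),
    (2, [("rare", 0.125)]),
]

print(rank_topics(toy_topics))  # prints [(1, 0.75), (0, 0.5), (2, 0.125)]
```

Sorting by weight puts the topics that carry most of the probability mass first, which makes it easier to see how many of the HDP topics actually matter for a given corpus.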
    
