How to print the LDA topics models from gensim? Python

北荒 2020-12-04 13:42

Using gensim I was able to extract topics from a set of documents in LSA but how do I access the topics generated from the LDA models?

When printing the

10 answers
  • 2020-12-04 14:21

    Are you using any logging? print_topics prints to the logfile as stated in the docs.

    As @mac389 says, lda.show_topics() is the way to go to print to screen.

  • 2020-12-04 14:21

    You can use gensim's own preprocessing helpers to clean up its topic format.

    from gensim.parsing.preprocessing import preprocess_string, strip_punctuation, strip_numeric
    
    lda_topics = lda.show_topics(num_words=5)
    
    topics = []
    filters = [lambda x: x.lower(), strip_punctuation, strip_numeric]
    
    for topic in lda_topics:
        print(topic)
        topics.append(preprocess_string(topic[1], filters))
    
    print(topics)
    

    Output:

    (0, '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"')
    (1, '0.027*"data" + 0.020*"experience" + 0.013*"business" + 0.010*"role" + 0.009*"science"')
    (2, '0.026*"data" + 0.016*"experience" + 0.012*"learning" + 0.011*"machine" + 0.009*"business"')
    (3, '0.028*"data" + 0.015*"analytics" + 0.015*"experience" + 0.008*"business" + 0.008*"skills"')
    (4, '0.014*"data" + 0.009*"learning" + 0.009*"machine" + 0.009*"business" + 0.008*"experience"')
    
    
    [
      ['business', 'data', 'experience', 'learning', 'analytics'], 
      ['data', 'experience', 'business', 'role', 'science'], 
      ['data', 'experience', 'learning', 'machine', 'business'], 
      ['data', 'analytics', 'experience', 'business', 'skills'], 
      ['data', 'learning', 'machine', 'business', 'experience']
    ]
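
    The filters above discard the weights. If you want to keep them, a regular expression over the formatted string is one option. This is a rough sketch using only the standard library; parse_topic and TERM are hypothetical names, and the pattern assumes weights are written as plain decimals, as in the output above.

```python
import re

# Matches one `weight*"word"` term in a formatted topic string.
TERM = re.compile(r'(\d*\.?\d+)\*"([^"]+)"')


def parse_topic(description):
    """Return [(word, weight), ...] from a formatted topic string."""
    return [(word, float(weight)) for weight, word in TERM.findall(description)]


# First topic string from the output above.
sample = '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"'
parsed = parse_topic(sample)
print(parsed)
```

    Calling show_topics(formatted=False) avoids string parsing altogether, so the regex is only worth it when all you have is the formatted text.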
    
  • 2020-12-04 14:21

    Here is sample code to print topics:

    import pickle

    from gensim import corpora, models

    def ExtractTopics(filename, numTopics=5):
        # filename is a pickle file holding a list of token lists (bags of words)
        with open(filename, "rb") as f:
            texts = pickle.load(f)

        # Build the dictionary (named `dictionary` to avoid shadowing the built-in dict)
        dictionary = corpora.Dictionary(texts)

        # Remove words with low document frequency; 3 is an arbitrary cutoff
        low_occurrence_ids = [tokenid for tokenid, docfreq in dictionary.dfs.items() if docfreq <= 3]
        dictionary.filter_tokens(low_occurrence_ids)
        dictionary.compactify()
        corpus = [dictionary.doc2bow(t) for t in texts]

        # Train the LDA model; id2word lets show_topics() return words rather than ids
        lda = models.ldamodel.LdaModel(corpus, num_topics=numTopics, id2word=dictionary)

        # Print the top 20 words of each topic. With formatted=False, recent gensim
        # versions yield (topic_id, [(word, probability), ...]) pairs.
        for topic_id, word_probs in lda.show_topics(num_topics=numTopics, num_words=20, formatted=False):
            print("Topic #" + str(topic_id + 1) + ":", " ".join(word for word, _ in word_probs))
    
  • 2020-12-04 14:25

    I recently ran into a similar issue with Python 3 and gensim 2.3.0: print_topics() and show_topics() raised no error but printed nothing either. It turns out that show_topics() returns a list, so you can simply print it:

    topic_list = lda.show_topics()  # lda is the trained LdaModel
    print(topic_list)
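
    Printing the whole list puts every topic on one line. If one topic per line is easier to read, something like the following could work. It is only a sketch: format_topics is a hypothetical helper, and the sample pairs are copied from the output shown earlier in this thread rather than produced by a live model.

```python
def format_topics(topic_list):
    """Return one 'Topic N: description' line per (topic_id, description) pair."""
    return [f"Topic {topic_id}: {description}" for topic_id, description in topic_list]


# Sample (topic_id, description) pairs in the shape show_topics() returns.
topic_list = [
    (0, '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"'),
    (1, '0.027*"data" + 0.020*"experience" + 0.013*"business" + 0.010*"role" + 0.009*"science"'),
]

lines = format_topics(topic_list)
for line in lines:
    print(line)
```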
    