How to print the LDA topic models from gensim? Python

北荒 2020-12-04 13:42

Using gensim I was able to extract topics from a set of documents in LSA but how do I access the topics generated from the LDA models?

When printing the

10 answers
  • 2020-12-04 14:05

    After some experimenting, print_topics(numoftopics) on the LDA model seems to be buggy: it returns None instead of the topics. My workaround is to call print_topic(topicid) for each topic:

    >>> print(lda.print_topics())
    None
    >>> for i in range(lda.num_topics):  # visit every topic id, including the last one
    ...     print(lda.print_topic(i))
    0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
    ...
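If you need the words and weights as data rather than as one long string, the formatted line can be split apart again. This is only a sketch: parse_topic is a hypothetical helper, not part of gensim, and it assumes the "weight*word + weight*word" layout shown above (some gensim versions also put double quotes around each word, which the pattern tolerates).

```python
import re

# One "weight*word" term. The quotes around the word are optional because
# some gensim versions print 0.083*"response" instead of 0.083*response.
TERM = re.compile(r'\s*([0-9.eE+-]+)\*"?([^"]+?)"?\s*$')


def parse_topic(topic_string):
    """Split a formatted topic string into (word, weight) pairs."""
    pairs = []
    for term in topic_string.split(" + "):
        match = TERM.match(term)
        if match is None:
            raise ValueError("unrecognised term: {!r}".format(term))
        weight, word = match.groups()
        pairs.append((word, float(weight)))
    return pairs


example = "0.083*response + 0.083*interface + 0.083*time"
print(parse_topic(example))
# [('response', 0.083), ('interface', 0.083), ('time', 0.083)]
```

With a real model you would pass lda.print_topic(i) instead of the example string.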
    
  • 2020-12-04 14:07

    You can use:

    # each item is a (topic_id, topic) pair
    for i in lda_model.show_topics():
        print(i[0], i[1])
    
  • 2020-12-04 14:07

    You can also export the top words from each topic to a CSV file. topn controls how many words are exported per topic.

    import pandas as pd
    
    top_words_per_topic = []
    for t in range(lda_model.num_topics):
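        # show_topic returns (word, probability) pairs; prepend the topic id to each
        top_words_per_topic.extend([(t, ) + x for x in lda_model.show_topic(t, topn=5)])
    
    # index=False keeps the file to the three columns shown below
    pd.DataFrame(top_words_per_topic, columns=['Topic', 'Word', 'P']).to_csv("top_words.csv", index=False)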
    

    The CSV file has the following format:

    Topic Word  P  
    0     w1    0.004437  
    0     w2    0.003553  
    0     w3    0.002953  
    0     w4    0.002866  
    0     w5    0.008813  
    1     w6    0.003393  
    1     w7    0.003289  
    1     w8    0.003197 
    ... 
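If pandas is not available, roughly the same export can be done with the standard csv module. This is a sketch under assumptions: top_words_rows, write_top_words and FakeModel are hypothetical names introduced here, and the model is only assumed to expose num_topics and show_topic(t, topn=...) returning (word, probability) pairs, as in the snippet above.

```python
import csv
import io


def top_words_rows(model, topn=5):
    """Return (topic, word, probability) rows for every topic of `model`."""
    rows = []
    for t in range(model.num_topics):
        for word, prob in model.show_topic(t, topn=topn):
            rows.append((t, word, prob))
    return rows


def write_top_words(model, handle, topn=5):
    """Write a header line plus the rows to an open text handle as CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Topic", "Word", "P"])
    writer.writerows(top_words_rows(model, topn=topn))


class FakeModel:
    """Hypothetical stand-in for a trained LDA model, for demonstration only."""
    num_topics = 2
    _topics = {0: [("w1", 0.4), ("w2", 0.3)], 1: [("w3", 0.5), ("w4", 0.2)]}

    def show_topic(self, t, topn=10):
        return self._topics[t][:topn]


buffer = io.StringIO()
write_top_words(FakeModel(), buffer, topn=2)
print(buffer.getvalue())
```

To write a real file, open it with open("top_words.csv", "w", newline="") and pass that handle together with your own model.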
    
  • 2020-12-04 14:07
    This code works fine, but I want to see a topic name instead of Topic: 0 and Topic: 1. How do I know which topic a word belongs to?

    for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
        # each topic is a list of (word, weight) pairs; keep only the words
        print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))
    
    Topic: 0 
    Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
    Topic: 1 
    Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']
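On the naming question: as far as I understand, LDA only gives each topic an integer id, so any name has to be chosen by a person after reading the top words. A rough sketch of attaching hand-written labels and looking up which topics list a given word follows; the labels, the weights and the helper topics_for_words are all made up for illustration, and the data shape is assumed to match the loop above.

```python
from collections import defaultdict

# Shape assumed from the loop above: a list of (topic_id, [(word, weight), ...]).
# The weights here are invented for illustration.
topics = [
    (0, [("incident", 0.05), ("pain", 0.04), ("work", 0.03)]),
    (1, [("work", 0.06), ("shift", 0.04), ("manager", 0.02)]),
]

# Hand-written labels; these names are hypothetical and chosen by a person.
topic_names = {0: "injury reports", 1: "shift management"}


def topics_for_words(topics):
    """Map each word to the list of topic ids whose top words include it."""
    index = defaultdict(list)
    for topic_id, word_weights in topics:
        for word, _weight in word_weights:
            index[word].append(topic_id)
    return dict(index)


word_index = topics_for_words(topics)
for topic_id in word_index["work"]:
    # Fall back to the numeric id when no label has been assigned.
    print("work ->", topic_names.get(topic_id, "Topic {}".format(topic_id)))
```

Note that a word can appear in several topics, which is why the lookup returns a list rather than a single id.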
    
  • 2020-12-04 14:11

    I think it is always more helpful to see the topics as a list of words. The following code snippet helps achieve that goal. I assume you already have an LDA model called lda_model.

    for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
        # each topic is a list of (word, weight) pairs; keep only the words
        print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))
    

    In the above code, I have decided to show the first 30 words belonging to each topic. For simplicity, I show only the first two topics I get.

    Topic: 0 
    Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
    Topic: 1 
    Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']
    

    I don't really like the way the above topics look, so I usually modify my code as shown:

    for idx, topic in lda_model.show_topics(formatted=False, num_words= 30):
        print('Topic: {} \nWords: {}'.format(idx, '|'.join([w[0] for w in topic])))
    

    ... and the output (first 2 topics shown) looks like this:

    Topic: 0 
    Words: associate|incident|time|task|pain|amcare|work|ppe|train|proper|report|standard|pmv|level|perform|wear|date|factor|overtime|location|area|yes|new|treatment|start|stretch|assign|condition|participate|environmental
    Topic: 1 
    Words: work|associate|cage|aid|shift|leave|area|eye|incident|aider|hit|pit|manager|return|start|continue|pick|call|come|right|take|report|lead|break|paramedic|receive|get|inform|room|head
    
  • 2020-12-04 14:21

    I think the syntax of show_topics has changed over time:

    show_topics(num_topics=10, num_words=10, log=False, formatted=True)
    

    For num_topics number of topics, return num_words most significant words (10 words per topic, by default).

    The topics are returned as a list – a list of strings if formatted is True, or a list of (probability, word) 2-tuples if False.

    If log is True, also output this result to log.

    Unlike LSA, there is no natural ordering between the topics in LDA. The returned num_topics <= self.num_topics subset of all topics is therefore arbitrary and may change between two LDA training runs.
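Because the topic ids are arbitrary, comparing two training runs by id is not meaningful. One possible workaround, sketched here with made-up word lists, is to pair topics by the overlap of their top words. The helpers jaccard and match_topics are hypothetical and not part of gensim.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_topics(run_a, run_b):
    """For each topic in run_a, find the most similar topic in run_b.

    Both arguments map a topic id to its list of top words. Returns a dict
    {id_in_a: (best_id_in_b, similarity)}. The match is greedy, so two
    topics in run_a may end up paired with the same topic in run_b.
    """
    matches = {}
    for id_a, words_a in run_a.items():
        best = max(run_b, key=lambda id_b: jaccard(words_a, run_b[id_b]))
        matches[id_a] = (best, jaccard(words_a, run_b[best]))
    return matches


# Made-up top words from two hypothetical training runs.
run_1 = {0: ["user", "system", "interface"], 1: ["trees", "graph", "minors"]}
run_2 = {0: ["graph", "trees", "survey"], 1: ["system", "user", "time"]}
print(match_topics(run_1, run_2))
# {0: (1, 0.5), 1: (0, 0.5)}
```

The word lists could be built from show_topics(formatted=False) output of each model; a low best similarity suggests a topic has no real counterpart in the other run.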
