Using gensim
I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA models?
When printing the
Are you using any logging? print_topics prints to the logfile, as stated in the docs.
As @mac389 says, lda.show_topics() is the way to go to print to the screen.
Using Gensim to clean its own topic format:
from gensim.parsing.preprocessing import preprocess_string, strip_punctuation, strip_numeric

# lda is an already trained LdaModel
lda_topics = lda.show_topics(num_words=5)

topics = []
# lowercase, then strip the punctuation and the numeric weights from each topic string
filters = [lambda x: x.lower(), strip_punctuation, strip_numeric]

for topic in lda_topics:
    print(topic)
    topics.append(preprocess_string(topic[1], filters))

print(topics)
Output:
(0, '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"')
(1, '0.027*"data" + 0.020*"experience" + 0.013*"business" + 0.010*"role" + 0.009*"science"')
(2, '0.026*"data" + 0.016*"experience" + 0.012*"learning" + 0.011*"machine" + 0.009*"business"')
(3, '0.028*"data" + 0.015*"analytics" + 0.015*"experience" + 0.008*"business" + 0.008*"skills"')
(4, '0.014*"data" + 0.009*"learning" + 0.009*"machine" + 0.009*"business" + 0.008*"experience"')
[
['business', 'data', 'experience', 'learning', 'analytics'],
['data', 'experience', 'business', 'role', 'science'],
['data', 'experience', 'learning', 'machine', 'business'],
['data', 'analytics', 'experience', 'business', 'skills'],
['data', 'learning', 'machine', 'business', 'experience']
]
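The filter-based cleanup above discards the weights. If they are needed, a small regular expression over the same string format is one option. This is a minimal sketch, assuming every term appears as weight*"word" as in the output above; parse_topic and TERM_PATTERN are hypothetical names, not part of gensim.

```python
import re

# Matches one term of the form 0.020*"business" (format taken from the output above).
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]+)"')


def parse_topic(topic_string):
    """Return [(word, weight), ...] parsed from a formatted topic string."""
    return [(word, float(weight)) for weight, word in TERM_PATTERN.findall(topic_string)]


# Sample tuple shaped like one entry of the show_topics() output above.
sample = (0, '0.020*"business" + 0.018*"data" + 0.012*"experience"')
topic_id, terms = sample[0], parse_topic(sample[1])
print(topic_id, terms)
# prints: 0 [('business', 0.02), ('data', 0.018), ('experience', 0.012)]
```

In recent gensim versions, calling show_topics with formatted=False should give the (word, weight) pairs directly, which avoids parsing strings at all.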
Here is sample code to print topics:
import pickle
from gensim import corpora, models

def ExtractTopics(filename, numTopics=5):
    # filename is a pickle file holding a list of lists, each a bag of words
    with open(filename, "rb") as f:
        texts = pickle.load(f)
    # generate dictionary (named `dictionary` to avoid shadowing the built-in dict)
    dictionary = corpora.Dictionary(texts)
    # remove words with low document frequency; 3 is an arbitrary threshold I have picked here
    low_occurrence_ids = [tokenid for tokenid, docfreq in dictionary.dfs.items() if docfreq <= 3]
    dictionary.filter_tokens(low_occurrence_ids)
    dictionary.compactify()
    corpus = [dictionary.doc2bow(t) for t in texts]
    # generate the LDA model; id2word lets show_topics return words instead of ids
    lda = models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=numTopics)
    # print the top 20 words of each topic; with formatted=False, recent gensim
    # versions return (topic_id, [(word, probability), ...]) pairs
    for topic_id, terms in lda.show_topics(num_topics=numTopics, num_words=20, formatted=False):
        print("Topic #" + str(topic_id + 1) + ":", " ".join(word for word, prob in terms))
I recently came across a similar issue while working with Python 3 and Gensim 2.3.0: print_topics() and show_topics() raised no error but also printed nothing. It turns out that show_topics() returns a list, so one can simply do:
topic_list = lda.show_topics()
print(topic_list)