Using gensim
I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA model?
When printing the
Use Gensim's own preprocessing functions to clean up its topic format:
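from gensim.parsing.preprocessing import preprocess_string, strip_punctuation, strip_numeric

# lda is the trained gensim LdaModel from the question
lda_topics = lda.show_topics(num_words=5)

topics = []
# lowercase, replace punctuation (quotes, '*', '+') with spaces, then drop the digits of the weights
filters = [lambda x: x.lower(), strip_punctuation, strip_numeric]

for topic in lda_topics:
    print(topic)
    # topic is (topic_id, '0.020*"word" + ...'); keep only the words
    topics.append(preprocess_string(topic[1], filters))

print(topics)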
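One caveat with the filter approach: it throws away the weights, and `strip_punctuation`/`strip_numeric` also alter the words themselves, so a token such as "python3" or "c++" would come back mangled. A minimal stdlib-only sketch that instead pulls the quoted words and their weights out with a regular expression, assuming the `weight*"word" + ...` layout shown in the output below (the names `TERM_RE`, `parse_topic` and `sample_topics` are hypothetical, chosen for illustration):

```python
import re

# Matches one term of a formatted topic string, e.g. 0.020*"business".
# Assumes the show_topics(formatted=True) layout: weight*"word" terms joined by " + ".
TERM_RE = re.compile(r'([\d.]+)\*"([^"]*)"')


def parse_topic(topic_string):
    """Return [(word, weight), ...] in the order the terms appear in the string."""
    return [(word, float(weight)) for weight, word in TERM_RE.findall(topic_string)]


# In practice this would be lda.show_topics(num_words=5); two topics copied from the output below.
sample_topics = [
    (0, '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"'),
    (1, '0.027*"data" + 0.020*"experience" + 0.013*"business" + 0.010*"role" + 0.009*"science"'),
]

for topic_id, topic_string in sample_topics:
    words = [word for word, _weight in parse_topic(topic_string)]
    print(topic_id, words)
```

Because the regex only looks inside the quotes, digits and punctuation that belong to a word are preserved, and the weights remain available if you need them.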
Output:
(0, '0.020*"business" + 0.018*"data" + 0.012*"experience" + 0.010*"learning" + 0.008*"analytics"')
(1, '0.027*"data" + 0.020*"experience" + 0.013*"business" + 0.010*"role" + 0.009*"science"')
(2, '0.026*"data" + 0.016*"experience" + 0.012*"learning" + 0.011*"machine" + 0.009*"business"')
(3, '0.028*"data" + 0.015*"analytics" + 0.015*"experience" + 0.008*"business" + 0.008*"skills"')
(4, '0.014*"data" + 0.009*"learning" + 0.009*"machine" + 0.009*"business" + 0.008*"experience"')
[
['business', 'data', 'experience', 'learning', 'analytics'],
['data', 'experience', 'business', 'role', 'science'],
['data', 'experience', 'learning', 'machine', 'business'],
['data', 'analytics', 'experience', 'business', 'skills'],
['data', 'learning', 'machine', 'business', 'experience']
]
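The string parsing can also be skipped entirely: gensim's `LdaModel.show_topics` accepts `formatted=False`, in which case each entry is `(topic_id, [(word, probability), ...])` rather than a formatted string (and `lda.show_topic(topic_id, topn=5)` returns the `(word, probability)` list for a single topic). A small sketch that turns that structure into a topic-id-to-words mapping; `words_per_topic` is a hypothetical helper, and the sample data is copied from the rounded output above, so real probabilities will carry more digits:

```python
# Assumed shape, from lda.show_topics(num_words=5, formatted=False):
# [(topic_id, [(word, probability), ...]), ...]


def words_per_topic(raw_topics):
    """Map each topic id to its top words, dropping the probabilities."""
    return {topic_id: [word for word, _prob in terms] for topic_id, terms in raw_topics}


# In practice: raw = lda.show_topics(num_words=5, formatted=False)
raw = [
    (0, [("business", 0.020), ("data", 0.018), ("experience", 0.012),
         ("learning", 0.010), ("analytics", 0.008)]),
    (1, [("data", 0.027), ("experience", 0.020), ("business", 0.013),
         ("role", 0.010), ("science", 0.009)]),
]

print(words_per_topic(raw))
```

Working on the unformatted tuples avoids any assumption about how gensim renders the topic strings.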