Inefficiency of topic modelling for text clustering
Question

I tried text clustering with LDA, but it isn't giving me distinct clusters. Below is my code:

    # Import libraries
    from gensim import corpora, models
    import pandas as pd
    from gensim.parsing.preprocessing import STOPWORDS
    from itertools import chain

    # Stop words: gensim's default list plus domain-specific terms
    stoplist = list(STOPWORDS)
    new = ['education', 'certification', 'certificate', 'certified']
    stoplist.extend(new)
    stoplist.sort()

    # Read data (raw string so the backslash in the Windows path is taken literally)
    dat = pd.read_csv(r'D:\data_800k.csv', encoding='latin').Certi.tolist()

    # Remove stop words
    texts =