scikit-learn: how to know which documents are in each cluster?

没有蜡笔的小新 2020-12-28 10:58

I am new to both Python and scikit-learn, so please bear with me.

I took this source code for the k-means clustering algorithm from k means clustering.

I then modif

2 Answers

长发绾君心 (original poster)

2020-12-28 11:37

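dataset_files.filenames is the key :) It lists the file paths in the same order as the loaded documents, so it also lines up with km.labels_ as long as k-means was fitted on features built from dataset_files.data.

This is how I did it.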

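The load_files declaration is:

def load_files(container_path, description=None, categories=None,
               load_content=True, shuffle=True, charset=None,
               charset_error='strict', random_state=0)

(Newer scikit-learn releases name the two text-decoding options encoding and decode_error instead of charset and charset_error.)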

So do:

dataset_files = load_files("path_to_directory_containing_category_folders")

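Then, once I had the k-means result (km), I put the filenames into clusters, a dictionary keyed by cluster label:

from collections import defaultdict

clusters = defaultdict(list)

# km.labels_[i] is the cluster assigned to dataset_files.filenames[i]
for label, filename in zip(km.labels_, dataset_files.filenames):
    clusters[label].append(filename)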

And then I print it :)

for clust in clusters:
    print("\n************************\n")
    for filename in clusters[clust]:
        print(filename)
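
For anyone who wants to try the grouping step without a dataset at hand, here is a minimal sketch of the same idea wrapped in a function. The helper name group_filenames_by_cluster and the sample labels and paths are made up for illustration; in real use you would pass km.labels_ and dataset_files.filenames instead. It only assumes that the two sequences are in the same order.

```python
from collections import defaultdict


def group_filenames_by_cluster(labels, filenames):
    """Map each cluster label to the filenames assigned to it."""
    if len(labels) != len(filenames):
        raise ValueError("labels and filenames must have the same length")
    clusters = defaultdict(list)
    for label, filename in zip(labels, filenames):
        # int() turns NumPy integer labels into plain Python ints
        clusters[int(label)].append(filename)
    return dict(clusters)


# Hypothetical stand-ins for km.labels_ and dataset_files.filenames
sample_labels = [1, 0, 1, 0, 2]
sample_filenames = [
    "sports/a.txt",
    "politics/b.txt",
    "sports/c.txt",
    "politics/d.txt",
    "tech/e.txt",
]

grouped = group_filenames_by_cluster(sample_labels, sample_filenames)
for cluster_id in sorted(grouped):
    print("Cluster %d: %s" % (cluster_id, ", ".join(grouped[cluster_id])))
```

The length check is there because a mismatch usually means the labels came from a different load of the data than the filenames, in which case the pairing would be meaningless.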
