scikit-learn: how do I know which documents are in each cluster?

没有蜡笔的小新 2020-12-28 10:58

I am new to both Python and scikit-learn, so please bear with me.

I took the source code for the k-means clustering algorithm from "k means clustering".

I then modif

  •  长发绾君心
    2020-12-28 11:37

    dataset.filenames is the key :)

    This is how I did it.

    The load_files declaration is:

    def load_files(container_path, description=None, categories=None,
                   load_content=True, shuffle=True, charset=None,
                   charset_error='strict', random_state=0)
    

    So do:

    from sklearn.datasets import load_files

    dataset_files = load_files("path_to_directory_containing_category_folders")
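
To see why filenames is the key, here is a minimal sketch that builds a throwaway directory in the layout load_files expects (one sub-folder per category, one file per document) and checks that .data and .filenames line up index by index. It assumes a recent scikit-learn, where the decoding parameters are named encoding and decode_error rather than the charset and charset_error of the signature quoted above; the folder names, file names and texts are made up for illustration.

```python
import os
import tempfile

from sklearn.datasets import load_files

# Hypothetical corpus: one sub-folder per category, one file per document
root = tempfile.mkdtemp()
docs = {
    "pets": {"a.txt": "cats and kittens", "b.txt": "dogs and puppies"},
    "money": {"c.txt": "stocks and shares"},
}
for category, files in docs.items():
    os.makedirs(os.path.join(root, category))
    for name, text in files.items():
        with open(os.path.join(root, category, name), "w", encoding="utf-8") as fh:
            fh.write(text)

# encoding="utf-8" decodes each document to str instead of leaving bytes
dataset_files = load_files(root, encoding="utf-8", random_state=0)

# .data and .filenames are shuffled together, so index i refers to
# the same document in both lists
for text, path in zip(dataset_files.data, dataset_files.filenames):
    print(os.path.basename(path), "->", text)
```

Because the shuffle is applied to both lists consistently, any per-document result computed from dataset_files.data (such as a cluster label) can be mapped back to a file through the same index.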
    

    Then, once I had the result (the fitted KMeans model km), I put the file names into clusters, a dictionary keyed by cluster label:

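    from collections import defaultdict

    clusters = defaultdict(list)

    # km.labels_[i] is the cluster of the i-th document, which is
    # dataset_files.filenames[i], so the two can be walked in step
    for label, filename in zip(km.labels_, dataset_files.filenames):
        clusters[label].append(filename)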
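
Since km.labels_ is a NumPy array and load_files returns filenames as a NumPy array as well, an alternative to the loop is boolean-mask indexing: labels == c selects every document assigned to cluster c. This is a sketch with hypothetical stand-in values for km.labels_ and dataset_files.filenames; if your file names are a plain list, wrap them in np.asarray first.

```python
import numpy as np

# Hypothetical stand-ins for km.labels_ and dataset_files.filenames
labels = np.array([1, 0, 1, 0, 2])
filenames = np.array(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"])

# For each distinct cluster id, the mask picks the files carrying that label
clusters = {int(c): filenames[labels == c].tolist() for c in np.unique(labels)}

for cluster_id, names in clusters.items():
    print(cluster_id, names)
```

np.unique returns the cluster ids sorted, so the clusters come out in label order rather than in order of first appearance.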
    

    And then I print it :)

    for clust in clusters:
        print("\n************************\n")
        for filename in clusters[clust]:
            print(filename)
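
Putting the pieces together, here is a minimal end-to-end sketch. It assumes TF-IDF features from TfidfVectorizer, since the question's own vectorizer isn't shown, and uses a hypothetical four-document corpus as a stand-in for dataset_files.data and dataset_files.filenames, so it runs without a directory on disk.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; in the answer these come from
# dataset_files.filenames and dataset_files.data
filenames = ["cats1.txt", "stocks1.txt", "cats2.txt", "stocks2.txt"]
data = [
    "cat cat kitten",
    "stock market shares",
    "kitten cat meow",
    "shares stock trading",
]

# Row i of X is document i, so km.labels_[i] belongs to filenames[i]
X = TfidfVectorizer().fit_transform(data)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

clusters = defaultdict(list)
for label, filename in zip(km.labels_, filenames):
    clusters[label].append(filename)

for clust, names in clusters.items():
    print("\n************************\n")
    for filename in names:
        print(filename)
```

The only link between a label and a file is position: as long as the matrix passed to KMeans was built from dataset_files.data in its original order, km.labels_[i] and dataset_files.filenames[i] describe the same document. Passing n_init explicitly avoids depending on a default that has changed between scikit-learn releases.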
    
