How do I get the most informative features from a scikit-learn classifier for each class?

Backend · unresolved · 3 answers · 1535 views
伪装坚强ぢ 2020-12-05 12:50

The NLTK package provides a method show_most_informative_features() that lists the most important features for both classes, with output like:

   contai…
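For reference, NLTK's method hangs off its Naive Bayes classifier. A minimal sketch of how it is called (the toy feature dicts and labels below are made up for illustration):

```python
from nltk.classify import NaiveBayesClassifier

# Toy labeled feature sets (made up for illustration): each item is
# (feature_dict, label), the format NLTK's trainer expects.
train_set = [
    ({"contains(great)": True}, "pos"),
    ({"contains(awful)": True}, "neg"),
    ({"contains(great)": True, "contains(fun)": True}, "pos"),
    ({"contains(awful)": True, "contains(dull)": True}, "neg"),
]

clf = NaiveBayesClassifier.train(train_set)
# Prints a table of the features whose presence best separates the classes.
clf.show_most_informative_features(5)
```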


        
3 Answers
  •  死守一世寂寞
    2020-12-05 12:54

You can get a similar view for a binary classifier, with the strongest features for each of the two classes shown side by side:

               precision    recall  f1-score   support
    
     Irrelevant       0.77      0.98      0.86       129
       Relevant       0.78      0.15      0.25        46
    
    avg / total       0.77      0.77      0.70       175
    
        -1.3914 davis                   1.4809  austin
        -1.1023 suicide                 1.0695  march
        -1.0609 arrested                1.0379  call
        -1.0145 miller                  1.0152  tsa
        -0.8902 packers                 0.9848  passengers
        -0.8370 train                   0.9547  pensacola
        -0.7557 trevor                  0.7432  bag
        -0.7457 near                    0.7056  conditt
        -0.7359 military                0.7002  midamerica
        -0.7302 berlin                  0.6987  mark
        -0.6880 april                   0.6799  grenade
        -0.6581 plane                   0.6357  suspicious
        -0.6351 disposal                0.6348  death
        -0.5804 wwii                    0.6053  flight
        -0.5723 terminal                0.5745  marabi
    
    
    def show_most_informative_features(vectorizer, clf, n=20):
        # Use get_feature_names_out() on scikit-learn >= 1.0;
        # the older get_feature_names() was removed in 1.2.
        feature_names = vectorizer.get_feature_names_out()
        # Sort (coefficient, feature) pairs ascending: the most negative
        # coefficients pull toward one class, the most positive toward the other.
        coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
        # Pair the n most negative with the n most positive features.
        top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
        for (coef_1, fn_1), (coef_2, fn_2) in top:
            print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))
