I need to build a text classifier, and I'm currently using TfidfVectorizer and SelectKBest to select the features, as follows:
vectorizer = TfidfVectoriz
To expand on @ogrisel's answer, the returned list of features is in the same order as when they were vectorized. The code below gives you a list of the top-ranked features, sorted by their Chi-2 scores in descending order (along with the corresponding p-values):
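import numpy as np
# Pair each feature index with its chi2 score and keep the 1000 highest-scoring features
top_ranked_features = sorted(enumerate(ch2.scores_), key=lambda x: x[1], reverse=True)[:1000]
top_ranked_features_indices = [index for index, _ in top_ranked_features]
# On newer scikit-learn releases, use get_feature_names_out() instead of get_feature_names()
feature_names = np.asarray(train_vectorizer.get_feature_names())
for feature_pvalue in zip(feature_names[top_ranked_features_indices], ch2.pvalues_[top_ranked_features_indices]):
    print(feature_pvalue)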
The following should work:
np.asarray(vectorizer.get_feature_names())[ch2.get_support()]