Perform Chi-2 feature selection on TF and TF*IDF vectors
I'm experimenting with Chi-2 feature selection for some text classification tasks. I understand that Chi-2 test checks the dependencies B/T two categorical variables, so if we perform Chi-2 feature selection for a binary text classification problem with binary BOW vector representation, each Chi-2 test on each (feature,class) pair would be a very straightforward Chi-2 test with 1 degree of freedom. Quoting from the documentation: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html#sklearn.feature_selection.chi2 , This score can be used to select the n_features