find important features for classification


Question


I'm trying to classify some EEG data using a logistic regression model (this seems to give the best classification of my data). The data comes from a multichannel EEG setup, so in essence I have a matrix of 63 x 116 x 50, that is channels x time points x number of trials (there are two trial types with 50 trials each). I have reshaped this into a long feature vector, one per trial.
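For concreteness, a minimal sketch of that reshape in NumPy (the array name and the stacking of the two trial types along the last axis are assumptions for illustration):

```python
import numpy as np

# Hypothetical EEG array shaped (channels, time points, trials) = (63, 116, 100),
# assuming the 50 trials of each of the two conditions are stacked on the last axis.
eeg = np.random.randn(63, 116, 100)
labels = np.repeat([0, 1], 50)  # 50 trials per condition

# Move trials to the first axis, then flatten each trial into one long vector:
# X has shape (100 trials, 63 * 116 = 7308 features).
X = eeg.transpose(2, 0, 1).reshape(eeg.shape[2], -1)
y = labels
```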

What I would like to do after the classification is to see which features were the most useful in classifying the trials. How can I do that, and is it possible to test the significance of these features? For example, to say that the classification was driven mainly by N features, and that these are features x to z. So I could, for instance, say that channel 10 at time points 90-95 was significant or important for the classification.

So is this possible or am I asking the wrong question?

Any comments or paper references are much appreciated.


Answer 1:


Scikit-learn includes quite a few methods for feature ranking, among them:

  • Univariate feature selection (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html)
  • Recursive feature elimination (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_digits.html)
  • Randomized Logistic Regression/stability selection (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RandomizedLogisticRegression.html)

(see more at http://scikit-learn.org/stable/modules/feature_selection.html)
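To make the first two options concrete for the layout described in the question, here is a minimal sketch (the placeholder data, `k=20`, and the channel/time-point mapping are assumptions; with `transpose(2, 0, 1).reshape(...)` each flat feature index equals `channel * n_timepoints + timepoint`):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

# X: (n_trials, n_channels * n_timepoints), y: trial labels, as in the question
n_channels, n_timepoints = 63, 116
X = np.random.randn(100, n_channels * n_timepoints)  # placeholder data
y = np.repeat([0, 1], 50)

# Univariate selection: ANOVA F-score of each feature against the class labels
selector = SelectKBest(f_classif, k=20).fit(X, y)
top_univariate = np.argsort(selector.scores_)[::-1][:20]

# Recursive feature elimination wrapped around a logistic regression estimator
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20).fit(X, y)
top_rfe = np.where(rfe.support_)[0]

# Map flat feature indices back to (channel, time point)
for idx in top_univariate[:5]:
    ch, t = divmod(idx, n_timepoints)
    print(f"feature {idx}: channel {ch}, time point {t}")
```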

Among those, I definitely recommend giving Randomized Logistic Regression a shot. In my experience, it consistently outperforms other methods and is very stable. Paper on this: http://arxiv.org/pdf/0809.2932v2.pdf
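A rough sketch of how that looks in code (note this is an assumption-laden example: `RandomizedLogisticRegression` was deprecated and removed in scikit-learn 0.21, so this assumes an older version where it is still available, and the parameter values are just the defaults):

```python
import numpy as np
# Assumes scikit-learn < 0.21; RandomizedLogisticRegression was removed later.
from sklearn.linear_model import RandomizedLogisticRegression

X = np.random.randn(100, 63 * 116)  # placeholder data, shaped as above
y = np.repeat([0, 1], 50)

# Fit L1-penalized logistic regressions on many resampled subsets of the data;
# scores_ holds the fraction of resamples in which each feature was selected.
rlr = RandomizedLogisticRegression(C=1.0, n_resampling=200, selection_threshold=0.25)
rlr.fit(X, y)

stable_features = np.where(rlr.scores_ > 0.25)[0]
print("features selected in >25% of resamples:", stable_features[:10])
```

Features with high stability scores are those that keep being picked regardless of which trials are resampled, which is exactly the kind of "driven mainly by these features" statement the question asks for.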

Edit: I have written a series of blog posts on different feature selection methods and their pros and cons, which are probably useful for answering this question in more detail:

  • http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
  • http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
  • http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
  • http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/


Source: https://stackoverflow.com/questions/15796247/find-important-features-for-classification
