How to find most contributing features to PCA?

你离开我真会死。 Submitted on 2019-12-07 05:56:57

Question


I am running PCA on my data (~250 features) and see that all points are clustered in 3 blobs.

Is it possible to see which of the 250 features contributed most to the outcome? If so, how?

(using the Scikit-learn implementation)


Answer 1:


Let's see what Wikipedia says:

PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
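A minimal sketch of that definition in scikit-learn terms, assuming random synthetic data (the 100 x 5 shape, the per-feature scales and the seed are arbitrary choices for illustration): the rows of the fitted `components_` are orthonormal axes of the new coordinate system, and `explained_variance_` comes out sorted from largest to smallest.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 5 features with decreasing spread
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

pca = PCA(n_components=3).fit(X)

# Rows of components_ are orthonormal directions (the new coordinate axes)
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(3)))  # True

# Variance along each new axis is sorted from largest to smallest
print(pca.explained_variance_)
print(bool(np.all(np.diff(pca.explained_variance_) <= 0)))  # True
```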

To see how 'influential' the vectors of the original space are in the smaller one, you have to project them as well, which is done by:

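# transform() centers with pca.mean_ first, so subtract the origin's projection to cancel that offset
res = pca.transform(np.eye(D)) - pca.transform(np.zeros((1, D)))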
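  • np.eye(n) creates an n x n identity matrix (ones on the diagonal, zeros elsewhere).
  • Thus each row of np.eye(D), where D is the number of original features, is the unit vector of one feature in the original feature space.
  • res is the projection of those feature vectors into the lower-dimensional space.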
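A sketch of the projection step, assuming synthetic data with D = 6 features and d = 2 components (both values and the seed are arbitrary). Because `PCA.transform` subtracts the fitted `mean_` before projecting, the projection of the origin is subtracted as well; after that, row i of `res` is feature i's weight on each component, which is the same matrix as `pca.components_.T` for a non-whitened PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 samples, D = 6 features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
D, d = X.shape[1], 2

pca = PCA(n_components=d).fit(X)

# Project the D unit vectors (one per original feature) into the d-dim space.
# transform() centers with pca.mean_, so remove the origin's projection
# to cancel that constant offset.
res = pca.transform(np.eye(D)) - pca.transform(np.zeros((1, D)))

print(res.shape)  # (6, 2)
# Row i of res holds feature i's weight on each component
print(np.allclose(res, pca.components_.T))  # True
```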

The interesting thing is that res is a D x d matrix (d being the number of components) where res[i][j] represents "how much feature i contributes to component j".

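Then, you may sum the absolute values along each row (over the d columns) to get a D x 1 vector, call it contribution, where each contribution[i] is the total contribution of feature i. Take absolute values first: loadings can be negative, and entries of opposite sign would otherwise cancel out.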

Sort it and you will find the most contributing features :)
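A minimal sketch of the summing and sorting steps, assuming synthetic data in which features 1 and 3 are deliberately given much larger spread (the data, scales and seed are hypothetical). The loadings are read directly from `pca.components_.T`, which is the D x d matrix of feature weights; absolute values are summed so that loadings of opposite sign do not cancel.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 300 samples, 5 features; features 1 and 3 get much
# larger spread, so they should dominate the first two components
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X[:, 1] *= 5.0
X[:, 3] *= 10.0

pca = PCA(n_components=2).fit(X)

# D x d loadings: res[i, j] = weight of feature i on component j
res = pca.components_.T

# Sum absolute loadings over the d components -> one score per feature
contribution = np.abs(res).sum(axis=1)

# Feature indices ordered from most to least contributing
ranking = np.argsort(contribution)[::-1]

print(contribution.round(3))
print(ranking)  # features 1 and 3 should come first
```

This treats every component equally; the ranking can shift if you keep a different number of components.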

Not sure it's clear; I can add more information if needed.

Hope this helps, pltrdy



Source: https://stackoverflow.com/questions/40295888/how-to-find-most-contributing-features-to-pca
