How can I use PCA/SVD in Python for feature selection AND identification?

Posted by 浪尽此生 on 2019-12-05 06:59:48

Question


I'm following "Principal component analysis in Python" to use PCA in Python, but I'm struggling to determine which features to choose (i.e. which of my columns/features have the most variance).

When I use scipy.linalg.svd, it automatically sorts the singular values, so I can't tell which column each one belongs to.

Example code:

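import numpy as np
from scipy.linalg import svd

# Four features, six samples each
M = [
     [1, 1, 1, 1, 1, 1],
     [3, 3, 3, 3, 3, 3],
     [2, 2, 2, 2, 2, 2],
     [9, 9, 9, 9, 9, 9]
]
M = np.transpose(np.array(M))  # shape (6, 4): rows are samples, columns are features
U, s, Vt = svd(M, full_matrices=False)
print(s)  # singular values, returned in descending order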

Is there a different way to go about this without the Singular Values being sorted?

Update: It looks like this might not be possible, at least according to this post on the Matlab forums: http://www.mathworks.com/matlabcentral/newsreader/view_thread/241607. If anyone knows otherwise, let me know :)


Answer 1:


I was under the wrong impression that PCA does feature selection; it actually does feature extraction.

That is, PCA creates a new series of features, each of which is a linear combination of the input features.
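To make the "combination of the input features" point concrete, here is a minimal sketch using NumPy's SVD on centered data. The random data set and the variable names (`X`, `scores`, `explained_variance`) are assumptions for illustration only, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 3 input features with different spreads.
X = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)  # PCA operates on mean-centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each row of Vt holds the coefficients of one principal component,
# so every new feature is a linear combination of all input columns.
scores = Xc @ Vt.T

# Variance captured by each new feature, in the same (sorted) order as s.
explained_variance = s**2 / (len(X) - 1)
print(explained_variance)
```

Each singular value in `s` therefore describes a new feature (a row of `Vt`), not an original column, which is why the sorted order cannot be mapped back to the input columns one-to-one.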

If you really want to use PCA for feature selection, you can look at the weights of the input features on the PCA-created features. For instance, the matplotlib.mlab.PCA class exposes these weights in its Wt property (more on library):

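from matplotlib.mlab import PCA  # removed in Matplotlib 3.1; needs an older release
res = PCA(data)  # data: 2-D array, rows are samples, columns are features
print("weights of input vectors: %s" % res.Wt)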
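The same kind of weights can be read directly from a plain SVD, without any PCA helper class. The sketch below assumes made-up data and hypothetical feature names (`"a"`, `"b"`, `"c"`). Ranking inputs by their absolute weight on the first component is only one possible heuristic, not an established feature-selection method.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["a", "b", "c"]  # hypothetical labels for the input columns
# Hypothetical data: 200 samples, feature "a" has by far the largest spread.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)  # center each column before the decomposition
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# weights[i, j] is the weight of input feature j on principal component i.
weights = Vt

# Order the input features by the magnitude of their weight on component 0.
ranking = np.argsort(-np.abs(weights[0]))
for j in ranking:
    print(feature_names[j], abs(weights[0, j]))
```

Because the columns here are not standardized, the ranking mostly reflects raw scale. Dividing each column by its standard deviation first would be a reasonable alternative when the features use different units.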

Sounds like the feature extraction route is the way to use PCA though.



Source: https://stackoverflow.com/questions/14205941/how-can-i-use-pca-svd-in-python-for-feature-selection-and-identification
