Python scikit learn pca.explained_variance_ratio_ cutoff

后端未结

When choosing the number of principal components (k), we pick the smallest k such that, for example, 99% of the variance is retained.

However, in the Pytho

3 Answers

失恋的感觉

2020-12-23 21:36

Passing the target fraction of variance directly to PCA worked for me and needs even less typing. The rest of the snippet is there for convenience; only 'data' has to be defined beforehand.

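```
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 'data' must already be defined, e.g. a 2-D array of shape (n_samples, n_features).
st = StandardScaler().fit_transform(data)  # standardize each feature
pca = PCA(0.80)                            # keep enough components for 80% of the variance
pc = pca.fit_transform(st)                 # projected data, kept in an object for later use

# Report how many components were kept and how much variance they explain.
print("Components =", pca.n_components_,
      ";\nTotal explained variance =",
      round(pca.explained_variance_ratio_.sum(), 5))
```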

失恋的感觉

2020-12-23 21:45
Yes, you are nearly right. The pca.explained_variance_ratio_ attribute is a vector holding the fraction of variance explained by each component. Thus pca.explained_variance_ratio_[i] is the fraction of variance explained solely by component i+1.

You probably want pca.explained_variance_ratio_.cumsum(). It returns a vector x such that x[i] is the cumulative fraction of variance explained by the first i+1 components.
```
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)

my_model = PCA(n_components=5)
my_model.fit(my_matrix)

# Python 3 print calls; the ratios sum to 1 when all components are kept.
print(my_model.explained_variance_)
print(my_model.explained_variance_ratio_)
print(my_model.explained_variance_ratio_.cumsum())
```
```
[ 1.50756565  1.29374452  0.97042041  0.61712667  0.31529082]
[ 0.32047581  0.27502207  0.20629036  0.13118776  0.067024  ]
[ 0.32047581  0.59549787  0.80178824  0.932976    1.        ]
```
So in my random toy data, if I picked k=4 I would retain 93.3% of the variance.
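
To turn the cumulative sum into an actual cutoff, one option is to look up the first position where it reaches the target. The following is only a sketch: the helper name smallest_k and the 0.90 threshold are my own choices for illustration and are not part of scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA


def smallest_k(ratios, threshold):
    """Smallest k whose first k components retain at least `threshold` of the variance."""
    cumulative = np.cumsum(ratios)
    # Index of the first cumulative ratio >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against round-off when the threshold is very close to 1.0.
    return min(k, len(cumulative))


np.random.seed(0)
my_matrix = np.random.randn(20, 5)  # same toy data as in the answer

my_model = PCA(n_components=5).fit(my_matrix)
k = smallest_k(my_model.explained_variance_ratio_, 0.90)
print("k =", k)
print("variance retained =", my_model.explained_variance_ratio_[:k].sum())
```

Going by the cumulative ratios printed above, a 0.90 threshold should pick k=4 on this toy data. The first k columns of my_model.transform(my_matrix) are then the reduced representation.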

甜味超标

2020-12-23 21:47
Although this question is more than two years old, I want to provide an update. I wanted to do the same thing, and it looks like sklearn now provides this feature out of the box.

As stated in the docs:

"If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components."

So the code required is now
```
my_model = PCA(n_components=0.99, svd_solver='full')
my_model.fit_transform(my_matrix)
```
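
After fitting, the number of components that was actually kept can be read from the n_components_ attribute. The sketch below compares it with a manual cutoff on the cumulative ratios. The toy data and the 0.90 threshold are assumptions chosen for illustration, so that fewer than all five components are kept.

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)  # toy data, assumed for this sketch

# Let PCA keep enough components to explain more than 90% of the variance.
auto_model = PCA(n_components=0.90, svd_solver='full').fit(my_matrix)

# Reference: fit all components and apply the same cutoff by hand.
full_model = PCA(n_components=5, svd_solver='full').fit(my_matrix)
cumulative = full_model.explained_variance_ratio_.cumsum()
manual_k = int(np.argmax(cumulative > 0.90)) + 1  # first position above the threshold

print("n_components_ chosen by PCA:", auto_model.n_components_)
print("k from manual cutoff:", manual_k)
print("variance retained:", auto_model.explained_variance_ratio_.sum())
```

On this data the two counts should agree, and explained_variance_ratio_ on the fitted model only covers the components that were kept.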