I'd like to use principal component analysis (PCA) for dimensionality reduction. Does numpy or scipy already have it, or do I have to roll my own using numpy.linalg.eigh?
matplotlib.mlab has a PCA implementation.
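For reference, a minimal sketch of how that class is used. Treat it as a sketch for older installations: matplotlib.mlab.PCA was later deprecated and removed from matplotlib, and the attribute names below are from my memory of its docs.
import numpy as np
from matplotlib.mlab import PCA  # only present in older matplotlib releases

data = np.random.randn(100, 5)   # rows are observations, columns are variables
results = PCA(data)              # centers (and by default scales) the data itself
print(results.fracs)             # fraction of variance explained by each component
print(results.Y)                 # data projected onto the principal components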
If you're working with 3D vectors, you can apply SVD concisely using the toolbelt vg. It's a light layer on top of numpy.
import numpy as np
import vg
vg.principal_components(data)
There's also a convenient alias if you only want the first principal component:
vg.major_axis(data)
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are verbose or opaque in NumPy.
You might have a look at MDP.
I have not had the chance to test it myself, but I've bookmarked it exactly for the PCA functionality.
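For what it's worth, MDP's documented front-page usage is about this short; I haven't tested it either, so consider it a sketch of the documented API rather than verified code:
import mdp
import numpy as np

x = np.random.randn(200, 10)  # observations in rows, variables in columns
y = mdp.pca(x)                # shortcut function: project x onto its principal components

# the node API lets you keep a fixed number of dimensions:
pcanode = mdp.nodes.PCANode(output_dim=3)
pcanode.train(x)
y3 = pcanode.execute(x)       # training is finalized when you call execute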
You do not need full Singular Value Decomposition (SVD), since it computes all eigenvalues and eigenvectors, which can be prohibitively expensive for large matrices. scipy and its sparse module provide generic linear algebra functions that work on both sparse and dense matrices, among them the eig* family of functions:
http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#matrix-factorizations
Scikit-learn provides a Python PCA implementation, which only supports dense matrices for now.
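A minimal example with scikit-learn's estimator API (the class and methods below are its documented interface):
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 5)        # dense input: rows are samples
pca = PCA(n_components=2)          # keep the first two principal components
X_reduced = pca.fit_transform(X)   # center, fit, and project in one step
print(pca.explained_variance_ratio_)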
Timings (note that eigsh assumes a symmetric input and computes only a few eigenpairs, k=6 by default, which is why it is so much faster than the full SVD here):
In [1]: import numpy as np
In [2]: import scipy.sparse.linalg
In [3]: A = np.random.randn(1000, 1000)
In [4]: %timeit scipy.sparse.linalg.eigsh(A)
1 loops, best of 3: 802 ms per loop
In [5]: %timeit np.linalg.svd(A)
1 loops, best of 3: 5.91 s per loop
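To actually use this for PCA, apply eigsh to the symmetric covariance matrix and request only the few largest eigenpairs; a rough sketch:
import numpy as np
import scipy.sparse.linalg

X = np.random.randn(1000, 50)           # observations in rows
Xc = X - X.mean(axis=0)                 # center each variable
cov = Xc.T.dot(Xc) / (X.shape[0] - 1)   # 50x50 symmetric covariance matrix

evals, evecs = scipy.sparse.linalg.eigsh(cov, k=5, which='LM')  # 5 largest eigenpairs
X_reduced = Xc.dot(evecs)               # project onto the top 5 components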
You can quite easily "roll" your own using scipy.linalg (assuming a pre-centered dataset data):
import scipy.linalg
covmat = data.dot(data.T)               # scatter matrix, proportional to the covariance (variables in rows)
evs, evmat = scipy.linalg.eigh(covmat)  # eigh rather than eig: covmat is symmetric
Then evs are your eigenvalues and evmat is your projection matrix. If you want to keep d dimensions, use the d largest eigenvalues and their corresponding eigenvectors; since eigh returns the eigenvalues in ascending order, these are the last d columns of evmat.
Given that scipy.linalg has the decomposition and numpy the matrix multiplications, what else do you need?
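Putting it together, a self-contained sketch of the above (assuming variables in rows and observations in columns, to match the covmat line):
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 200))      # 5 variables, 200 observations
data -= data.mean(axis=1, keepdims=True)  # center each variable

covmat = data.dot(data.T)                 # scatter matrix, proportional to the covariance
evs, evmat = scipy.linalg.eigh(covmat)    # eigenvalues in ascending order

d = 2
proj = evmat[:, -d:][:, ::-1]             # d eigenvectors with the largest eigenvalues
reduced = proj.T.dot(data)                # d x 200 reduced representation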