Principal component analysis in Python

情深已故 2020-11-30 16:49

I'd like to use principal component analysis (PCA) for dimensionality reduction. Does numpy or scipy already have it, or do I have to roll my own using numpy.linalg.eigh?

11 Answers
  • 2020-11-30 17:22

    matplotlib.mlab has a PCA implementation.

  • 2020-11-30 17:22

    If you're working with 3D vectors, you can apply SVD concisely using the toolbelt vg. It's a light layer on top of numpy.

    import numpy as np
    import vg
    
    vg.principal_components(data)
    

    There's also a convenient alias if you only want the first principal component:

    vg.major_axis(data)
    

    I created the library at my last startup, where it was motivated by uses like this: simple ideas which are verbose or opaque in NumPy.
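
    For comparison, here is a minimal plain-NumPy sketch of SVD-based principal components for 3D points. It illustrates the general technique only and makes no claim about how vg is implemented internally; the function name principal_components_svd and the sample data are hypothetical.

```python
import numpy as np

def principal_components_svd(points):
    """Return the principal axes (as rows, strongest first) of an (n, 3) array."""
    # PCA requires centered data, so subtract the per-column mean first.
    centered = points - points.mean(axis=0)
    # The rows of vt are unit-length principal directions, ordered by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

# Hypothetical sample data: points spread mostly along the x axis.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

axes = principal_components_svd(data)
first_axis = axes[0]  # direction of greatest variance (sign is arbitrary)
```

    The sign of each axis is not determined by the decomposition, so compare directions up to a factor of -1.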

  • 2020-11-30 17:25

    You might have a look at MDP.

    I have not had the chance to test it myself, but I've bookmarked it exactly for the PCA functionality.

  • 2020-11-30 17:26

    You do not need a full Singular Value Decomposition (SVD), as it computes all eigenvalues and eigenvectors and can be prohibitive for large matrices. scipy and its sparse module provide generic linear algebra functions that work on both sparse and dense matrices, among which is the eig* family of functions:

    http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#matrix-factorizations

    Scikit-learn provides a Python PCA implementation which only supports dense matrices for now.

    Timings:

    In [1]: A = np.random.randn(1000, 1000)
    
    In [2]: %timeit scipy.sparse.linalg.eigsh(A)
    1 loops, best of 3: 802 ms per loop
    
    In [3]: %timeit np.linalg.svd(A)
    1 loops, best of 3: 5.91 s per loop
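
    As a rough sketch of how the partial eigensolver can be used for PCA, the snippet below asks scipy.sparse.linalg.eigsh for only the k largest eigenpairs of a covariance matrix and checks them against a full decomposition. The random data, the samples-as-rows layout, and k = 3 are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))        # hypothetical data: 500 samples, 50 features
X = X - X.mean(axis=0)                # center each feature
cov = X.T @ X / (X.shape[0] - 1)      # symmetric 50 x 50 covariance matrix

k = 3
# which="LM" requests the k eigenvalues of largest magnitude; for a positive
# semi-definite covariance matrix these are simply the k largest.
top_vals, top_vecs = eigsh(cov, k=k, which="LM")

# Sort explicitly (strongest first) rather than relying on the return order.
order = np.argsort(top_vals)[::-1]
top_vals, top_vecs = top_vals[order], top_vecs[:, order]

# Full symmetric decomposition, used here only as a reference.
full_vals = np.linalg.eigvalsh(cov)   # all eigenvalues, ascending
```

    Note that eigsh expects a real symmetric (or Hermitian) matrix, which is why the covariance matrix is passed here rather than the raw data.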
    
  • 2020-11-30 17:28

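    You can quite easily "roll" your own using scipy.linalg (assuming a pre-centered dataset data with one feature per row and one sample per column):

    import scipy.linalg

    covmat = data.dot(data.T)  # data: features x samples, already centered
    evs, evmat = scipy.linalg.eigh(covmat)  # symmetric solver; eigenvalues ascending
    

    Then evs are your eigenvalues, and the columns of evmat are the corresponding eigenvectors, which form your projection matrix.

    If you want to keep d dimensions, use the d largest eigenvalues and their eigenvectors. Because eigh returns eigenvalues in ascending order, these are the last d entries of evs and the last d columns of evmat.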

    Given that scipy.linalg has the decomposition and numpy the matrix multiplications, what else do you need?
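
    To make the remaining steps concrete, here is one possible end-to-end sketch that centers the data, decomposes the covariance matrix, and projects onto the d strongest axes. It assumes samples are stored as rows (the transpose of the layout above); the function name pca_project and the sample data are hypothetical.

```python
import numpy as np
import scipy.linalg

def pca_project(samples, d):
    """Project (n_samples, n_features) data onto its d strongest principal axes."""
    centered = samples - samples.mean(axis=0)
    covmat = centered.T @ centered / (centered.shape[0] - 1)
    # eigh handles symmetric matrices and returns real eigenvalues.
    evs, evmat = scipy.linalg.eigh(covmat)
    # Pick the indices of the d largest eigenvalues, strongest first.
    order = np.argsort(evs)[::-1][:d]
    return centered @ evmat[:, order], evs[order]

rng = np.random.default_rng(1)
# Hypothetical data: 300 samples, 5 features with unequal variances.
samples = rng.normal(size=(300, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.1])

reduced, variances = pca_project(samples, d=2)
```

    Each returned eigenvalue equals the sample variance of the corresponding projected column, which gives a quick sanity check on the result.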
