I'd like to use principal component analysis (PCA) for dimensionality reduction. Does numpy or scipy already have it, or do I have to roll my own using numpy.linalg.eigh?
You can quite easily "roll" your own using scipy.linalg (assuming a pre-centered dataset data):
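import scipy.linalg
# assumes data is pre-centered with shape (n_features, n_samples)
covmat = data.dot(data.T)  # unnormalized covariance (scatter) matrix
evs, evmat = scipy.linalg.eig(covmat)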
Then evs are your eigenvalues, and evmat is your projection matrix.
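If you want to keep d dimensions, use the d largest eigenvalues and the corresponding eigenvectors (the matching columns of evmat). scipy.linalg.eig does not guarantee any particular ordering of the eigenvalues, so sort them in descending order before picking.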
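As a rough sketch of the whole procedure, something like the following should work. It makes a few assumptions that are not in the answer above: the function name pca_project is made up for illustration, data is laid out as (n_features, n_samples), and scipy.linalg.eigh is swapped in for eig because the covariance matrix is symmetric (eigh returns real eigenvalues in ascending order). The random input at the bottom is only there to make the example runnable.

```python
import numpy as np
import scipy.linalg


def pca_project(data, d):
    """Project pre-centered data of shape (n_features, n_samples) onto d axes.

    Hypothetical helper for illustration; not part of numpy or scipy.
    """
    # Unnormalized covariance (scatter) matrix, shape (n_features, n_features)
    covmat = data.dot(data.T)
    # eigh is intended for symmetric matrices: real eigenvalues, ascending order
    evs, evmat = scipy.linalg.eigh(covmat)
    # Indices of the d largest eigenvalues, largest first
    order = np.argsort(evs)[::-1][:d]
    top_evs = evs[order]
    top_vecs = evmat[:, order]        # columns are eigenvectors, shape (n_features, d)
    projected = top_vecs.T.dot(data)  # reduced data, shape (d, n_samples)
    return top_evs, top_vecs, projected


# Made-up input: 5 features, 200 samples, centered per feature
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 200))
data = raw - raw.mean(axis=1, keepdims=True)

top_evs, top_vecs, projected = pca_project(data, d=2)
print(projected.shape)
```

Dividing covmat by the number of samples (or samples minus one) only rescales the eigenvalues; the eigenvectors, and therefore the projection, stay the same.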
Given that scipy.linalg has the decomposition and numpy the matrix multiplications, what else do you need?