Bug in Scikit-Learn PCA or in Numpy Eigen Decomposition?

Posted by 浪尽此生 on 2019-12-06 08:06:11

You need to read the documentation more carefully. NumPy's documentation is excellent and very thorough; very often you'll find the solution to your problem just by reading it.

Here is a modified version of your code (imports at the top of the snippet, .T instead of .transpose(), PEP 8 formatting):

import matplotlib.pyplot as plt
import numpy as np
from numpy import linalg as LA
from sklearn.decomposition import PCA

# Random training data: 100 samples, 10 features
d_train = np.random.randn(100, 10)

# Covariance matrix of the features and its eigendecomposition
d_cov = np.cov(d_train.T)
eigens, mypca = LA.eig(d_cov)

# Reference: scikit-learn PCA on the same data
pca = PCA(n_components=10)
d_fit = pca.fit_transform(d_train)
pc = pca.components_
explained = pca.explained_variance_

# Variance along each eigenvector; note the iteration over the COLUMNS of mypca
my_explained = np.sort([x.T.dot(d_cov).dot(x) for x in mypca.T])[::-1]

plt.close('all')
plt.figure()
plt.plot(my_explained, label='me')
plt.plot(explained, label='sklearn')
plt.legend()
plt.show(block=False)

The two curves are exactly the same. The important thing is that I iterate over mypca.T, not mypca.
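As a quick numerical check (a minimal sketch assuming the snippet above has been run, so np, d_cov, mypca, my_explained and explained are all in scope), you can confirm the match and see what goes wrong if you iterate over the rows instead of the columns:

# The column-based values match sklearn's explained_variance_
# (both are eigenvalues of the N-1 normalised covariance).
print(np.allclose(my_explained, explained))  # True

# Iterating over mypca directly walks its ROWS, which are not eigenvectors,
# so the resulting values generally do not match.
wrong = np.sort([x.dot(d_cov).dot(x) for x in mypca])[::-1]
print(np.allclose(wrong, explained))  # False in general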

Signature: np.linalg.eig(a)
Docstring:
Compute the eigenvalues and right eigenvectors of a square array.

Parameters
----------
a : (..., M, M) array
    Matrices for which the eigenvalues and right eigenvectors will
    be computed

Returns
-------
w : (..., M) array
    # not important for you

v : (..., M, M) array
    The normalized (unit "length") eigenvectors, such that the
    column ``v[:,i]`` is the eigenvector corresponding to the
    eigenvalue ``w[i]``.

The eigenvectors are returned as the columns of mypca, not its rows; for x in mypca was iterating over the rows.
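You can verify the column convention directly from the eigenvalue equation in the docstring (a small check reusing d_cov, eigens and mypca from above):

# Column i is the eigenvector for eigens[i]: d_cov @ v[:, i] == eigens[i] * v[:, i]
i = 0
print(np.allclose(d_cov @ mypca[:, i], eigens[i] * mypca[:, i]))  # True

# The same test with row i fails in general, because rows are not eigenvectors.
print(np.allclose(d_cov @ mypca[i, :], eigens[i] * mypca[i, :]))  # False in general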

I'm not an expert on PCA, but I seem to get similar values if I transpose one of the matrices.

>>> import numpy as np
>>> LA = np.linalg
>>> d_train = np.random.randn(100, 10)
>>> d_cov = np.cov(d_train.transpose())
>>> eigens, mypca = LA.eig(d_cov)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=10)
>>> d_fit = pca.fit_transform(d_train)
>>> pc = pca.components_
>>> mypca[0,:]
array([-0.44255435, -0.77430549, -0.14479638, -0.06459874,  0.24772212,
        0.20780185,  0.22388151, -0.05069543, -0.14515676, -0.03385801])
>>> pc[0,:]
array([-0.44255435, -0.24050535, -0.17313927,  0.07182494,  0.09748632,
        0.17910516,  0.26125107,  0.71309764,  0.17276004,  0.25095447])
>>> pc.transpose()[0,:]
array([-0.44255435,  0.77430549,  0.14479638, -0.06459874,  0.24772212,
       -0.20780185,  0.22388151, -0.03385801,  0.14515676,  0.05069543])
>>> list(zip(pc.transpose()[:,0], mypca[:,0]))
[(-0.44255435328718207, -0.44255435328718096),
 (-0.24050535133912765, -0.2405053513391287),
 (-0.17313926714559819, -0.17313926714559785),
 (0.07182494253930383, 0.0718249425393035),
 (0.09748631534772645, 0.09748631534772684),
 (0.17910516453826955, 0.17910516453826758),
 (0.2612510722861703, 0.2612510722861689),
 (0.7130976419217306, 0.7130976419217326),
 (0.17276004381786172, 0.17276004381786136),
 (0.25095447415020183, 0.2509544741502009)]
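To make the comparison systematic rather than eyeballing one vector, here is a small sketch (reusing pc, mypca and eigens from the session above) that sorts the eigenvector columns by decreasing eigenvalue and compares each one to the corresponding row of components_, allowing for the sign ambiguity of eigenvectors:

# Sort eigenvector columns by decreasing eigenvalue to match sklearn's ordering.
order = np.argsort(eigens)[::-1]
sorted_vecs = mypca[:, order]

# Each principal component should equal an eigenvector up to a sign flip.
same = all(np.allclose(pc[i], sorted_vecs[:, i]) or
           np.allclose(pc[i], -sorted_vecs[:, i])
           for i in range(pc.shape[0]))
print(same)  # expected True when the covariance has no repeated eigenvalues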