PCA projection and reconstruction in scikit-learn

情到浓时终转凉″ 提交于 2019-12-02 15:59:08
eickenberg

You can do

proj = pca.inverse_transform(X_train_pca)

That way you do not have to worry about how to do the multiplications.

What you obtain after pca.fit_transform or pca.transform are what is usually called the "loadings" for each sample, meaning how much of each component you need to describe it best using a linear combination of the components_ (the principal axes in feature space).

The projection you are aiming at is back in the original signal space. This means that you need to go back into signal space using the components and the loadings.

So there are three steps to disambiguate here. Here you have, step by step, what you can do using the PCA object and how it is actually calculated:

  1. pca.fit estimates the components (using an SVD on the centered Xtrain):

    from sklearn.decomposition import PCA
    import numpy as np
    from numpy.testing import assert_array_almost_equal
    
    #Should this variable be X_train instead of Xtrain?
    X_train = np.random.randn(100, 50)
    
    pca = PCA(n_components=30)
    pca.fit(X_train)
    
    U, S, VT = np.linalg.svd(X_train - X_train.mean(0))
    
    assert_array_almost_equal(VT[:30], pca.components_)
    
  2. pca.transform calculates the loadings as you describe

    X_train_pca = pca.transform(X_train)
    
    X_train_pca2 = (X_train - pca.mean_).dot(pca.components_.T)
    
    assert_array_almost_equal(X_train_pca, X_train_pca2)
    
  3. pca.inverse_transform obtains the projection onto components in signal space you are interested in

    X_projected = pca.inverse_transform(X_train_pca)
    X_projected2 = X_train_pca.dot(pca.components_) + pca.mean_
    
    assert_array_almost_equal(X_projected, X_projected2)
    

You can now evaluate the projection loss

loss = ((X_train - X_projected) ** 2).mean()
Sandipan Dey

Adding on @eickenberg's post, here is how to do the pca reconstruction of digits' images:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn import decomposition

n_components = 10
image_shape = (8, 8)

digits = load_digits()
digits = digits.data

n_samples, n_features = digits.shape
estimator = decomposition.PCA(n_components=n_components, svd_solver='randomized', whiten=True)
digits_recons = estimator.inverse_transform(estimator.fit_transform(digits))

# show 5 randomly chosen digits and their PCA reconstructions with 10 dominant eigenvectors
indices = np.random.choice(n_samples, 5, replace=False)
plt.figure(figsize=(5,2))
for i in range(len(indices)):
    plt.subplot(1,5,i+1), plt.imshow(np.reshape(digits[indices[i],:], image_shape)), plt.axis('off')
plt.suptitle('Original', size=25)
plt.show()
plt.figure(figsize=(5,2))
for i in range(len(indices)):
    plt.subplot(1,5,i+1), plt.imshow(np.reshape(digits_recons[indices[i],:], image_shape)), plt.axis('off')
plt.suptitle('PCA reconstructed'.format(n_components), size=25)
plt.show()

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!