Sklearn's PCA gives 'wrong' output for last row

Question

I am running data through sklearn's PCA (n_components=2) and find that the second-component value (output column 1) of the last row differs from the values for the other rows with identical input. Notably, the input data consists of only two distinct rows, and when I change the number of occurrences of either row, the discrepancy disappears.

The code below reproduces the problem.

import pandas as pd
from sklearn.decomposition import PCA

lst1 = [[-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999,0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999]]*7798
lst2 = [[2.0580917,0,2.0580917,2.0580917,2.0580917,0,2.0580917,2.0580917,2.0580917,2.0580917,2.0580917,-2.0580917,2.0580917,2.0580917,2.0580917,2.0580917]]*1841

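# Stack the 1841 copies of the second row on top of the 7798 copies of the first
df_lst1 = pd.DataFrame(lst1)
df_lst2 = pd.DataFrame(lst2)
test = pd.concat([df_lst2, df_lst1], axis=0).reset_index(drop=True)

# Fit a two-component PCA and project the same data
pca = PCA(n_components=2)
pca.fit(test)
result = pd.DataFrame(pca.transform(test), index=test.index)
print(result)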

Input of the last three rows (the three rows are identical):

            0   1         2         3         4   5         6     ...           9         10        11        12        13        14        15
9636 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887
9637 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887
9638 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887

Output of the last three rows:

             0             1
9636 -1.818023  1.679370e-17
9637 -1.818023  1.679370e-17
9638 -1.818023  0.000000e+00
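
For context, the differing values are on the order of 1e-17, which is below float64 machine epsilon (about 2.2e-16). Because the input contains only two distinct rows, the centered data most likely has rank 1, so the second component should be zero and what is printed there is probably rounding noise rather than a real difference. A minimal sketch of how to check this, using the three values copied from the output above (the tolerance of 1e-12 and the variable names are assumptions chosen for illustration):

```python
import numpy as np

# Second-component values copied from the printed output above.
second_component = np.array([1.679370e-17, 1.679370e-17, 0.000000e+00])

# Machine epsilon for float64; for data of order 1, differences far
# below this are indistinguishable from rounding error.
eps = np.finfo(np.float64).eps

# Compare against zero with an absolute tolerance (1e-12 is an assumed,
# deliberately loose threshold) instead of testing exact equality.
all_zero_within_tolerance = np.allclose(second_component, 0.0, atol=1e-12)

# Rounding before display hides the noise in the printed frame.
rounded = np.round(second_component, 10)

print(second_component.max() < eps)
print(all_zero_within_tolerance)
print(rounded)
```

The same comparison can be applied to the full `result` frame from the code above, for example with `np.allclose(result[1], 0.0, atol=1e-12)`.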

Source: https://stackoverflow.com/questions/52778384/sklearns-pca-gives-wrong-output-for-last-row

Tags

python

scikit-learn

pca