Question
I am running data through sklearn's PCA (n_components=2) and find that the y-value (the second component) of the last row differs from the y-values of the other rows that have identical input values. Notably, the input data consist of only two distinct rows, and when I change the number of occurrences of either one, the discrepancy disappears.
Please find the code below to replicate the error.
import pandas as pd
from sklearn.decomposition import PCA
lst1 = [[-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999,0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999]]*7798
lst2 = [[2.0580917,0,2.0580917,2.0580917,2.0580917,0,2.0580917,2.0580917,2.0580917,2.0580917,2.0580917,-2.0580917,2.0580917,2.0580917,2.0580917,2.0580917]]*1841
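# Build one DataFrame per distinct row, then stack them (lst2 rows first)
df_lst1 = pd.DataFrame(lst1)
df_lst2 = pd.DataFrame(lst2)
test = pd.concat([df_lst2, df_lst1], axis=0).reset_index(drop=True)
# Fit a 2-component PCA and project the same data it was fitted on
pca = PCA(n_components=2)
pca.fit(test)
result = pd.DataFrame(pca.transform(test), index=test.index)
print(result)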
Input of the last three rows (the three rows are identical):
              0  1          2          3          4  5          6  ...          9         10         11         12         13         14         15
9636  -0.485887  0  -0.485887  -0.485887  -0.485887  0  -0.485887  ...  -0.485887  -0.485887   0.485887  -0.485887  -0.485887  -0.485887  -0.485887
9637  -0.485887  0  -0.485887  -0.485887  -0.485887  0  -0.485887  ...  -0.485887  -0.485887   0.485887  -0.485887  -0.485887  -0.485887  -0.485887
9638  -0.485887  0  -0.485887  -0.485887  -0.485887  0  -0.485887  ...  -0.485887  -0.485887   0.485887  -0.485887  -0.485887  -0.485887  -0.485887
Output of the last three rows:
              0             1
9636  -1.818023  1.679370e-17
9637  -1.818023  1.679370e-17
9638  -1.818023  0.000000e+00
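To put the size of the difference in context, here is a minimal sketch that compares the two y-values printed above against double-precision machine epsilon. The variable names are made up for illustration, the numbers are copied by hand from the output, and the check only measures how big the difference is; it does not explain why only the last row differs.

```python
import math
import sys

# Second-component (y) values copied from the printed output above.
y_other_rows = 1.679370e-17   # rows 9636 and 9637
y_last_row = 0.000000e+00     # row 9638
# First-component value of the same rows, used only as a scale reference.
x_value = -1.818023

# Machine epsilon for IEEE 754 doubles (about 2.22e-16).
eps = sys.float_info.epsilon

# Absolute difference between the two reported y-values.
diff = abs(y_other_rows - y_last_row)

# Is the difference smaller than one rounding step at the scale of x?
below_eps = diff < eps * abs(x_value)

# Same question phrased with math.isclose and an absolute tolerance of eps.
close = math.isclose(y_other_rows, y_last_row, rel_tol=0.0, abs_tol=eps)

print(f"absolute difference:   {diff:.6e}")
print(f"machine epsilon:       {eps:.6e}")
print(f"below eps * |x|:       {below_eps}")
print(f"isclose (abs_tol=eps): {close}")
```

Both checks evaluate to True for these numbers, so the two y-values differ by less than one unit of double-precision rounding at the scale of the first component.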
Source: https://stackoverflow.com/questions/52778384/sklearns-pca-gives-wrong-output-for-last-row