Sklearn's PCA gives 'wrong' output for last row

Posted by 送分小仙女 on 2021-01-28 11:05:57

Question


I am running data through sklearn's PCA (n_components=2) and find that the second-component value (column 1 of the output) for the last row differs from the value for other rows with identical input. Notably, the input data consists of only two distinct rows, and the discrepancy disappears when I change the number of occurrences of either row.

Please find the code below to replicate the error.

import pandas as pd
from sklearn.decomposition import PCA

lst1 = [[-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999,0.485886999,-0.485886999,-0.485886999,-0.485886999,-0.485886999]]*7798
lst2 = [[2.0580917,0,2.0580917,2.0580917,2.0580917,0,2.0580917,2.0580917,2.0580917,2.0580917,2.0580917,-2.0580917,2.0580917,2.0580917,2.0580917,2.0580917]]*1841

df_lst1 = pd.DataFrame(lst1)
df_lst2 = pd.DataFrame(lst2)
test = pd.concat([df_lst2, df_lst1], axis=0).reset_index(drop=True)

pca = PCA(n_components=2)
pca.fit(test)
result = pd.DataFrame(pca.transform(test), index=test.index)
print(result)

Input of the last three rows (the three rows are identical):

            0   1         2         3         4   5         6     ...           9         10        11        12        13        14        15
9636 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887
9637 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887
9638 -0.485887   0 -0.485887 -0.485887 -0.485887   0 -0.485887    ...    -0.485887 -0.485887  0.485887 -0.485887 -0.485887 -0.485887 -0.485887

Output of the last three rows:

             0             1
9636 -1.818023  1.679370e-17
9637 -1.818023  1.679370e-17
9638 -1.818023  0.000000e+00
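
To put the size of that discrepancy in context, here is a minimal sketch that rebuilds the same data with NumPy and checks how large the second-component values are overall. It assumes the difference is floating-point round-off, which the 1e-17 magnitude is consistent with (double precision resolves roughly 2.2e-16 relative to 1), but it does not pin down where in the computation the round-off arises. The names `pattern`, `row_a`, `row_b` and the tolerance `1e-8` are made up for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Sign pattern shared by both distinct rows (hypothetical helper, not in the original post).
pattern = np.array([1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1], dtype=float)
row_a = -0.485886999 * pattern   # the row repeated 7798 times
row_b = 2.0580917 * pattern      # the row repeated 1841 times

# Same ordering as in the question: the 1841 copies first, then the 7798 copies.
X = np.vstack([np.tile(row_b, (1841, 1)), np.tile(row_a, (7798, 1))])

pca = PCA(n_components=2)
pca.fit(X)
scores = pca.transform(X)

# With only two distinct rows, the centered data lies on a single line,
# so the second component should carry essentially no variance.
second_ratio = pca.explained_variance_ratio_[1]
max_abs_second = np.abs(scores[:, 1]).max()

print("explained variance ratio:", pca.explained_variance_ratio_)
print("largest |second component| value:", max_abs_second)

# Compare with a tolerance instead of exact equality (1e-8 is an arbitrary choice).
print("second component is numerically zero:", np.allclose(scores[:, 1], 0.0, atol=1e-8))
print("last three rows agree on component 0:", np.allclose(scores[-3:, 0], scores[-1, 0]))
```

If the second component's explained variance ratio is effectively zero, values such as 1.679370e-17 and 0.000000e+00 in that column are both numerically zero, and comparing them with a tolerance rather than `==` treats the rows as equal.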

Source: https://stackoverflow.com/questions/52778384/sklearns-pca-gives-wrong-output-for-last-row
