Different PCA plots | 易学教程

Question

I was trying to learn PCA (using the iris dataset) with Python and got some results, so I wanted to check them in R to make sure they were correct. When I compared the results, R gave me a diagram that is the mirror image of the Python one (flipped along the y axis), and some of the values have the opposite sign (Python: [140,1] = 0.1826089, R: [141,2] = -0.1826089; Python counts from zero).

The Python code:

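import numpy as np
import matplotlib.pyplot as plt
import sklearn.decomposition as p
# Load the four numeric iris columns from the semicolon-separated file
data = np.loadtxt("sample_data/iris.txt", delimiter=';', usecols=(0, 1, 2, 3))
# Fit PCA and project the data onto the principal components
pca = p.PCA().fit(data)
pcaData = pca.transform(data)
# Plot the first component against the second
plt.scatter(pcaData[:, 0], pcaData[:, 1])
plt.show()
# Second-component score of the 141st sample (zero-based index 140)
print(pcaData[140, 1])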

My Python diagram

The R code:

# Read the four numeric columns and skip the fifth (species) column
data=read.csv("C:\\Users\\George\\Desktop\\iris.csv",sep=";",colClasses=c(NA, NA, NA, NA, "NULL"))
# Drop row 151
data=data[-151,]
pca=prcomp(data)
plot(pca$x[,1],pca$x[,2])
print(pca$x[141,2])

My R diagram

In a search I did on the internet, I found that the same thing happens.

The R diagram on the internet (Source)

The Python diagram on the internet (Source)

I was expecting them to be the same. Is there something that I do not understand well?

Thank you.

Answer 1:

scikit-learn can use a pseudo-randomized method to compute an approximation of the singular value decomposition.

See https://scikit-learn.org/stable/modules/generated/sklearn.utils.extmath.randomized_svd.html

Therefore, unless you can guarantee that both implementations use the same method and the same random seed, you are not guaranteed to obtain exactly the same values for the principal components.
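Separately from the choice of solver, the sign of each principal component is not fixed by the mathematics: if v is a principal axis, so is -v, and the corresponding scores flip sign with it. That is consistent with the mirrored plot and the 0.1826089 versus -0.1826089 values in the question. The sketch below is only an illustration under stated assumptions: it uses plain NumPy on synthetic data (not the iris file), and the helper names `pca_scores` and `align_signs` are hypothetical, not part of scikit-learn or R.

```python
import numpy as np

def pca_scores(data):
    """Return (scores, components) from an SVD of the centered data (hypothetical helper)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T, vt

def align_signs(scores, reference):
    """Flip each column of `scores` so that it points the same way as `reference`."""
    signs = np.sign(np.sum(scores * reference, axis=0))
    signs[signs == 0] = 1.0  # leave a column unchanged if the dot product is zero
    return scores * signs

# Synthetic stand-in for a 150 x 4 dataset (assumption: not the real iris data)
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4)) * np.array([4.0, 2.0, 1.0, 0.5])

scores, components = pca_scores(data)

# Simulate a second implementation that returns PC2 with the opposite sign
mirrored = scores.copy()
mirrored[:, 1] *= -1
mirrored_components = components.copy()
mirrored_components[1] *= -1

# Both sign conventions reconstruct the centered data equally well
same_reconstruction = np.allclose(scores @ components,
                                  mirrored @ mirrored_components)

# After aligning signs, the two sets of scores agree
aligned = align_signs(mirrored, scores)
print(same_reconstruction, np.allclose(aligned, scores))
```

When comparing PCA output from two tools, comparing absolute values, or aligning signs first as above, avoids treating a harmless sign flip as a real discrepancy.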

Source: https://stackoverflow.com/questions/56253444/diffrent-pca-plots

Tags

python

pca