Interpreting PCA Results

Submitted by 我是研究僧i on 2021-01-29 20:14:09


I am doing a principal component analysis on 5 variables within a dataframe to see which ones I can remove.

df <- data.frame(variableA, variableB, variableC, variableD, variableE)


Running a PCA on it gives the following results:

                          PC1    PC2    PC3     PC4     PC5
Proportion of Variance 0.5127 0.2095 0.1716 0.06696 0.03925

My issue is that if I change the order of the variables in the dataframe, I get the same results:

df <- data.frame(variableC, variableD, variableA, variableE, variableB)


                          PC1    PC2    PC3     PC4     PC5
Proportion of Variance 0.5127 0.2095 0.1716 0.06696 0.03925

How do I know which of the 5 variables is related to PC1, which to PC2, etc.?


Here is an approach to identify the components explaining up to 85% variance, using the spam data from the kernlab package.

# load the spam data from kernlab
library(kernlab)
data(spam)
# log transform independent variables; adding 1 keeps every argument above 0
princomp <- prcomp(log10(spam[,-58]+1))
stats <- summary(princomp)
# row 3 of the importance matrix is the cumulative proportion of variance;
# list the components explaining up to 85% of it
importance <- stats$importance[3,]
importance[importance <= 0.85]

...and the output:

> importance[importance <= 0.85]
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10    PC11 
0.49761 0.58021 0.63101 0.67502 0.70835 0.73188 0.75100 0.76643 0.78044 0.79368 0.80648 
   PC12    PC13    PC14 
0.81886 0.83046 0.84129 
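
As a rough illustration of where these cumulative figures come from, the sketch below recomputes them by hand from the component standard deviations. It uses the built-in iris data rather than spam so that it runs without extra packages; the names pca, var_explained, cum_var and keep are made up for this example.

```r
# Hypothetical illustration on the built-in iris data (not the spam data above)
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# each component's variance is its squared standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# the running total matches the "Cumulative Proportion" row of summary(pca)
cum_var <- cumsum(var_explained)
names(cum_var) <- colnames(pca$x)

# components whose cumulative proportion stays at or below 85%
keep <- names(cum_var[cum_var <= 0.85])
```

Note that a `<= 0.85` filter keeps only the components whose running total has not yet passed 85%, so the selected set explains somewhat less than 85% in total. If the goal is to reach at least 85%, one more component is needed.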

We can obtain the factor scores for the first 14 components as follows.

resultNames <- names(importance[importance <= 0.85])
# return factor scores 
x_result <- as.data.frame(princomp$x[,resultNames])

...and the output:

> head(x_result)
         PC1         PC2          PC3          PC4          PC5         PC6         PC7
1  0.7364988  0.19181730  0.041818854 -0.009236399  0.001232911  0.03723833 -0.01144332
2  1.3478167  0.22953561 -0.149444409  0.091569400 -0.148434128 -0.01923707 -0.07119210
3  2.0489632 -0.02668038  0.222492079 -0.107120738 -0.092968198 -0.06400683 -0.07078830
4  0.4912016  0.20921288 -0.002072148  0.015524007 -0.002347262 -0.14519336 -0.09238828
5  0.4911676  0.20916725 -0.002122664  0.015467369 -0.002373622 -0.14517812 -0.09243136
6 -0.2337956 -0.10508875  0.187831101 -0.335491660  0.099445713  0.09516875  0.11234080
          PC8          PC9        PC10        PC11        PC12         PC13        PC14
1 -0.08745771  0.079650230 -0.14450436  0.15945517 -0.06490913 -0.042909658  0.05739735
2  0.00233124 -0.091471125 -0.10304536  0.06973190  0.09373344  0.003069536  0.02892939
3 -0.10888375  0.227437609 -0.07419313  0.08217271 -0.12488575  0.150950134  0.05180459
4 -0.15862241  0.003044418  0.01609690  0.01720151  0.02313224  0.142176889 -0.04013102
5 -0.15848785  0.002944493  0.01606874  0.01725410  0.02304496  0.142527110 -0.04007788
6 -0.13790588  0.197294502  0.07851300 -0.08131269 -0.02091459  0.246810914 -0.01869192
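To relate the original variables back to the components, which is what the question asks, one option is to look at the loadings stored in the rotation element of the prcomp result. The sketch below is only an illustration on the built-in iris data; pca, loadings, top_var and pca_perm are names made up for this example. It also shows why reordering the columns leaves the proportions of variance unchanged.

```r
# Hypothetical illustration on the built-in iris data
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# rotation holds the loadings: one row per original variable, one column per PC
loadings <- pca$rotation

# for each PC, the variable with the largest absolute loading
top_var <- apply(abs(loadings), 2, function(col) names(which.max(col)))

# reordering the columns only reorders the rows of the loadings matrix;
# the component standard deviations (and so the variance proportions) are the same
pca_perm <- prcomp(iris[, 4:1], scale. = TRUE)
same_sdev <- isTRUE(all.equal(pca$sdev, pca_perm$sdev))
```

Each component is a weighted combination of all the variables, so there is usually no one-to-one match between a variable and a PC. The loadings show how strongly each variable contributes to each component.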

