Significance of 99% of variance covered by the first component in PCA

半腔热情 submitted on 2019-12-31 03:11:21

Question


What does it mean when the first component accounts for more than 99% of the total variance in PCA? I have a feature matrix of size 500x1000 on which I used Matlab's pca function, which returns [coeff,score,latent,tsquared,explained]. The variable 'explained' returns the percentage of variance covered by each component.


Answer 1:


The 'explained' output tells you what share of the total variance in the data each principal component captures. In your case, it means that the first principal component alone accounts for more than 99% of the variance, so you can describe the data very accurately using just that component.
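In terms of the outputs named in the question, each 'explained' entry is that component's variance expressed as a percentage of the total variance. Writing λ_i for the i-th entry of 'latent' and p for the number of components (this notation is introduced here for illustration and is not part of the original post), the relationship can be sketched as:

```latex
\mathrm{explained}_i = 100 \cdot \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
```

A first entry above 99 therefore means that λ_1 is more than 99 times larger than all the remaining variances combined.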

Let's make a 2D example. Imagine you have data that is 100x2 and you run PCA on it.

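The result could look something like this figure (taken from the internet), which shows the data points in blue, a big green arrow along the PCA 1st dimension, and a small green arrow along the PCA 2nd dimension.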

This data will give you an explained value of around 90% for the first principal component (the PCA 1st dimension, the big green arrow in the figure).

What does it mean?

It means that if you project all your data onto that line, you will reconstruct the points with about 90% accuracy, in the sense that the projections retain 90% of the total variance (of course, you will lose the information along the PCA 2nd dimension).
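To make the projection argument concrete, here is a minimal pure-Python sketch of the 2D case. The toy data, the helper name pca_2d, and the variable names are assumptions made for illustration; they only mirror the 'latent' and 'explained' outputs of Matlab's pca and are not meant to show how Matlab computes them internally. The sketch checks that the share of variance lost by keeping only the first component matches the second 'explained' value.

```python
import math

def pca_2d(points):
    """PCA for 2-D points using the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] (divides by n - 1).
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    latent = [half_trace + radius, half_trace - radius]

    # Unit vector along the first principal direction.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Percentage of total variance captured by each component.
    explained = [100 * v / sum(latent) for v in latent]
    return centered, direction, latent, explained

# Toy data (assumed): points near the line y = x with a small perpendicular wiggle.
points = [(t + d, t - d) for t, d in zip(range(-5, 6), [0.5, -0.5] * 6)]

centered, direction, latent, explained = pca_2d(points)

# Project each centered point onto the first principal direction
# and accumulate the squared distance that the projection discards.
total = sum(x * x + y * y for x, y in centered)
lost = 0.0
for x, y in centered:
    t = x * direction[0] + y * direction[1]  # coordinate along the line
    rx, ry = x - t * direction[0], y - t * direction[1]  # residual vector
    lost += rx * rx + ry * ry
residual_fraction = lost / total

print(f"explained: {explained[0]:.2f}% / {explained[1]:.2f}%")
print(f"variance lost with one component: {100 * residual_fraction:.2f}%")
```

The two printed percentages for the lost variance and the second component should agree up to rounding, which is the sense in which a 90% (or 99%) first component lets you keep one coordinate per point and discard the rest.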

In your example, 99% visually means that almost all the blue points lie on the big green arrow, with very little variation in the direction of the small green arrow.

Of course, it is much more difficult to visualize with 1000 dimensions instead of 2, but I hope you get the idea.



Source: https://stackoverflow.com/questions/30777569/significance-of-99-of-variance-covered-by-the-first-component-in-pca
