Using Principal Components Analysis (PCA) on binary data

懵懂的女人 提交于 2019-12-06 17:05:47

问题


I am using PCA on binary attributes to reduce the dimensions (attributes) of my problem. The initial dimensions were 592 and after PCA the dimensions are 497. I used PCA before, on numeric attributes in an other problem and it managed to reduce the dimensions in a greater extent (the half of the initial dimensions). I believe that binary attributes decrease the power of PCA, but i do not know why. Could you please explain me why PCA does not work as good as in numeric data.

Thank you.


回答1:


The principal components of 0/1 data can fall off slowly or rapidly, and the PCs of continuous data too — it depends on the data. Can you describe your data ?

The following picture is intended to compare the PCs of continuous image data vs. the PCs of the same data quantized to 0/1: in this case, inconclusive.

Look at PCA as a way of getting an approximation to a big matrix,
first with one term: approximate A ~ c U VT, c [Ui Vj].
Consider this a bit, with A say 10k x 500: U 10k long, V 500 long. The top row is c U1 V, the second row is c U2 V ... all the rows are proportional to V. Similarly the leftmost column is c U V1 ... all the columns are proportional to U.
But if all rows are similar (proportional to each other), they can't get near an A matix with rows or columns 0100010101 ...
With more terms, A ~ c1 U1 V1T + c2 U2 V2T + ..., we can get nearer to A: the smaller the higher ci, the faster.. (Of course, all 500 terms recreate A exactly, to within roundoff error.)

The top row is "lena", a well-known 512 x 512 matrix, with 1-term and 10-term SVD approximations. The bottom row is lena discretized to 0/1, again with 1 term and 10 terms. I thought that the 0/1 lena would be much worse -- comments, anyone ?

(U VT is also written U ⊗ V, called a "dyad" or "outer product".)

(The wikipedia articles Singular value decomposition and Low-rank approximation are a bit math-heavy. An AMS column by David Austin, We Recommend a Singular Value Decomposition gives some intuition on SVD / PCA -- highly recommended.)



来源:https://stackoverflow.com/questions/13505296/using-principal-components-analysis-pca-on-binary-data

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!