importance of PCA or SVD in machine learning

离开以前 2020-12-22 22:45

All this time (especially in the Netflix contest), I keep coming across blog posts (or leaderboard forum threads) where they mention how applying a simple SVD step to the data helped them.

3 Answers
  •  滥情空心
    2020-12-22 23:36

    The singular value decomposition (SVD) is often used to approximate a matrix X by a low-rank matrix X_lr:

    1. Compute the SVD X = U D V^T.
    2. Form the matrix D' by keeping the k largest singular values and setting the others to zero.
    3. Form the matrix X_lr by X_lr = U D' V^T.

    The matrix X_lr is then the best rank-k approximation of X in the Frobenius norm (the matrix analogue of the l2 norm). This representation is also computationally cheap to use: if X is n by n and k << n, you can store the low-rank approximation with only (2n + 1)k numbers (the first k columns of U and V, plus the k retained singular values of D').
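
    A minimal NumPy sketch of these three steps (the function name and the random test matrix are mine, purely for illustration):

    ```python
    import numpy as np

    def low_rank_approx(X, k):
        # Step 1: compute the SVD, X = U @ np.diag(d) @ Vt.
        U, d, Vt = np.linalg.svd(X, full_matrices=False)
        # Steps 2 and 3: zero out all but the k largest singular values and
        # reassemble; equivalently, keep only the first k columns/rows.
        return (U[:, :k] * d[:k]) @ Vt[:k, :]

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 200))
    X_lr = low_rank_approx(X, 10)
    print(np.linalg.matrix_rank(X_lr))  # 10
    # Storing U[:, :10], d[:10] and Vt[:10, :] takes (2n + 1) * k = 4010
    # numbers instead of n * n = 40000 for the full matrix.
    ```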

    This has often been used in matrix completion problems (such as collaborative filtering), because the true matrix of user ratings is assumed to be low rank (or well approximated by a low-rank matrix): you try to recover the true matrix by computing the best low-rank approximation of your data matrix. However, there are now better ways to recover low-rank matrices from noisy and missing observations, namely nuclear norm minimization. See, for example, the paper "The Power of Convex Relaxation: Near-Optimal Matrix Completion" by E. Candès and T. Tao.

    (Note: the algorithms derived from this technique also store the SVD of the estimated matrix, but it is computed differently).
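
    A minimal NumPy sketch of that family of algorithms, using iterative singular value soft-thresholding (the Soft-Impute idea of Mazumder, Hastie and Tibshirani); the function name, the threshold tau and the iteration count here are illustrative choices, not taken from any particular paper:

    ```python
    import numpy as np

    def soft_impute(X_obs, mask, tau=1.0, n_iters=200):
        """Approximate nuclear norm minimization by repeatedly filling in
        the missing entries and soft-thresholding the singular values.
        X_obs: data matrix (values at unobserved entries are ignored).
        mask:  boolean array, True where an entry was observed."""
        Z = np.zeros_like(X_obs, dtype=float)
        for _ in range(n_iters):
            # Keep the observed entries, fill the rest with the estimate.
            filled = np.where(mask, X_obs, Z)
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            # Shrink every singular value by tau and clip at zero; this is
            # the proximal step for the nuclear norm.
            Z = (U * np.maximum(s - tau, 0.0)) @ Vt
        return Z

    # Rank-3 ground truth with half of the entries observed.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
    mask = rng.random(A.shape) < 0.5
    X_hat = soft_impute(np.where(mask, A, 0.0), mask, tau=0.5)
    print(np.linalg.norm(X_hat - A) / np.linalg.norm(A))  # relative error
    ```

    Each iteration recomputes an SVD of the current estimate, which matches the note above: these algorithms store an SVD of the estimated matrix, but it is obtained differently from a single decomposition of the raw data.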
