I'm testing svd in Matlab R2014a and it seems that there is no CPU vs GPU speedup. I'm using a GTX 460 car
As VAndrei has already stated, the SVD is an algorithm which is difficult to parallelize.
Your main problem is the size of your matrix. The performance of the SVD drops rapidly with a growing matrix size. So your main goal should be to reduce the size of the matrix. This can be accomplished using Gaussian normal equations (which is basically a reduction of an overdetermined linear system in the least-squares sense).
This can be done by simply multiplying the transpose onto the matrix:
MhReduced = Mh' * Mh;
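This reduces your matrix to size cols-by-cols (where cols is the number of columns of Mh). Then you just call [U,S,V] = svd(MhReduced); Keep in mind that the singular values of MhReduced are the squares of the singular values of Mh, and that V holds the right singular vectors of Mh.
Note: Using this method may yield singular vectors with the opposite sign (this only matters if you are comparing the two methods).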
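To see how the two results relate, here is a minimal sketch. The matrix Mh is random placeholder data (an assumption, not your actual matrix), and the sizes are arbitrary:

```matlab
% Sketch: compare the SVD of Mh with the SVD of the reduced matrix Mh'*Mh.
rng(0);                           % fixed seed so the run is repeatable
Mh = rand(2000, 50);              % hypothetical tall test matrix

[~, S, V]   = svd(Mh, 0);         % economy-size SVD of the original matrix
MhReduced   = Mh' * Mh;           % 50-by-50 instead of 2000-by-50
[~, Sr, Vr] = svd(MhReduced);

% Singular values of Mh'*Mh are the squares of those of Mh.
relErrS = norm(sqrt(diag(Sr)) - diag(S)) / norm(diag(S));

% Columns of V agree only up to sign, so compare the absolute values of
% the column-wise inner products (each should be close to 1).
colMatch = abs(sum(V .* Vr, 1));

fprintf('relative error in singular values: %g\n', relErrS);
fprintf('worst column match (1 = same up to sign): %g\n', min(colMatch));
```

If some singular values are (nearly) equal, the corresponding columns are not unique and the column comparison is no longer meaningful.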
If your matrix is well-conditioned this should work without problems. However, in case of an ill-conditioned matrix, this method may fail to produce a usable result, whereas applying the SVD directly could still yield a usable result due to the SVD's robustness.
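The reason is that multiplying by the transpose squares the condition number. A small sketch, assuming a test matrix built with prescribed singular values (the construction and sizes are only for illustration):

```matlab
% Sketch: forming A'*A squares the condition number of A.
rng(0);
[Q1, ~] = qr(rand(200, 5), 0);    % 200-by-5 matrix with orthonormal columns
[Q2, ~] = qr(rand(5));            % 5-by-5 orthogonal matrix
s = logspace(0, -6, 5);           % prescribed singular values from 1 down to 1e-6
A = Q1 * diag(s) * Q2';           % A has condition number of about 1e6

condA   = cond(A);
condAtA = cond(A' * A);           % roughly condA^2

fprintf('cond(A)    = %g\n', condA);
fprintf('cond(A''*A) = %g\n', condAtA);
```

Once cond(A)^2 approaches 1/eps (about 4.5e15 in double precision), the reduced matrix is numerically singular even though A itself is not.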
This should increase your performance immensely, at least with matrices that are big enough. Another advantage is that you can use much larger matrices. You probably won't have to use the GPU at all (since either the matrices are so big that copying them to the GPU costs too much, or after reduction the matrix is so small that the speedup of the GPU won't be big enough).
Also note that a large chunk of performance is lost if you request the singular vectors as return values. If you're only interested in the performance of the SVD calculation itself, don't request U and V: with a single output (or none), svd computes only the singular values. If you are only interested in the "solution vector", just get V (and access the last column): [~,~, V] = svd(Mh);
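A quick way to check this on your own machine is the sketch below. The matrix is random placeholder data and the sizes are arbitrary; the timings will differ between machines, so no numbers are given here:

```matlab
% Sketch: cost of svd depending on which outputs are requested.
rng(0);
Mh = rand(2000, 200);             % hypothetical test matrix

tic; s = svd(Mh);               tValues = toc;  % singular values only
tic; [U, S, V] = svd(Mh, 0);    tEcon   = toc;  % economy-size: U is 2000-by-200
tic; [~, ~, Vfull] = svd(Mh);   tFull   = toc;  % full SVD: U is 2000-by-2000

fprintf('values only:  %.3f s\n', tValues);
fprintf('economy-size: %.3f s\n', tEcon);
fprintf('full:         %.3f s\n', tFull);

% Candidate "solution vector": last column of V.
xLast = V(:, end);
```

For tall matrices the economy-size call svd(Mh, 0) returns the same V as the full call, so it is usually the better choice when only V is needed.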
I've looked at your sample code, but I'm not sure what it is that you are calculating. I also realized that it's rather hard to understand what I did with A'*A, so I will explain in detail.
Given a linear system A*x=b, with A denoting the coefficient matrix with m rows and n cols, x the solution vector (n rows) and b the constant vector (m rows), a solution can be calculated as follows:
If A is square (m=n): x = A^-1 * b
If A is not square (m!=n, m > n):
A * x = b
A'* A * x = A' * b
x = (A' * A)^-1 * A'*b
A" = (A'*A)^-1 * A' is typically called the pseudo-inverse. However, this calculation worsens the conditioning of the problem: the condition number of A'*A is the square of the condition number of A. A solution to this problem is using a singular value decomposition (SVD).
If [U,S,V] = svd(A) denotes the result of the SVD (so that A = U*S*V'), the pseudo-inverse is given by V*S"*U', where S" is formed by transposing S and taking the reciprocal of its non-zero elements.
So A" = V*S"*U'.
x = A"*b
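As a minimal sketch of this construction (random placeholder data; the tolerance for "non-zero" is an assumption modeled on the default that pinv documents), the pseudo-inverse can be built from the economy-size SVD and checked against Matlab's own pinv:

```matlab
% Sketch: pseudo-inverse from the SVD, compared with pinv and backslash.
rng(0);
A = rand(500, 20);                 % hypothetical overdetermined system
b = rand(500, 1);

[U, S, V] = svd(A, 0);             % economy-size SVD, S is 20-by-20
s = diag(S);

% Treat singular values below a tolerance as zero.
tol  = max(size(A)) * eps(max(s));
sInv = zeros(size(s));
sInv(s > tol) = 1 ./ s(s > tol);   % invert only the non-zero singular values

Apinv = V * diag(sInv) * U';       % pseudo-inverse of A
x = Apinv * b;                     % least-squares solution

diffPinv  = norm(x - pinv(A) * b);
diffSlash = norm(x - A \ b);
fprintf('difference to pinv(A)*b: %g\n', diffPinv);
fprintf('difference to A\\b:       %g\n', diffSlash);
```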
However, an SVD is rather costly, especially for large matrices. If matrix A is well-conditioned and very precise results are not necessarily required (we're talking 1e-13 or 1e-14), the much faster approach of calculating the pseudo-inverse via (A'*A)^-1 * A' can be used.
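To get a feeling for the accuracy you give up, here is a sketch on a well-conditioned random matrix. The data is made up so that the exact solution is known, and the normal equations are solved with backslash rather than an explicit inverse, which is a small deviation from the formula above:

```matlab
% Sketch: accuracy of normal equations vs. SVD on a well-conditioned system.
rng(0);
A     = rand(5000, 50);            % hypothetical well-conditioned matrix
xTrue = rand(50, 1);
b     = A * xTrue;                 % consistent system, so xTrue solves it

xNormal = (A' * A) \ (A' * b);     % normal equations

[U, S, V] = svd(A, 0);
xSvd = V * ((U' * b) ./ diag(S));  % SVD-based least-squares solution

errNormal = norm(xNormal - xTrue) / norm(xTrue);
errSvd    = norm(xSvd - xTrue) / norm(xTrue);
fprintf('relative error, normal equations: %g\n', errNormal);
fprintf('relative error, SVD:              %g\n', errSvd);
```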
If your case actually is A*x=0, just use an SVD and read the last column vector of V (the right singular vector belonging to the smallest singular value); it is the solution.
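A minimal sketch of the homogeneous case, assuming a test matrix that is constructed to have a known null vector (xTrue and the sizes are made up for the example):

```matlab
% Sketch: solve A*x = 0 by taking the last right singular vector.
rng(0);
xTrue = rand(6, 1);
xTrue = xTrue / norm(xTrue);       % known unit-length null vector

B = rand(100, 6);
A = B - (B * xTrue) * xTrue';      % remove the xTrue component, so A*xTrue = 0

[~, S, V] = svd(A, 0);
x = V(:, end);                     % belongs to the smallest singular value

residual  = norm(A * x);
alignment = abs(x' * xTrue);       % 1 means x equals xTrue up to sign
fprintf('smallest singular value: %g\n', S(end, end));
fprintf('norm(A*x):               %g\n', residual);
fprintf('|x''*xTrue|:              %g\n', alignment);
```

The solution is only determined up to scale and sign; svd returns it with unit length.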
If you use the SVD not to solve a linear system but for the results of U and S (as your example suggests), I'm not sure what I've posted will help you.
Here is some sample code for you to test. Test it with large matrices and you will see that using (A'*A)^-1 * A' is much faster than the alternatives.
clear all
nbRows = 30000;
nbCols = 100;
% Matrix A
A = rand(nbRows,nbCols);
% Vector b
b = rand(nbRows,1);
% A*x=b
% Solve for x, using SVD
% [U,S,V]=svd(A,0);
% x= V*((U'*b)./diag(S))
tic
[U1,S1,V1]=svd(A,0);
x1= V1*((U1'*b)./diag(S1));
toc
% Same SVD approach, but with an explicit inverse of S1
tic
[U1,S1,V1]=svd(A,0);
x2 = V1*inv(S1)*U1'*b;
toc
% Solve for x, using manual pseudo-inverse
% A*x=b
% A'*A*x = A'*b
% x = (A'*A)^-1 * A'*b
tic
x3 = inv(A'*A) * A'*b;
toc
% Solve for x, let Matlab decide how (for a non-square A, backslash uses a QR-based least-squares solver)
tic
x4 = A\b;
toc