I am making some benchmarks with CUDA, C++, C#, Java, and using MATLAB for verification and matrix generation. When I perform matrix multiplication with MATLAB, 2048x
MATLAB uses a highly optimized implementation of LAPACK from Intel known as Intel Math Kernel Library (Intel MKL) - specifically the dgemm function. The speed This library takes advantage of processor features including SIMD instructions and multi-core processors. They don't document which specific algorithm they use. If you were to call Intel MKL from C++ you should see similar performance.
I am not sure what library MATLAB uses for GPU multiplication but probably something like nVidia CUBLAS.