Vectorized code in Matlab generally runs much faster than an equivalent for loop (see "Parallel computing in Octave on a single machine -- package and example" for concrete results in Octave).
In Matlab, the only way to get built-in vectorized functions to multithread is to wait for MathWorks to implement them as such.
Alternatively, you can rewrite the vectorized computation as a loop and run its iterations in parallel using parfor.
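As a rough sketch of the loop approach, the snippet below splits a subtraction followed by an element-wise power into chunks and hands each chunk to one parfor iteration. The arrays `x` and `y`, the size `n`, and the chunk count `nChunks` are made up for illustration and are not taken from the original problem. Whether this beats the plain vectorized line depends on the problem size and on the overhead of the worker pool, so it is worth timing both.

```matlab
% Hypothetical data: x and y stand in for whatever arrays the real code uses.
n = 1e5;
x = rand(n, 1);
y = rand(n, 1);

% Vectorized form: one subtraction and one element-wise power.
d_vec = (x - y).^2;

% Same computation as a loop over chunks, so that each parfor iteration
% has enough work to be worth dispatching to a worker.
nChunks = 8;                                  % assumed chunk count
edges = round(linspace(0, n, nChunks + 1));   % chunk boundaries
parts = cell(nChunks, 1);
parfor k = 1:nChunks
    idx = (edges(k) + 1):edges(k + 1);        % indices of chunk k
    parts{k} = (x(idx) - y(idx)).^2;          % sliced output variable
end
d_par = vertcat(parts{:});                    % reassemble in original order
```

Chunking is a design choice here: looping over single elements would also be valid parfor code, but the per-iteration overhead would likely dominate such cheap arithmetic.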
Finally, a number of functions are GPU-enabled, so with access to the Parallel Computing Toolbox you can run these operations on the GPU, including the subtraction and the element-wise power.
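A minimal sketch of the GPU route, assuming the Parallel Computing Toolbox and a supported GPU are available. The arrays `x` and `y` are hypothetical placeholders, and the `hasGpu` check with its CPU fallback is only there so the snippet still runs on a machine without a GPU.

```matlab
% Hypothetical data standing in for the real inputs.
x = rand(1e6, 1);
y = rand(1e6, 1);

% Use the GPU only if the toolbox function exists and reports a device.
hasGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;

if hasGpu
    xg = gpuArray(x);              % copy inputs to GPU memory
    yg = gpuArray(y);
    d = gather((xg - yg).^2);      % compute on the GPU, copy result back
else
    d = (x - y).^2;                % CPU fallback, same arithmetic
end
```

Copying data to and from the GPU has a cost of its own, so this tends to pay off only when the arrays are large or when several operations are chained on the GPU before calling `gather`.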