I can't figure out why numba is beating numpy here (over 3x). Did I make some fundamental error in how I am benchmarking here? Seems like the perfect situation for numpy.
Instead of cluttering the original question further, I'll add some more stuff here in response to Jeff, Jaime, Veedrac:
def proc_numpy2(x, y, z):
    np.subtract(np.multiply(x, 2), np.multiply(y, 55), out=x)
    np.add(x, np.multiply(y, 2), out=y)
    np.add(x, np.add(y, 99), out=z)
    np.multiply(z, np.subtract(z, .88), out=z)
    return z
def proc_numpy3(x, y, z):
    x *= 2
    x -= y*55
    y *= 2
    y += x
    z = x + y
    z += 99
    z *= (z - .88)
    return z
My machine seems to be running a tad faster today than yesterday, so here are the new timings alongside proc_numpy (proc_numba times the same as before):
In [611]: %timeit proc_numpy(x,y,z)
10000 loops, best of 3: 103 µs per loop
In [612]: %timeit proc_numpy2(x,y,z)
10000 loops, best of 3: 92.5 µs per loop
In [613]: %timeit proc_numpy3(x,y,z)
10000 loops, best of 3: 85.1 µs per loop
Note that while writing proc_numpy2/3 I started seeing side effects, because both functions modify their arguments in place. So I made copies of x, y, z and passed the copies instead of re-using x, y, z. Also, the different functions sometimes had slight differences in precision, so some of them didn't pass the equality tests, but if you diff them they are really close. I assume that is due to creating (or not creating) temp variables. E.g.:
In [458]: (res_numpy2 - res_numba)[:12]
Out[458]:
array([ -7.27595761e-12,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,  -7.27595761e-12,   0.00000000e+00])
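Here is a minimal sketch of the copy-and-compare approach described above. The `proc_reference` function is a hypothetical out-of-place version of the same arithmetic, and the array size and random inputs are assumptions, not the data from my benchmark. It shows that the in-place version mutates whatever it is given, and that a tolerance-based check such as `np.allclose` is a safer comparison than exact equality when results may differ in the last few bits.

```python
import numpy as np

def proc_reference(x, y, z):
    # Hypothetical out-of-place version: only rebinds local names,
    # so the caller's arrays are left untouched.
    x = x * 2 - y * 55
    y = x + y * 2
    z = x + y + 99
    z = z * (z - .88)
    return z

def proc_numpy3(x, y, z):
    # In-place version from the post: mutates the caller's x and y.
    x *= 2
    x -= y * 55
    y *= 2
    y += x
    z = x + y
    z += 99
    z *= (z - .88)
    return z

# Assumed test data (size and distribution are illustrative only).
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = rng.random(10_000)
z = np.zeros(10_000)
x_orig = x.copy()

res_ref = proc_reference(x, y, z)

# Pass copies so the originals survive for the next call.
res_inplace = proc_numpy3(x.copy(), y.copy(), z.copy())
inputs_untouched = np.array_equal(x, x_orig)

# Without a copy, the argument itself is overwritten.
x_scratch = x.copy()
proc_numpy3(x_scratch, y.copy(), z.copy())
mutated = not np.array_equal(x_scratch, x_orig)

# Compare with a tolerance rather than exact equality.
close = np.allclose(res_ref, res_inplace)
print(inputs_untouched, mutated, close)
```

When timing with %timeit the same caveat applies: an in-place function called repeatedly on the same arrays keeps transforming its own output, so the values drift from run to run unless copies are passed.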
One more minor point (about 10 µs): using float literals (55. instead of 55) saves a little time for numpy but doesn't help numba.
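To check the literal effect outside IPython, something like the following sketch with the standard `timeit` module should work. The function names and the array size are assumptions for illustration, and the numbers will vary by machine and numpy version, so I'm not claiming any particular result here.

```python
import timeit
import numpy as np

# Assumed input: a float64 array (size chosen for illustration).
x = np.random.default_rng(0).random(10_000)

def with_int_literal(a):
    return a * 55    # Python int scalar, converted to match the array dtype

def with_float_literal(a):
    return a * 55.   # Python float scalar

# Best of 3 runs of 1000 calls each, similar in spirit to %timeit.
t_int = min(timeit.repeat(lambda: with_int_literal(x), number=1000, repeat=3))
t_float = min(timeit.repeat(lambda: with_float_literal(x), number=1000, repeat=3))

print("int literal:   %.1f µs per call" % (t_int / 1000 * 1e6))
print("float literal: %.1f µs per call" % (t_float / 1000 * 1e6))

# Both forms give the same values on a float64 array.
same = np.array_equal(with_int_literal(x), with_float_literal(x))
```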