numpy float: 10x slower than builtin in arithmetic operations?

别跟我提以往 2020-12-01 05:00

I am getting really weird timings for the following code:

import numpy as np
s = 0
for i in range(10000000):
    s += np.float64(1) # replace with np.float32         
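
For reference, one minimal way to collect such timings (the harness below is my own, not from the original post):

import timeit

import numpy as np

def run(one):
    # Accumulate ten million additions; only the type of `one` varies.
    s = 0
    for _ in range(10000000):
        s += one
    return s

# Time one full run per scalar type.
for one in (1.0, np.float32(1), np.float64(1)):
    print(type(one).__name__, timeit.timeit(lambda: run(one), number=1))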


        
8 Answers
  •  甜味超标
    2020-12-01 05:50

    Summary

    If an arithmetic expression mixes numpy and built-in numbers, Python arithmetic is slower. Avoiding this implicit conversion removes almost all of the performance degradation I reported.

    Details

    Note that in my original code:

    s = np.float64(1)
    for i in range(10000000):
      s = (s + 8) * s % 2399232
    

    the built-in int literals (8 and 2399232) and numpy.float64 are mixed in one expression. Perhaps Python had to convert them all to one type?

    s = np.float64(1)
    for i in range(10000000):
      s = (s + np.float64(8)) * s % np.float64(2399232)
    

    If the runtime stayed unchanged (rather than increasing), it would suggest that this is indeed what Python was doing under the hood, and that the implicit conversion explains the performance drag.

    Actually, the runtime fell by a factor of 1.5! How is that possible? Surely the worst thing Python could have had to do was exactly these two conversions?

    I don't really know. Perhaps Python had to dynamically check what needed to be converted into what, which takes time, and being told exactly which conversions to perform makes it faster. Or perhaps some entirely different mechanism is used for this arithmetic (one that doesn't involve conversions at all), and it happens to be very slow on mismatched types. Reading the numpy source code might help, but it's beyond my skill.
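
    One way to probe this (a sketch of my own, not part of the measurements above) is to time a single mixed-type operation against a matched-type one:

    import timeit

    setup = "import numpy as np; s = np.float64(1); q = np.float64(8)"
    # Mixed operands: np.float64 + built-in int
    print("mixed:  ", timeit.timeit("s + 8", setup=setup))
    # Matched operands: np.float64 + np.float64
    print("matched:", timeit.timeit("s + q", setup=setup))

    If the mixed version is consistently slower per operation, the overhead lives in the cross-type dispatch itself, not in anything specific to our loop.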

    Anyway, now we can obviously speed things up more by moving the conversions out of the loop:

    q = np.float64(8)
    r = np.float64(2399232)
    for i in range(10000000):
      s = (s + q) * s % r
    

    As expected, the runtime is reduced substantially: by another 2.3 times.

    To be fair, we now need to adjust the float version in the same way, by moving the literal constants out of the loop. This slows the float version down slightly (by about 10%).
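
    For completeness, the adjusted float version being compared against would look like this (a sketch; the variable names simply mirror the numpy version):

    s = 1.0
    q = 8.0
    r = 2399232.0
    for i in range(10000000):
      s = (s + q) * s % r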

    Accounting for all these changes, the np.float64 version of the code is now only 30% slower than the equivalent float version; the ridiculous 5-fold performance hit is largely gone.

    Why do we still see a 30% slowdown? numpy.float64 numbers take the same amount of space as float, so size is not the reason. Perhaps resolving the arithmetic operators simply takes longer for user-defined types. In any case, it is no longer a major concern.
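
    As a rough illustration of that last point (my own sketch, outside the measurements above): any Python-level numeric wrapper pays a similar cost, because the interpreter must dispatch through the type's __add__ method and allocate a new result object instead of taking the fast path for built-in floats.

    import timeit

    class Wrapped:
        # Hypothetical minimal user-defined numeric type:
        # a float behind one level of Python-level dispatch.
        __slots__ = ("v",)
        def __init__(self, v):
            self.v = v
        def __add__(self, other):
            return Wrapped(self.v + other)

    w = Wrapped(1.0)
    f = 1.0
    print("builtin float:", timeit.timeit("f + 1.0", globals=globals()))
    print("user-defined :", timeit.timeit("w + 1.0", globals=globals()))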
