Counting Algorithm Performance Optimization in Pypy vs Python (Numpy vs List)


Question


My expectation was that pypy would be as much as an order of magnitude faster than python, but the results indicate that pypy is in fact slower than expected, particularly wherever numpy is involved.

I have two questions:

  1. Why is pypy significantly slower with numpy?
  2. Is there anything I can do to optimize my algorithm to make pypy (or python) faster?

Resulting timings:

Python 2.7.5

  • # points: 16,777,216 (8 ** 3 * 32 ** 3)
  • Xrange time: 1487.15 ms
  • Xrange Numpy time: 2553.98 ms
  • Point Gen Time: 6162.23 ms
  • Numpy Gen Time: 13894.73 ms

Pypy 2.2.1

  • # points: 16,777,216 (8 ** 3 * 32 ** 3)
  • Xrange time: 129.48 ms
  • Xrange Numpy time: 4644.12 ms
  • Point Gen Time: 4643.82 ms
  • Numpy Gen Time: 44168.98 ms

Algorithm:

I am generating a list of points in space with the simple algorithm below, and I'm trying to optimize it.

def generate(size=32, point=(0, 0, 0), width=32):
    """
    Generate points in space around a center point with a given width
    and number of divisions (size).
    """
    X, Y, Z = point
    half = width * 0.5
    scale = float(width) / size   # cell size; float() guards against integer division
    offset = scale * 0.5          # start at the center of the first cell
    X = X + offset - half
    Y = Y + offset - half
    Z = Z + offset - half
    for x in xrange(size):
        x = (x * scale) + X
        for y in xrange(size):
            y = (y * scale) + Y
            for z in xrange(size):
                z = (z * scale) + Z
                yield (x, y, z)
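
For example, a quick call with a tiny grid (the output noted below is computed by hand from the code above):

    # 2 divisions per axis across a width of 2, centered on the origin
    pts = list(generate(size=2, width=2))
    # 8 points, from (-0.5, -0.5, -0.5) through (0.5, 0.5, 0.5)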

While optimizing, I started looking at pypy as an alternative to python, and in comparing the two I came up with a few different scenarios:

  • counting using xrange

    rsize = 8    # size of region
    csize = 32   # size of chunk
    number_of_points = rsize ** 3 * csize ** 3
    [x for x in xrange(number_of_points)]
    
  • counting using xrange with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    number_of_points = rsize ** 3 * csize ** 3
    np.array([x for x in xrange(number_of_points)])
    
  • running with the algorithm above

    rsize = 8    # size of region
    csize = 32   # size of chunk
    [p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)]
    
  • running the algorithm above with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    np.array([p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)])
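
For reference, a minimal harness along the lines of what could have produced the timings above (a sketch; the original post does not show its timing code):

    import time

    def time_ms(fn):
        # run fn once and report wall-clock time in milliseconds
        start = time.time()
        fn()
        return (time.time() - start) * 1000.0

    rsize, csize = 8, 32
    number_of_points = rsize ** 3 * csize ** 3
    print "Xrange time: %.2f ms" % time_ms(
        lambda: [x for x in xrange(number_of_points)])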
    

Background:

I am attempting to create a voxel engine, and I want to optimize my algorithm to bring down generation times to a manageable level. While I clearly won't achieve anything close to Java/C++, I'd like to push python (or pypy) as far as I can.

From some earlier timings, I noticed that list lookups are significantly faster than dictionary lookups. Lists are also faster than tuples (unexpected, at least under pypy), though tuples generated faster in those earlier runs. Numpy has even faster read times than the plain Python containers; however, numpy creation time can be orders of magnitude slower.

So if reading is most important, there is a clear advantage to using a numpy array. If reading and creation are of equal importance, however, then a straight list is likely best. That said, I don't have a clean way of measuring memory usage, but I suspect that lists are far less memory efficient than tuples or numpy arrays. Also, while it's a small difference, I found .get on a dictionary to be slightly faster than the __getitem__ call (i.e. dictionary.get(lookup) vs. dictionary[lookup]).
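
The two dictionary lookup styles can be compared directly with a micro-benchmark such as the following (a sketch using timeit; this is not the harness behind the timings below):

    import timeit

    setup = "d = dict((i, i) for i in xrange(1000))"
    # subscript syntax (__getitem__) vs. the .get method
    t_subscript = timeit.timeit("d[500]", setup=setup)
    t_get = timeit.timeit("d.get(500)", setup=setup)
    print "d[k]:  %.3f s" % t_subscript
    print "d.get: %.3f s" % t_get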

Timings:

Python 2.7.5

Reading

 - Option 1: tuple access... 2045.51 ms
 - Option 2: tuple access (again)... 2081.97 ms    # sampling effect of cache
 - Option 3: list access... 2072.09 ms
 - Option 4: dict access... 3436.53 ms
 - Option 5: iterable creation... N/A
 - Option 6: numpy array... 1752.44 ms

Creation

 - Option 1: tuple creation... 690.36 ms
 - Option 2: tuple creation (again)... 716.49 ms    # sampling effect of cache
 - Option 3: list creation... 684.28 ms
 - Option 4: dict creation... 1498.94 ms
 - Option 5: iterable creation...  0.01 ms
 - Option 6: numpy creation... 3514.25 ms

Pypy 2.2.1

Reading

 - Option 1: tuple access... 243.34 ms
 - Option 2: tuple access (again)... 246.51 ms    # sampling effect of cache
 - Option 3: list access... 139.65 ms
 - Option 4: dict access... 454.65 ms
 - Option 5: iterable creation... N/A
 - Option 6: numpy array... 21.60 ms

Creation

 - Option 1: tuple creation... 1016.27 ms
 - Option 2: tuple creation (again)... 1063.50 ms    # sampling effect of cache
 - Option 3: list creation... 365.98 ms
 - Option 4: dict creation... 2258.44 ms
 - Option 5: iterable creation...  0.00 ms
 - Option 6: numpy creation... 12514.20 ms

In all the examples, random lookups were performed against randomly generated data.

import random

dsize = 10 ** 7   # 10 million data points
data = [(i, random.random() * dsize)
        for i in range(dsize)]
lookup = tuple(int(random.random() * dsize) for i in range(dsize))

The loops were very simple:

for x in lookup:
    data_of_specific_type[x]

Here data_of_specific_type was data converted to that type (e.g. tuple(data), list(data), etc.).
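
Putting the pieces together, the read benchmark presumably looked something like this (a sketch reusing data and lookup from above; the time_access helper is my naming, not the original's):

    import time

    def time_access(container):
        # time len(lookup) __getitem__ calls, reported in milliseconds
        start = time.time()
        for x in lookup:
            container[x]
        return (time.time() - start) * 1000.0

    print "tuple access... %.2f ms" % time_access(tuple(data))
    print "list access...  %.2f ms" % time_access(list(data))
    print "dict access...  %.2f ms" % time_access(dict(data))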


Answer 1:


Part of the issue is that:

np.array([p
    for rp in generate(size=rsize, width=rsize*csize)
    for p in generate(size=csize, width=csize, point=rp)])

does all the work of building the entire intermediate Python list first and only then converting it to an np.array, paying for both.

A faster way to do this would be:

# preallocate a (N, 3) array instead of building a Python list first;
# each generated point is a 3-tuple, hence the second dimension
number_of_points = rsize ** 3 * csize ** 3
arr = np.empty((number_of_points, 3))
i = 0
for rp in generate(size=rsize, width=rsize * csize):
    for p in generate(size=csize, width=csize, point=rp):
        arr[i] = p
        i += 1
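
Beyond preallocation, the Python-level loops can be removed from the numpy path entirely. The sketch below is my addition (not part of the original answer); it mirrors a single generate() call using np.mgrid, and the region/chunk nesting would still combine two such grids:

    import numpy as np

    def generate_np(size=32, point=(0, 0, 0), width=32):
        # vectorized equivalent of generate(): returns a (size**3, 3) array
        scale = float(width) / size
        start = np.asarray(point, dtype=float) + scale * 0.5 - width * 0.5
        # integer (x, y, z) indices in the same nested-loop order as generate()
        idx = np.mgrid[0:size, 0:size, 0:size].reshape(3, -1).T
        return idx * scale + start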


Source: https://stackoverflow.com/questions/23818691/counting-algorithm-performance-optimization-in-pypy-vs-python-numpy-vs-list
