Speed up Python2 nested loops with XOR

死守一世寂寞 2020-12-22 12:44

The answer to the question this was marked as a duplicate of is wrong and does not meet my needs.

My code aims to calculate a hash from a seri

3 Answers
  •  盖世英雄少女心
    2020-12-22 13:35

    I am afraid that with the input you have, answer(2000000000, 10**4), you'll never finish "in time".

    You can get a significant speedup by tightening the inner loop: iterate only over the j values that are actually used, advance c once per row instead of on every iteration, and use xrange instead of range, like this:

    def answer(start, length):
        val=0
        c=0
        for i in range(length):
            for j in range(length):
                if j < length-i:
                    val^=start+c
                c+=1
        return val
    
    
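    def answer_fast(start, length):
        val = 0
        c = 0  # offset of the current row, i.e. i * length
        for i in xrange(length):
            # only visit the length - i cells that take part in the XOR
            for j in xrange(length - i):
                # always true here, since j stops at length - i - 1
                if j < length - i:
                    val ^= start + c + j
            c += length  # jump to the next row in one step
        return val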
    
    
    # print answer(10, 20000)
    print answer_fast(10, 20000)
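
The rewrite works because, in the original loop, c equals i * length + j whenever the XOR runs, so c only needs to advance by length once per row. A minimal sketch to check that on small inputs follows. It is written to run on both Python 2 and 3, so it uses range, and the names answer_slow and answer_rowwise are made up for this illustration:

```python
def answer_slow(start, length):
    """Original double loop: c advances on every inner iteration."""
    val = 0
    c = 0
    for i in range(length):
        for j in range(length):
            if j < length - i:
                val ^= start + c
            c += 1
    return val


def answer_rowwise(start, length):
    """Same XOR, but the inner loop only visits the cells that count."""
    val = 0
    row_base = 0  # equals i * length at the start of row i
    for i in range(length):
        for j in range(length - i):
            val ^= start + row_base + j
        row_base += length
    return val


# Collect every small (start, length) pair on which the two versions disagree.
mismatches = [
    (s, n)
    for s in range(0, 50, 7)
    for n in range(0, 12)
    if answer_slow(s, n) != answer_rowwise(s, n)
]
print(mismatches)
```

An empty list means the two versions agree on every pair tried, which is evidence rather than a proof.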
    

    The profiler shows that answer_fast is a little under twice as fast (26.3 s versus 46.7 s):

    > python -m cProfile script.py
    366359392
            20004 function calls in 46.696 seconds
    
    Ordered by: standard name
    
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   46.696   46.696 script.py:1(<module>)
            1   44.357   44.357   46.696   46.696 script.py:1(answer)
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        20001    2.339    0.000    2.339    0.000 {range}
    
    > python -m cProfile script.py
    366359392
            3 function calls in 26.274 seconds
    
    Ordered by: standard name
    
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   26.274   26.274 script.py:1(<module>)
            1   26.274   26.274   26.274   26.274 script.py:12(answer_fast)
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
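
To repeat the comparison on your own machine without waiting for the full 20000-row run, the standard timeit module is one option. The sketch below is only an illustration: the input size (300) and repeat count (5) are arbitrary choices, and the functions use range so that the snippet also runs on Python 3, which means it does not show the xrange part of the gain there.

```python
import timeit


def answer(start, length):
    val = 0
    c = 0
    for i in range(length):
        for j in range(length):
            if j < length - i:
                val ^= start + c
            c += 1
    return val


def answer_fast(start, length):
    val = 0
    c = 0
    for i in range(length):
        for j in range(length - i):
            val ^= start + c + j
        c += length
    return val


# Time a few runs of each version on a small, arbitrary input.
slow = timeit.timeit(lambda: answer(10, 300), number=5)
fast = timeit.timeit(lambda: answer_fast(10, 300), number=5)
print("answer:      %.4f s" % slow)
print("answer_fast: %.4f s" % fast)
```

The absolute numbers will differ from the profiler output above, since they depend on the machine and the interpreter version.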
    

    But if you want major speedups (orders of magnitude), you should consider rewriting your function in Cython.

    Here is the "cythonized" version of it:

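    def answer(int start, int length):
        # typed C ints let Cython compile both loops down to plain C loops
        cdef int val = 0, c = 0, i, j
        for i in xrange(length):
            for j in xrange(length - i):
                # always true here, since j stops at length - i - 1
                if j < length - i:
                    val ^= start + c + j
            c += length  # jump to the next row in one step
        return val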
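
One caveat with the typed version: cdef int is a C int, which is 32 bits on most current platforms, so the values it handles have to stay below 2**31. A rough sketch of such a check in plain Python follows. The helper name fits_in_c_int is hypothetical and the 32-bit width is an assumption about the platform:

```python
C_INT_MAX = 2**31 - 1  # assumption: C int is a 32-bit signed integer


def fits_in_c_int(start, length):
    """Conservative bound on the largest value the loops can produce.

    c grows to length * length by the end, and every XOR operand is
    smaller than start + length * length, so bounding that sum suffices.
    """
    return 0 <= start and start + length * length <= C_INT_MAX


print(fits_in_c_int(10, 20000))           # True
print(fits_in_c_int(2000000000, 10**4))   # True
print(fits_in_c_int(2000000000, 10**5))   # False
```

Both inputs used in this answer pass the check. For inputs that do not, declaring the variables as long long instead of int would be the usual remedy.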
    

    With the same input parameters as above, it takes less than 200 ms instead of 20+ seconds, which is more than a 100x speedup.

    > ipython
    
    In [1]: import pyximport; pyximport.install()
    Out[1]: (None, )
    
    In [2]: import script2
    
    In [3]: timeit script2.answer(10, 20000)
    10 loops, best of 3: 188 ms per loop
    

    With your input parameters, it takes 58ms:

    In [5]: timeit script2.answer(2000000000,10**4)
    10 loops, best of 3: 58.2 ms per loop
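
If Cython is not available, a different technique from the one used above removes the inner loop altogether. Each row XORs a run of consecutive integers, and the XOR of 0..n follows a pattern that repeats every four values, so each row costs constant time. The sketch below assumes start >= 0, and the names xor_upto, xor_range and answer_linear are made up for this illustration:

```python
def xor_upto(n):
    """XOR of all integers 0..n (0 when n < 0), via the period-4 pattern."""
    if n < 0:
        return 0
    return (n, 1, n + 1, 0)[n % 4]


def xor_range(lo, hi):
    """XOR of all integers lo..hi inclusive, assuming 0 <= lo <= hi."""
    return xor_upto(hi) ^ xor_upto(lo - 1)


def answer_linear(start, length):
    """Same result as the nested loops, with one step per row."""
    val = 0
    for i in range(length):
        row_start = start + i * length
        # row i contributes the length - i consecutive values from row_start
        val ^= xor_range(row_start, row_start + (length - i) - 1)
    return val


print(answer_linear(0, 3))   # -> 2
print(answer_linear(17, 4))  # -> 14
```

This does work proportional to length instead of length squared, so it is worth comparing against the nested-loop version on small inputs before relying on it.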
    
