Speed up Python2 nested loops with XOR

浪尽此生 submitted on 2019-11-28 14:02:32

Question


The accepted answer to the question this was marked as a duplicate of is wrong and does not meet my needs.

My code aims to calculate a hash from a series of numbers.

The structure is easiest to understand as a matrix. With 16 numbers starting from 29 (start=29, length=4), it looks like this:

29, 30, 31, 32,
33, 34, 35, 36,
37, 38, 39, 40,
41, 42, 43, 44

The given algorithm specifies that the hash is the XOR of the numbers to the left of the // marker in each row:

29, 30, 31, 32, //,
33, 34, 35, //, 36,
37, 38, //, 39, 40,
41, //, 42, 43, 44

Hash=29^30^31^32^33^34^35^37^38^41=54
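Put another way, row i (counting from 0) starts at start + i*length, and only its first length - i entries go into the hash. A minimal sketch of that definition, checked against the worked example above; hash_by_rows is a hypothetical name, and the code avoids xrange and print statements so it runs on both Python 2.7 and 3:

```python
from functools import reduce
from operator import xor


def hash_by_rows(start, length):
    """XOR the first (length - i) entries of each row i of the matrix.

    Row i holds start + i*length ... start + i*length + length - 1.
    Illustrates the definition only; it is not optimized.
    """
    val = 0
    for i in range(length):
        row_start = start + i * length
        # keep only the first (length - i) numbers of this row
        kept = range(row_start, row_start + length - i)
        val = reduce(xor, kept, val)
    return val


print(hash_by_rows(29, 4))  # 54
```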


My code is:

def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val

For large inputs such as answer(2000000000,10**4), the computation takes far too long.


Constraints:

  • Py2.7.6
  • Only the standard library, excluding bz2, crypt, fcntl, mmap, pwd, pyexpat, select, signal, termios, thread, time, unicodedata, zipimport, zlib.
  • Limited time to compute.

Currently, running against the test parameters (which are unknown to me) gives me a timeout error.


How can the speed of my code be improved for bigger values?


Answer 1:


There is a bug in the accepted answer to "Python fast XOR over range algorithm": l must be decremented before the XOR calculation, not after it. Here's a repaired version, along with an assert test to verify that it gives the same result as the naive algorithm.

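def f(a):
    # XOR of 0, 1, ..., a; the result repeats with period 4 in a
    return (a, 1, a + 1, 0)[a % 4]

def getXor(a, b):
    # XOR of a, a+1, ..., b
    return f(b) ^ f(a-1)

def gen_nums(start, length):
    l = length
    ans = 0
    while l > 0:
        l = l - 1  # decrement first: this row keeps start .. start + l
        ans ^= getXor(start, start + l)
        start += length  # first number of the next row
    return ans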

def answer(start, length):
    c = val = 0
    for i in xrange(length):
        for j in xrange(length - i):
            n = start + c + j
            #print '%d,' % n,
            val ^= n
        #print
        c += length
    return val

for start in xrange(50):
    for length in xrange(100):
        a = answer(start, length)
        b = gen_nums(start, length)
        assert a == b, (start, length, a, b)

Over those ranges of start and length, gen_nums is about 5 times faster than answer, but we can make it roughly twice as fast again (i.e., roughly 10 times as fast as answer) by eliminating those function calls:

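def gen_nums(start, length):
    ans = 0
    for l in xrange(length - 1, -1, -1):
        b = start + l  # last kept number in this row
        # f(b) ^ f(start - 1) = XOR of start..b; the second tuple is f(start - 1)
        # indexed by start % 4 (its trailing 0 is never reached)
        ans ^= (b, 1, b + 1, 0)[b % 4] ^ (0, start - 1, 1, start, 0)[start % 4]
        start += length  # first number of the next row
    return ans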

As Mirek Opoka mentions in the comments, % 4 is equivalent to & 3, and the bitwise AND is cheaper than an integer division whose quotient is thrown away. So we can replace the core step with:

ans ^= (b, 1, b + 1, 0)[b & 3] ^ (0, start - 1, 1, start, 0)[start & 3]
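The closed form behind f rests on a period-4 pattern: the XOR of 0, 1, ..., n equals n, 1, n + 1 or 0 when n % 4 is 0, 1, 2 or 3, so the XOR of a..b is f(b) ^ f(a - 1). A small brute-force check of that identity, and of % 4 versus & 3 (they agree for every Python int, negatives included, because Python's % with a positive divisor never returns a negative result), might look like this. xor_range is a hypothetical renaming of getXor, and range is used so the sketch runs on Python 2 and 3:

```python
from functools import reduce
from operator import xor


def f(n):
    """XOR of 0, 1, ..., n via the period-4 pattern (gives 0 for n == -1)."""
    return (n, 1, n + 1, 0)[n & 3]


def xor_range(a, b):
    """XOR of a, a+1, ..., b for 0 <= a <= b + 1 (an empty range gives 0)."""
    return f(b) ^ f(a - 1)


# Closed form against a plain left fold over the same numbers.
for n in range(-1, 200):
    assert f(n) == reduce(xor, range(n + 1), 0)

for a in range(0, 50):
    for b in range(a - 1, 60):
        assert xor_range(a, b) == reduce(xor, range(a, b + 1), 0)

# % 4 and & 3 agree for every Python int, negatives included.
for n in range(-100, 100):
    assert n % 4 == n & 3
```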



Answer 2:


It looks like you can replace the inner loop and its if with:

for j in range(length - i):
    val^=start+c
    c+=1
c+=i

This should save some time as i gets bigger, since the i cells skipped in row i are no longer visited.

I'm afraid I can't test this right now, sorry!
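The suggestion above, filled out into complete functions and checked against the question's code on small inputs. This is a sketch: answer_original and answer_skip are hypothetical names. Note that it still visits about length**2 / 2 cells, so it halves the work rather than changing the quadratic growth:

```python
def answer_original(start, length):
    # The question's version: scans all length**2 cells.
    val = 0
    c = 0
    for i in range(length):
        for j in range(length):
            if j < length - i:
                val ^= start + c
            c += 1
    return val


def answer_skip(start, length):
    # Iterate only over the kept cells of row i,
    # then jump c past the i skipped cells at the end of the row.
    val = 0
    c = 0
    for i in range(length):
        for j in range(length - i):
            val ^= start + c
            c += 1
        c += i
    return val


for start in range(30):
    for length in range(30):
        assert answer_original(start, length) == answer_skip(start, length)
```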




Answer 3:


I am afraid that with the input in answer(2000000000,10**4) you'll never finish "in time".

You can get a significant speedup by tightening the inner loop, updating c once per row instead of on every iteration, and using xrange instead of range, like this:

def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val


def answer_fast(start, length):
    val = 0
    c = 0
    for i in xrange(length):
        for j in xrange(length - i):
            # redundant: the loop bound already guarantees j < length - i
            if j < length - i:
                val ^= start + c + j
        c += length
    return val


# print answer(10, 20000)
print answer_fast(10, 20000)

The profiler shows that answer_fast is nearly twice as fast (26.3 s versus 46.7 s):

> python -m cProfile script.py
366359392
        20004 function calls in 46.696 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   46.696   46.696 script.py:1(<module>)
        1   44.357   44.357   46.696   46.696 script.py:1(answer)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    20001    2.339    0.000    2.339    0.000 {range}

> python -m cProfile script.py
366359392
        3 function calls in 26.274 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   26.274   26.274 script.py:1(<module>)
        1   26.274   26.274   26.274   26.274 script.py:12(answer_fast)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
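cProfile adds its own overhead, so a plain wall-clock comparison can be a useful cross-check. A minimal sketch with the standard timeit module on a smaller input; answer_fast here drops the always-true inner if, which is a change from the version above, and the printed timings will vary by machine:

```python
import timeit


def answer(start, length):
    # The question's version: visits all length**2 cells, branching on each.
    val = 0
    c = 0
    for i in range(length):
        for j in range(length):
            if j < length - i:
                val ^= start + c
            c += 1
    return val


def answer_fast(start, length):
    # Like answer_fast above, minus the always-true inner if:
    # only kept cells are visited and c moves once per row.
    val = 0
    c = 0
    for i in range(length):
        for j in range(length - i):
            val ^= start + c + j
        c += length
    return val


assert answer(10, 200) == answer_fast(10, 200)

for fn in (answer, answer_fast):
    # Best of 3 repeats of 5 calls each; absolute numbers are machine-dependent.
    best = min(timeit.repeat(lambda: fn(10, 200), number=5, repeat=3))
    print("%s: %.3f s for 5 calls" % (fn.__name__, best))
```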

But if you want major speedups (orders of magnitude), you should consider rewriting your function in Cython.

Here is the "cythonized" version of it:

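def answer(int start, int length):
    # C int is typically 32 bits: start + c + j (and c, which reaches about
    # length**2) must stay below 2**31 for a correct result; use long long
    # for larger inputs
    cdef int val = 0, c = 0, i, j
    for i in xrange(length):
        for j in xrange(length - i):
            if j < length - i:  # redundant: the loop bound already ensures this
                val ^= start + c + j
        c += length
    return val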

With the same input parameters as above, it takes less than 200 ms instead of 20+ seconds, a speedup of more than 100x.

> ipython

In [1]: import pyximport; pyximport.install()
Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x7f3fed983150>)

In [2]: import script2

In [3]: timeit script2.answer(10, 20000)
10 loops, best of 3: 188 ms per loop

With your input parameters, it takes 58ms:

In [5]: timeit script2.answer(2000000000,10**4)
10 loops, best of 3: 58.2 ms per loop


Source: https://stackoverflow.com/questions/40378419/speed-up-python2-nested-loops-with-xor
