Question
The answer to the question this was marked as a duplicate of is wrong and does not meet my needs.
My code aims to calculate a hash from a series of numbers.
It is easier to understand the structure in the form of a matrix. If I have 16 numbers starting from 29 the structure will be: (start=29, length=4)
29, 30, 31, 32,
33, 34, 35, 36,
37, 38, 39, 40,
41, 42, 43, 44
The given algorithm specifies that the hash is the XOR of the numbers before the // marker in each row:
29, 30, 31, 32, //,
33, 34, 35, //, 36,
37, 38, //, 39, 40,
41, //, 42, 43, 44
Hash = 29^30^31^32^33^34^35^37^38^41
= 54
My code is:
def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val
The time required to compute large inputs like answer(2000000000,10**4) is way too much.
Constraints:
- Python 2.7.6
- Only standard libraries except for bz2, crypt, fcntl, mmap, pwd, pyexpat, select, signal, termios, thread, time, unicodedata, zipimport, zlib.
- Limited time to compute.
Currently, computing the test parameters (unknown to me) gives me a timeout error.
How can the speed of my code be improved for bigger values?
Answer 1:
There is a bug in the accepted answer to "Python fast XOR over range algorithm": decrementing l needs to be done before the XOR calculation. Here's a repaired version, along with an assert test to verify that it gives the same result as the naive algorithm.
def f(a):
    return (a, 1, a + 1, 0)[a % 4]

def getXor(a, b):
    return f(b) ^ f(a-1)

def gen_nums(start, length):
    l = length
    ans = 0
    while l > 0:
        l = l - 1
        ans ^= getXor(start, start + l)
        start += length
    return ans

def answer(start, length):
    c = val = 0
    for i in xrange(length):
        for j in xrange(length - i):
            n = start + c + j
            #print '%d,' % n,
            val ^= n
        #print
        c += length
    return val

for start in xrange(50):
    for length in xrange(100):
        a = answer(start, length)
        b = gen_nums(start, length)
        assert a == b, (start, length, a, b)
Over those ranges of start and length, gen_nums is about 5 times faster than answer, but we can make it roughly twice as fast again (i.e., roughly 10 times as fast as answer) by eliminating those function calls:
def gen_nums(start, length):
    ans = 0
    for l in xrange(length - 1, -1, -1):
        b = start + l
        # XOR of 0..b, then XOR of 0..start-1, both by table lookup
        ans ^= (b, 1, b + 1, 0)[b % 4] ^ (0, start - 1, 1, start)[start % 4]
        start += length
    return ans
As Mirek Opoka mentions in the comments, % 4 is equivalent to & 3, and it's faster because bitwise arithmetic is faster than performing integer division and throwing away the quotient. So we can replace the core step with
        ans ^= (b, 1, b + 1, 0)[b & 3] ^ (0, start - 1, 1, start)[start & 3]
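The table lookup works because the XOR of 0..n repeats with period 4: each aligned group of four consecutive numbers 4k..4k+3 cancels to 0, so only the leftover one, two or three numbers matter. The sketch below spells that out and checks it against a brute-force loop. The names xor_upto, xor_range, hash_rows and brute_xor_upto are illustrative, not from the answer, and range is used instead of xrange so the block runs on both Python 2 and Python 3.

```python
def xor_upto(n):
    """XOR of 0, 1, ..., n; taken to be 0 when n < 0."""
    if n < 0:
        return 0
    # n & 3 == 0 -> n, 1 -> 1, 2 -> n + 1, 3 -> 0
    return (n, 1, n + 1, 0)[n & 3]


def xor_range(a, b):
    """XOR of a, a + 1, ..., b for 0 <= a <= b."""
    return xor_upto(b) ^ xor_upto(a - 1)


def hash_rows(start, length):
    """XOR the first (length - i) numbers of row i, one range lookup per row."""
    result = 0
    for i in range(length):
        row_start = start + i * length
        result ^= xor_range(row_start, row_start + length - i - 1)
    return result


def brute_xor_upto(n):
    """Reference implementation: XOR 0..n one number at a time."""
    result = 0
    for k in range(n + 1):
        result ^= k
    return result
```

With the example from the question, hash_rows(29, 4) evaluates to 54. The work per row is constant, so the total cost grows with length rather than with length squared.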
Answer 2:
It looks like you can replace the inner loop and the if test with:
        for j in range(length - i):
            val^=start+c
            c+=1
        c+=i
This should save some time when i gets bigger.
I'm afraid I can't test this right now, sorry!
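Written out as a complete function, the suggestion might look like the sketch below. The name answer_rows is hypothetical, and range is used so the block runs on both Python 2 and Python 3. It still visits every included number once, so it saves the skipped iterations but remains quadratic in length.

```python
def answer_rows(start, length):
    val = 0
    c = 0  # offset of the current cell from start
    for i in range(length):
        # Row i contributes only its first (length - i) numbers.
        for j in range(length - i):
            val ^= start + c
            c += 1
        # Skip the i excluded numbers at the end of the row.
        c += i
    return val
```

For the example in the question, answer_rows(29, 4) evaluates to 54.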
Answer 3:
I am afraid that, with the input you have in answer(2000000000,10**4), you'll never finish "in time".
You can get a pretty significant speedup by improving the inner loop, not updating the c variable every time, and using xrange instead of range, like this:
def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val

def answer_fast(start, length):
    val = 0
    c = 0
    for i in xrange(length):
        for j in xrange(length - i):
            # always true for this xrange; kept from the original loop
            if j < length - i:
                val ^= start + c + j
        c += length
    return val

# print answer(10, 20000)
print answer_fast(10, 20000)
The profiler shows that answer_fast is about twice as fast:
> python -m cProfile script.py
366359392
         20004 function calls in 46.696 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   46.696   46.696 script.py:1(<module>)
        1   44.357   44.357   46.696   46.696 script.py:1(answer)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    20001    2.339    0.000    2.339    0.000 {range}

> python -m cProfile script.py
366359392
         3 function calls in 26.274 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   26.274   26.274 script.py:1(<module>)
        1   26.274   26.274   26.274   26.274 script.py:12(answer_fast)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
But if you want major speedups (orders of magnitude), you should consider rewriting your function in Cython.
Here is the "cythonized" version of it:
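def answer(int start, int length):
    # C ints: every start + c + j must fit in a C int
    cdef int val = 0, c = 0, i, j
    for i in xrange(length):
        for j in xrange(length - i):
            # always true for this xrange; kept from the original loop
            if j < length - i:
                val ^= start + c + j
        c += length
    return val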
With the same input parameters as above, it takes less than 200 ms instead of 20+ seconds, which is a speedup of more than 100x.
> ipython
In [1]: import pyximport; pyximport.install()
Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x7f3fed983150>)
In [2]: import script2
In [3]: timeit script2.answer(10, 20000)
10 loops, best of 3: 188 ms per loop
With your input parameters, it takes 58ms:
In [5]: timeit script2.answer(2000000000,10**4)
10 loops, best of 3: 58.2 ms per loop
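Because the Cython version declares its variables as C int, it is only correct while every number it XORs fits in that type. The sketch below is a quick pure-Python range check, assuming a 32-bit signed int (true on most platforms, but still an assumption). The names INT_MAX, largest_value and fits_c_int are hypothetical helpers, not part of the answer.

```python
INT_MAX = 2 ** 31 - 1  # assumption: C int is 32-bit signed


def largest_value(start, length):
    """Largest number XORed for length >= 1: the single cell of the last row."""
    return start + length * (length - 1)


def fits_c_int(start, length):
    """True if no number touched by the nested loops exceeds INT_MAX."""
    return length < 1 or largest_value(start, length) <= INT_MAX
```

Both inputs timed above pass this check: the largest value for (2000000000, 10**4) is 2099990000. A call such as fits_c_int(2000000000, 20000) returns False, in which case a wider type such as long long would be needed in the cdef line.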
Source: https://stackoverflow.com/questions/40378419/speed-up-python2-nested-loops-with-xor