Question
It is my understanding that the itertools functions are written in C. Suppose I wanted to speed up this example code:
import numpy as np
from itertools import combinations_with_replacement

def combinatorics(LargeArray):
    newArray = np.empty((LargeArray.shape[0], LargeArray.shape[0]))
    # visit every index pair (x, y) with x <= y
    for x, y in combinations_with_replacement(range(LargeArray.shape[0]), r=2):
        z = LargeArray[x] + LargeArray[y]
        newArray[x, y] = z
    return newArray
Since combinations_with_replacement is written in C, does that imply that it can't be sped up? Please advise.
Thanks in advance.
Answer 1:
It's true that combinations_with_replacement is written in C, which means that you're not likely to speed up that part of the code. But most of your time isn't spent on finding the combinations: it's spent in the for loop that does the additions. You really, really, really want to avoid that kind of loop if at all possible when you're using numpy. This version will do almost the same thing, through the magic of broadcasting:
def sums(large_array):
    return large_array.reshape((-1, 1)) + large_array.reshape((1, -1))
For example:
>>> ary = np.arange(5).astype(float)
>>> np.triu(combinatorics(ary))
array([[ 0., 1., 2., 3., 4.],
[ 0., 2., 3., 4., 5.],
[ 0., 0., 4., 5., 6.],
[ 0., 0., 0., 6., 7.],
[ 0., 0., 0., 0., 8.]])
>>> np.triu(sums(ary))
array([[ 0., 1., 2., 3., 4.],
[ 0., 2., 3., 4., 5.],
[ 0., 0., 4., 5., 6.],
[ 0., 0., 0., 6., 7.],
[ 0., 0., 0., 0., 8.]])
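To make the broadcasting step less magical, it may help to look at the shapes involved. The following is only an illustrative sketch: the names a, col, row, result and outer are placeholders introduced here, and np.add.outer is shown as an equivalent spelling of the same pairwise sum.

```python
import numpy as np

a = np.arange(5).astype(float)

col = a.reshape((-1, 1))   # shape (5, 1): one value per row
row = a.reshape((1, -1))   # shape (1, 5): one value per column

# Broadcasting stretches both operands to shape (5, 5), so that
# result[i, j] == a[i] + a[j] without an explicit Python loop.
result = col + row

# np.add.outer builds the same table of pairwise sums.
outer = np.add.outer(a, a)
```

Both result and outer hold the full symmetric matrix, which is why np.triu is applied above before comparing against combinatorics.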
The difference is that combinatorics leaves the lower triangle as random gibberish (np.empty does not initialize the memory it allocates), whereas sums makes the matrix symmetric. If you really wanted to avoid adding everything twice, you probably could, but I can't think of how to do it off the top of my head.
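One possible way to compute only the upper triangle is np.triu_indices, which returns the row and column indices of every pair with row <= column. This is a sketch under assumptions, not a benchmarked recommendation: upper_sums is a hypothetical name, the lower triangle is filled with zeros rather than left uninitialized, and the fancy indexing may well cost more than the additions it saves.

```python
import numpy as np

def upper_sums(large_array):
    """Pairwise sums for x <= y only; the lower triangle is left as zeros."""
    n = large_array.shape[0]
    rows, cols = np.triu_indices(n)   # all index pairs with rows <= cols
    out = np.zeros((n, n), dtype=large_array.dtype)
    # Each upper-triangle entry is computed exactly once.
    out[rows, cols] = large_array[rows] + large_array[cols]
    return out
```

For the same ary as above, upper_sums(ary) should match np.triu(sums(ary)).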
Oh, and the other difference:
>>> big_ary = np.random.random(1000)
>>> %timeit combinatorics(big_ary)
1 loops, best of 3: 482 ms per loop
>>> %timeit sums(big_ary)
1000 loops, best of 3: 1.7 ms per loop
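The %timeit command only exists inside IPython. Outside it, a rough equivalent can be put together with the standard-library timeit module. The sketch below assumes a smaller input (200 elements instead of 1000) so the loop version finishes quickly; the names loop_time and vec_time are placeholders, and the absolute numbers will differ from machine to machine.

```python
import timeit
from itertools import combinations_with_replacement

import numpy as np

def combinatorics(large_array):
    n = large_array.shape[0]
    new_array = np.empty((n, n))
    # explicit Python loop over every index pair (x, y) with x <= y
    for x, y in combinations_with_replacement(range(n), r=2):
        new_array[x, y] = large_array[x] + large_array[y]
    return new_array

def sums(large_array):
    return large_array.reshape((-1, 1)) + large_array.reshape((1, -1))

big_ary = np.random.random(200)

# Best of 3 repeats, reported as seconds per call.
loop_time = min(timeit.repeat(lambda: combinatorics(big_ary), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: sums(big_ary), number=100, repeat=3)) / 100

print("loop:      %.3g s per call" % loop_time)
print("broadcast: %.3g s per call" % vec_time)
```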
Source: https://stackoverflow.com/questions/14472362/numpy-with-combinatoric-generators-how-does-one-speed-up-combinations