Python Alternative to itertools product with numpy

Posted by 两盒软妹~` on 2020-03-25 17:38:24

Question


I am working with a list of lists whose size varies. For example, alternativesList can contain 4 lists in one iteration and 7 lists in another.

What I am trying to do is capture every combination of words across the different lists.

Let's say that

import itertools

alternativesList = []
a = [1, 2, 3]
alternativesList.append(a)
b = ["a", "b", "c"]
alternativesList.append(b)

productList = itertools.product(*alternativesList)

will produce (once expanded with list(productList))

[(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'b'), (3, 'c')]

One problem is that my productList can be so large that it causes memory problems, so I keep productList as an iterator object and iterate over it later.
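Because itertools.product returns a lazy iterator, combinations are only generated as they are consumed. A minimal sketch of consuming them in fixed-size batches, so that only one batch is held in memory at a time, might look like this. The helper name iter_batches and the batch size of 4 are illustrative assumptions, not part of the original question:

```python
import itertools

def iter_batches(iterable, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch

alternativesList = [[1, 2, 3], ["a", "b", "c"]]
productList = itertools.product(*alternativesList)

# Nine combinations in batches of four: two full batches and one remainder.
batch_lengths = [len(batch) for batch in iter_batches(productList, 4)]
print(batch_lengths)  # [4, 4, 1]
```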

What I want to know is: is there a way to create the same object with numpy that works faster than itertools?


Answer 1:


Generally speaking, if we think of optimization as a balance scale, memory and runtime are its two pans: improving one usually (though not always) costs you on the other. Now, regarding your question:

Is there a way to create the same object with numpy that works faster than itertools?

There definitely is, but note that abstraction gives you much more flexibility, and that is what itertools.product gives you and NumPy doesn't. If scalability is not an important factor in this case, you can do this with NumPy without giving up any benefits. Here is one way, using the column_stack, repeat and tile functions (a and b must be NumPy arrays here, since the code uses their .size attribute):

In [5]: np.column_stack((np.repeat(a, b.size),np.tile(b, a.size)))
Out[5]: 
array([['1', 'a'],
       ['1', 'b'],
       ['1', 'c'],
       ['2', 'a'],
       ['2', 'b'],
       ['2', 'c'],
       ['3', 'a'],
       ['3', 'b'],
       ['3', 'c']], dtype='<U21')
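The dtype '<U21' in the output above means every element reserves room for 21 characters at 4 bytes each, that is 84 bytes, even though the strings stored here are one character long. A small sketch for inspecting this, assuming a is an int64 array and b an array of short strings as in the example:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = np.array(["a", "b", "c"])

pairs = np.column_stack((np.repeat(a, b.size), np.tile(b, a.size)))

# NumPy stores 4 bytes per unicode character, so itemsize is 4 * string width.
bytes_per_element = pairs.dtype.itemsize
total_bytes = pairs.nbytes

# Longest string actually stored; a narrower dtype must be at least this wide.
longest = int(np.char.str_len(pairs).max())

print(pairs.dtype, bytes_per_element, total_bytes, longest)
```

The gap between the reserved width and the longest stored string is the memory that a narrower dtype can reclaim.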

There are still ways to make this array occupy less memory, by using narrower string dtypes such as U2 or U1. Note that casting to a narrower type truncates any string longer than the new width.

In [10]: np.column_stack((np.repeat(a, b.size),np.tile(b, a.size))).astype('U1')
Out[10]: 
array([['1', 'a'],
       ['1', 'b'],
       ['1', 'c'],
       ['2', 'a'],
       ['2', 'b'],
       ['2', 'c'],
       ['3', 'a'],
       ['3', 'b'],
       ['3', 'c']], dtype='<U1') 
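The repeat/tile expression above handles exactly two arrays, whereas the question involves a varying number of lists. One possible generalisation, offered as a sketch rather than as the answerer's method, swaps in np.meshgrid with indexing="ij" and stacks the resulting grids as columns. The function name cartesian_product_as_strings is hypothetical, every value is converted to a string as in the output above, and at least one input list is assumed:

```python
import numpy as np

def cartesian_product_as_strings(*lists):
    """Return all combinations as a 2-D string array, one row per combination."""
    # Convert every input to a string array so all columns share one kind of dtype.
    arrays = [np.asarray(values).astype(str) for values in lists]
    # With indexing="ij", grid k varies only along axis k.
    grids = np.meshgrid(*arrays, indexing="ij")
    # Stack the grids as the last axis, then flatten to (n_combinations, n_lists).
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

alternativesList = [[1, 2, 3], ["a", "b", "c"], ["x", "y"]]
combos = cartesian_product_as_strings(*alternativesList)
print(combos.shape)  # (18, 3)
print(combos[:3])
```

Rows come out in the same order as itertools.product, but unlike the iterator the whole array is held in memory at once, so this only helps when the full product fits.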



Answer 2:


You can avoid some of the problems that arise from numpy trying to find a catch-all dtype by explicitly specifying a compound dtype.
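To illustrate the catch-all dtype problem on a small case, here is a minimal two-array sketch. The field names num and letter are illustrative assumptions; plain stacking turns the integers into strings, while a compound (structured) dtype keeps each column's own type:

```python
import numpy as np

nums = np.array([1, 2, 3])
letters = np.array(["a", "b", "c"])

# Plain stacking forces one common dtype, so the integers become strings.
stacked = np.column_stack((np.repeat(nums, letters.size),
                           np.tile(letters, nums.size)))
print(stacked.dtype.kind)       # U  (unicode string)

# A compound dtype keeps one typed field per input.
pair_dtype = np.dtype([("num", nums.dtype), ("letter", letters.dtype)])
pairs = np.empty(nums.size * letters.size, dtype=pair_dtype)
pairs["num"] = np.repeat(nums, letters.size)
pairs["letter"] = np.tile(letters, nums.size)
print(pairs["num"].dtype.kind)  # i  (still integers)
print(pairs["num"].sum())       # 18
```

Keeping the integer field numeric means arithmetic such as the sum above works directly, without converting back from strings.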

Code + some timings:

import numpy as np
import itertools

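def cartesian_product_mixed_type(*arrays):
    # Convert every input to an array (the trailing comma builds a tuple).
    arrays = *map(np.asanyarray, arrays),
    # Compound dtype with one field (f0, f1, ...) per input, each keeping its own type.
    dtype = np.dtype([(f'f{i}', a.dtype) for i, a in enumerate(arrays)])
    # One output axis per input array.
    out = np.empty((*map(len, arrays),), dtype)
    # (slice(None), None, None, ...): a prefix of this index appends length-1 axes.
    idx = slice(None), *itertools.repeat(None, len(arrays) - 1)
    for i, a in enumerate(arrays):
        # Reshape a so that it broadcasts along axis i of out.
        out[f'f{i}'] = a[idx[:len(arrays) - i]]
    # Flatten in C order, which matches the order of itertools.product.
    return out.ravel()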

a = np.arange(4)
b = np.arange(*map(ord, ('A', 'D')), dtype=np.int32).view('U1')
c = np.arange(2.)

np.set_printoptions(threshold=10)

print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print('itertools')
print(list(itertools.product(a,b,c)))
print('numpy')
print(cartesian_product_mixed_type(a,b,c))

a = np.arange(100)
b = np.arange(*map(ord, ('A', 'z')), dtype=np.int32).view('U1')
c = np.arange(20.)

import timeit
# With number=1000, the total in seconds equals the mean time per call in milliseconds.
kwds = dict(globals=globals(), number=1000)

print()
print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b,c))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b,c)', **kwds):7.4f} ms")

a = np.arange(1000)
b = np.arange(1000, dtype=np.int32).view('U1')

print()
print(f'a={a}')
print(f'b={b}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b)', **kwds):7.4f} ms")

Sample output:

a=[0 1 2 3]
b=['A' 'B' 'C']
c=[0. 1.]
itertools
[(0, 'A', 0.0), (0, 'A', 1.0), (0, 'B', 0.0), (0, 'B', 1.0), (0, 'C', 0.0), (0, 'C', 1.0), (1, 'A', 0.0), (1, 'A', 1.0), (1, 'B', 0.0), (1, 'B', 1.0), (1, 'C', 0.0), (1, 'C', 1.0), (2, 'A', 0.0), (2, 'A', 1.0), (2, 'B', 0.0), (2, 'B', 1.0), (2, 'C', 0.0), (2, 'C', 1.0), (3, 'A', 0.0), (3, 'A', 1.0), (3, 'B', 0.0), (3, 'B', 1.0), (3, 'C', 0.0), (3, 'C', 1.0)]
numpy
[(0, 'A', 0.) (0, 'A', 1.) (0, 'B', 0.) ... (3, 'B', 1.) (3, 'C', 0.)
 (3, 'C', 1.)]

a=[ 0  1  2 ... 97 98 99]
b=['A' 'B' 'C' ... 'w' 'x' 'y']
c=[ 0.  1.  2. ... 17. 18. 19.]
itertools:  7.4339 ms
numpy:      1.5701 ms

a=[  0   1   2 ... 997 998 999]
b=['' '\x01' '\x02' ... 'ϥ' 'Ϧ' 'ϧ']
itertools: 62.6357 ms
numpy:      8.0249 ms


Source: https://stackoverflow.com/questions/49475586/python-alternative-to-itertools-product-with-numpy
