Fast combinations without replacement for arrays - NumPy / Python

后端 未结 1 1008
轮回少年
轮回少年 2020-12-20 03:05

I\'m after generating efficiently pairwise combinations from 1D array(s). Itertools is just too inefficient with if n > 1000

E.g. [1, 2, 3, 4]

magic code...         


        
相关标签:
1条回答
  • 2020-12-20 03:53

    I. Pairwise-combinations

    One way would be with numba to gain memory and hence performance efficiency -

    from numba import njit
    
    @njit
    def pairwise_combs_numba(a):
        n = len(a)
        L = n*(n-1)//2
        out = np.empty((L,2),dtype=a.dtype)
        iterID = 0
        for i in range(n):
            for j in range(i+1,n):
                out[iterID,0] = a[i]
                out[iterID,1] = a[j]
                iterID += 1
        return out
    

    Another NumPy based one would be using np.broadcast_to to get meshgrid views and then masking -

    def pairwise_combs_mask(a):
        n = len(a)
        L = n*(n-1)//2
        out = np.empty((L,2),dtype=a.dtype)
        m = ~np.tri(len(a),dtype=bool)
        out[:,0] = np.broadcast_to(a[:,None],(n,n))[m]
        out[:,1] = np.broadcast_to(a,(n,n))[m]
        return out
    

    II. Triplet-combinations

    We will extend the same methodology to get ourselves triplet-combinations -

    @njit
    def triplet_combs_numba(a):
        n = len(a)
        L = n*(n-1)*(n-2)//6
        out = np.empty((L,3),dtype=a.dtype)
        iterID = 0
        for i in range(n):
            for j in range(i+1,n):
                for k in range(j+1,n):
                    out[iterID,0] = a[i]
                    out[iterID,1] = a[j]
                    out[iterID,2] = a[k]
                    iterID += 1
        return out
    
    def triplet_combs_mask(a):
        n = len(a)
        L = n*(n-1)*(n-2)//6
        out = np.empty((L,3),dtype=a.dtype)
        r = np.arange(n)
        m = (r[:,None,None]<r[:,None]) & (r[:,None]<r)
        out[:,0] = np.broadcast_to(a[:,None,None],(n,n,n))[m]
        out[:,1] = np.broadcast_to(a[None,:,None],(n,n,n))[m]
        out[:,2] = np.broadcast_to(a[None,None,:],(n,n,n))[m]
        return out
    

    Combinations for higher orders would extend like-wise.

    Sample runs -

    In [54]: a = np.array([3,9,4,1,7])
    
    In [55]: pairwise_combs_numba(a)
    Out[55]: 
    array([[3, 9],
           [3, 4],
           [3, 1],
           [3, 7],
           [9, 4],
           [9, 1],
           [9, 7],
           [4, 1],
           [4, 7],
           [1, 7]])
    
    In [56]: triplet_combs_numba(a)
    Out[56]: 
    array([[3, 9, 4],
           [3, 9, 1],
           [3, 9, 7],
           [3, 4, 1],
           [3, 4, 7],
           [3, 1, 7],
           [9, 4, 1],
           [9, 4, 7],
           [9, 1, 7],
           [4, 1, 7]])
    

    Timings (including Python's builtin - itertools.combinations) -

    In [68]: a = np.random.rand(4000)
    
    In [69]: %timeit pairwise_combs_numba(a)
        ...: %timeit pairwise_combs_mask(a)
        ...: %timeit list(itertools.combinations(a, 2))
    10 loops, best of 3: 52.2 ms per loop
    10 loops, best of 3: 146 ms per loop
    1 loop, best of 3: 597 ms per loop
    
    In [70]: a = np.random.rand(400)
    
    In [71]: %timeit triplet_combs_numba(a)
        ...: %timeit triplet_combs_mask(a)
        ...: %timeit list(itertools.combinations(a, 3))
    10 loops, best of 3: 98.5 ms per loop
    1 loop, best of 3: 352 ms per loop
    1 loop, best of 3: 795 ms per loop
    
    0 讨论(0)
提交回复
热议问题