weighted numpy bincount for 2D IDs array and 1D weights

泄露秘密 posted on 2020-07-09 08:39:50

Question


I am using numpy_indexed to apply a vectorized NumPy bincount, as follows:

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(index_tri.shape)
(cols, rows), B = npi.count((index_tri.flatten(), rowidx.flatten()))

where index_tri is the following matrix:

index_tri = np.array([[ 0,  0,  0,  7,  1,  3],
       [ 1,  2,  2,  9,  8,  9],
       [ 3,  1,  1,  4,  9,  1],
       [ 5,  6,  6, 10, 10, 10],
       [ 7,  8,  9,  4,  3,  3],
       [ 3,  8,  6,  3,  8,  6],
       [ 4,  3,  3,  7,  8,  9],
       [10, 10, 10,  5,  6,  6],
       [ 4,  9,  1,  3,  1,  1],
       [ 9,  8,  9,  1,  2,  2]])

Then I write the binned counts into the corresponding positions of a zero-initialized matrix m:

m = np.zeros((10,11))
m 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

m[rows, cols] = B
m
array([[3., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0.],
       [0., 1., 2., 0., 0., 0., 0., 0., 1., 2., 0.],
       [0., 3., 0., 1., 1., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 2., 0., 0., 0., 3.],
       [0., 0., 0., 2., 1., 0., 0., 1., 1., 1., 0.],
       [0., 0., 0., 2., 0., 0., 2., 0., 2., 0., 0.],
       [0., 0., 0., 2., 1., 0., 0., 1., 1., 1., 0.],
       [0., 0., 0., 0., 0., 1., 2., 0., 0., 0., 3.],
       [0., 3., 0., 1., 1., 0., 0., 0., 0., 1., 0.],
       [0., 1., 2., 0., 0., 0., 0., 0., 1., 2., 0.]])

However, this treats every value in index_tri as having a weight of 1. Suppose I instead have a weights array that gives one weight per column of index_tri:

weights = np.array([0.7, 0.8, 1.5, 0.6, 0.5, 1.9])

How can I apply a weighted bincount so that my output matrix m becomes the following?

array([[3., 0.5, 0., 1.9, 0., 0., 0., 0.6, 0., 0., 0.],
       [0., 0.7, 2.3, 0., 0., 0., 0., 0., 0.5, 2.5, 0.],
       [0., 4.2, 0., 0.7, 0.6, 0., 0., 0., 0., 0.5, 0.],
       [0., 0., 0., 0., 0., 0.7, 2.3, 0., 0., 0., 3.],
       [0., 0., 0., 2.4, 0.6, 0., 0., 0.7, 0.8, 1.5, 0.],
       [0., 0., 0., 1.3, 0., 0., 3.4, 0., 1.3, 0., 0.],
       [0., 0., 0., 2.3, 0.7, 0., 0., 0.6, 0.5, 1.9, 0.],
       [0., 0., 0., 0., 0., 0.6, 2.4, 0., 0., 0., 3.],
       [0., 3.9, 0., 0.6, 0.7, 0., 0., 0., 0., 0.8, 0.],
       [0., 0.6, 2.4, 0., 0., 0., 0., 0., 0.8, 2.2, 0.]])

Any ideas?


Using a for loop and np.bincount(), I could solve it as follows:

for i in range(m.shape[0]):
   m[i, :] = np.bincount(index_tri[i, :], weights=weights, minlength=m.shape[1])
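
For reference, the loop above can be made self-contained and used as a baseline to check any vectorized version against. This is only a sketch: sizing m from index_tri (rather than hard-coding (10, 11)) is an assumption that happens to hold for this data.

```python
import numpy as np

index_tri = np.array([[ 0,  0,  0,  7,  1,  3],
                      [ 1,  2,  2,  9,  8,  9],
                      [ 3,  1,  1,  4,  9,  1],
                      [ 5,  6,  6, 10, 10, 10],
                      [ 7,  8,  9,  4,  3,  3],
                      [ 3,  8,  6,  3,  8,  6],
                      [ 4,  3,  3,  7,  8,  9],
                      [10, 10, 10,  5,  6,  6],
                      [ 4,  9,  1,  3,  1,  1],
                      [ 9,  8,  9,  1,  2,  2]])
weights = np.array([0.7, 0.8, 1.5, 0.6, 0.5, 1.9])

# Assumption: one output row per input row, one bin per ID from 0 to the max ID.
m = np.zeros((index_tri.shape[0], index_tri.max() + 1))

# One weighted bincount per row; the same weights apply to every row
# because each weight belongs to a column of index_tri.
for i in range(m.shape[0]):
    m[i, :] = np.bincount(index_tri[i, :], weights=weights, minlength=m.shape[1])

print(m)
```

Since every row uses all six weights exactly once, each row of m should sum to weights.sum(), which is a quick sanity check on the result.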

I am trying to adapt the vectorized solutions provided here and here, respectively, but I cannot figure out what the ix2D variable corresponds to in the first link. Could someone elaborate a bit, if possible?


Update (solution):

Based on @Divakar's solution below, here is an updated version that takes an extra input parameter, sz, for the case where the input indices matrix does not cover the full range of the initialized output matrix:

    def bincount2D(id_ar_2D, weights_1D, sz=None):
        # Inputs : 2D id array, 1D weights array (one weight per column),
        # optional sz = (number of rows, number of bins per row)

        # Number of bins per row (n) and number of rows (N)
        if sz is None:
            n = id_ar_2D.max() + 1
            N = len(id_ar_2D)
        else:
            n = sz[1]
            N = sz[0]

        # Offset each row's ids by n * row index so that, once raveled,
        # every row falls into its own block of n bins
        id_ar_2D_offsetted = id_ar_2D + n * np.arange(N)[:, None]

        # Run a single bincount over the flattened ids, with the weights
        # repeated once per row, then reshape to one row of n bins per input row
        ids = id_ar_2D_offsetted.ravel()
        W = np.tile(weights_1D, N)
        return np.bincount(ids, W, minlength=n * N).reshape(-1, n)
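
A short usage sketch of the sz parameter follows. The small ids and w arrays are hypothetical, and the function is repeated (with an `is None` check) only so the example runs on its own. Note that, as written, sz[0] is expected to match the number of rows of the ID array, since the offsets are broadcast against those rows.

```python
import numpy as np

def bincount2D(id_ar_2D, weights_1D, sz=None):
    # sz, if given, is (number of rows, number of bins per row)
    if sz is None:
        n = id_ar_2D.max() + 1
        N = len(id_ar_2D)
    else:
        n = sz[1]
        N = sz[0]
    id_ar_2D_offsetted = id_ar_2D + n * np.arange(N)[:, None]
    ids = id_ar_2D_offsetted.ravel()
    W = np.tile(weights_1D, N)
    return np.bincount(ids, W, minlength=n * N).reshape(-1, n)

# Hypothetical data: IDs only go up to 2, but the output should have 5 bins per row.
ids = np.array([[0, 1, 1],
                [2, 0, 2]])
w = np.array([0.5, 1.0, 2.0])

narrow = bincount2D(ids, w)             # bins inferred from ids.max(): shape (2, 3)
wide = bincount2D(ids, w, sz=(2, 5))    # bins fixed by sz: shape (2, 5)

print(narrow)
print(wide)
```

The first three columns of the two results agree; passing sz only pads each row with extra zero-valued bins.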

Answer 1:


Inspired by this post -

def bincount2D(id_ar_2D, weights_1D):
    # Inputs : 2D id array, 1D weights array (one weight per column)
    
    # Number of bins per row
    n = id_ar_2D.max()+1
    
    # Offset each row's ids by n * row index so rows do not share bins
    N = len(id_ar_2D)
    id_ar_2D_offsetted = id_ar_2D + n*np.arange(N)[:,None]
    
    # Finally use bincount on the flattened, offset ids, with the weights
    # repeated once per row. Reshape to get one row of n bins per input row.
    ids = id_ar_2D_offsetted.ravel()
    W = np.tile(weights_1D,N)
    return np.bincount(ids, W, minlength=n*N).reshape(-1,n)

out = bincount2D(index_tri, weights)
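
To see why the offsets work: adding n*i to every ID in row i moves that row into its own block of n bins, so a single 1D bincount over the flattened array cannot mix counts from different rows. The sketch below walks through this on a small hypothetical array (ids and w are made up for illustration) and compares the result with a per-row loop.

```python
import numpy as np

ids = np.array([[0, 2, 2],
                [1, 1, 0]])       # hypothetical 2x3 ID array
w = np.array([0.5, 1.0, 2.0])     # one weight per column

n = ids.max() + 1                 # bins per row (3 here)
N = ids.shape[0]                  # number of rows (2 here)

# Row i is shifted by n*i: row 0 keeps bins 0..2, row 1 moves to bins 3..5.
shifted = ids + n * np.arange(N)[:, None]

# One flat weighted bincount; the weights are repeated once per row so they
# line up with the raveled (row-major) IDs.
flat = np.bincount(shifted.ravel(), weights=np.tile(w, N), minlength=n * N)
vectorized = flat.reshape(N, n)

# Reference: one bincount per row.
looped = np.stack([np.bincount(row, weights=w, minlength=n) for row in ids])

print(shifted)
print(vectorized)
print(np.allclose(vectorized, looped))
```

The minlength=n * N argument matters when the last rows never hit the highest bins; without it the flat result could be too short to reshape.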


Source: https://stackoverflow.com/questions/62719951/weighted-numpy-bincount-for-2d-ids-array-and-1d-weights
