Numpy arrays: row/column wise argmax with random ties

Submitted by 梦想与她 on 2019-11-29 12:27:11

Generic case solution to pick one per group

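To solve the general case of picking one random number per entry of a list/array, where each entry specifies the range for its pick, we can use a trick. First, create a uniform random array with one value per slot. Then add each slot's group index as an offset so the groups stay separated after sorting, and perform argsort. Within each group the sorted order is random, so the first sorted position of each group is a uniformly random pick from that group. The implementation would look something like this -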

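import numpy as np

def random_num_per_grp(L):
    # For each element in L, pick a random number in [0, L[i])
    # Random values in [0, 1) plus the group index keep the groups apart after sorting
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    # Start position of each group in the concatenated range
    offset = np.r_[0,np.cumsum(L[:-1])]
    # argsort()[offset] is a random member of each group; subtract offset for a local index
    return r1.argsort()[offset] - offset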

Sample case -

In [217]: L = [5,4,2]

In [218]: random_num_per_grp(L) # i.e. select one per [0-5,0-4,0-2]
Out[218]: array([2, 0, 1])

So the output has the same number of elements as the input L: the first output element is in [0,5), the second in [0,4), and so on.
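To make the trick easier to reuse, here is a minimal self-contained sketch of the same idea. It assumes NumPy's newer `np.random.Generator` API (`np.random.default_rng`) instead of the legacy `np.random.rand`, and the name `random_pick_per_group` is hypothetical. The logic mirrors `random_num_per_grp`, and like it, it assumes every length is at least 1.

```python
import numpy as np

def random_pick_per_group(lengths, rng=None):
    """Hypothetical helper: one uniform random index in [0, k) for each k in `lengths`.

    Noise in [0, 1) plus the group number keeps groups contiguous after
    argsort, while the order inside each group is random.
    Assumes every length is >= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    lengths = np.asarray(lengths)
    group_id = np.repeat(np.arange(len(lengths)), lengths)
    keys = rng.random(lengths.sum()) + group_id
    # Start position of each group in the concatenated range
    offset = np.concatenate(([0], np.cumsum(lengths[:-1])))
    # argsort()[offset] is the position of the smallest key in each group,
    # i.e. a uniformly random member of that group
    return keys.argsort()[offset] - offset

rng = np.random.default_rng(0)
picks = random_pick_per_group([5, 4, 2], rng)
print(picks)  # one value in [0,5), one in [0,4), one in [0,2)
```

Passing an explicit `rng` makes runs reproducible, which is handy when testing tie-breaking code.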


Solving our problem here

To solve our case here, we use a modified version that skips the offset subtraction at the end of the function, so the output holds global (cumulative) positions that can be used directly for indexing, like so -

def random_num_per_grp_cumsumed(L):
    # For each element in L, pick a random position within the range it specifies
    # Offsets are not subtracted, so the output holds global (cumulative) positions for indexing
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] 
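As a quick sanity check (a sketch, not part of the original answer), the global positions returned by `random_num_per_grp_cumsumed` should fall inside each group's block of the concatenated range, i.e. `offset[k] <= out[k] < offset[k] + L[k]`. The function is repeated so the snippet runs on its own.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    # Same as in the answer: one global (cumulative) position per group
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

L = np.array([5, 4, 2])
offset = np.r_[0, np.cumsum(L[:-1])]   # [0, 5, 9]
out = random_num_per_grp_cumsumed(L)
# Each pick lies in its own group's block: [0, 5), [5, 9), [9, 11)
assert np.all((offset <= out) & (out < offset + L))
print(out)           # global positions
print(out - offset)  # local indices, as random_num_per_grp would return
```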

Approach #1

One solution could use it like so -

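def argmax_per_row_randtie(a):
    # Boolean mask of every position holding its row's maximum
    max_mask = a==a.max(1,keepdims=True)
    m,n = a.shape
    # Flat (row-major) indices of all maxima, grouped by row
    all_argmax_idx = np.flatnonzero(max_mask)
    # Flat index of the start of each row
    offset = np.arange(m)*n
    # Pick one maximum per row at random, then convert the flat index to a column index
    return all_argmax_idx[random_num_per_grp_cumsumed(max_mask.sum(1))] - offset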
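One hedged way to check `argmax_per_row_randtie` beyond the 3x3 example is to compare it against `np.argmax` on random data: the value at each returned index must equal the row maximum, and on tie-free rows the result must coincide with `a.argmax(1)`. The random test arrays are arbitrary, and both helper functions are copied from the answer so the snippet runs standalone.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

def argmax_per_row_randtie(a):
    max_mask = a == a.max(1, keepdims=True)
    m, n = a.shape
    all_argmax_idx = np.flatnonzero(max_mask)   # flat indices of all row maxima
    offset = np.arange(m) * n                   # flat index of each row start
    return all_argmax_idx[random_num_per_grp_cumsumed(max_mask.sum(1))] - offset

# Integer data with many ties
a = np.random.randint(0, 3, size=(50, 6))
idx = argmax_per_row_randtie(a)
rows = np.arange(a.shape[0])
assert np.array_equal(a[rows, idx], a.max(1))   # always lands on a maximum

# Float data: ties are practically impossible, so it must match np.argmax
b = np.random.rand(50, 6)
assert np.array_equal(argmax_per_row_randtie(b), b.argmax(1))
```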

Verification

Let's test it on the given sample with a large number of runs and measure how often each index is chosen for each row -

In [235]: a
Out[235]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])

In [225]: all_out = np.array([argmax_per_row_randtie(a) for i in range(10000)])

# The first element (row=0) should have similar probabilities for 1 and 2
In [236]: (all_out[:,0]==1).mean()
Out[236]: 0.504

In [237]: (all_out[:,0]==2).mean()
Out[237]: 0.496

# The second element (row=1) should only have 2
In [238]: (all_out[:,1]==2).mean()
Out[238]: 1.0

# The third element (row=2) should only have 1
In [239]: (all_out[:,2]==1).mean()
Out[239]: 1.0
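The per-index frequencies can also be collected for all rows at once with `np.bincount`. This is a sketch: the helpers are repeated so it runs on its own, and the 10000-run count simply matches the session above.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

def argmax_per_row_randtie(a):
    max_mask = a == a.max(1, keepdims=True)
    m, n = a.shape
    all_argmax_idx = np.flatnonzero(max_mask)
    return all_argmax_idx[random_num_per_grp_cumsumed(max_mask.sum(1))] - np.arange(m) * n

a = np.array([[1, 3, 3],
              [4, 5, 6],
              [7, 8, 1]])
all_out = np.array([argmax_per_row_randtie(a) for _ in range(10000)])

# freq[r, j] = fraction of runs in which column j was picked for row r
freq = np.array([np.bincount(col, minlength=a.shape[1]) for col in all_out.T]) / len(all_out)
print(freq)
```

Row 0 should show roughly 0.5 for columns 1 and 2, while rows 1 and 2 should put all their weight on columns 2 and 1 respectively.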

Approach #2 : Use masking for performance

We could use boolean masking instead of flatnonzero, aiming for better performance, since operations on boolean arrays are generally fast. We also generalize the function to work along rows (axis=1) or columns (axis=0), like so -

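def argmax_randtie_masking_generic(a, axis=1):
    # Mask of all maxima along the chosen axis
    max_mask = a==a.max(axis=axis,keepdims=True)
    # Number of tied maxima in each row (axis=1) or column (axis=0)
    L = max_mask.sum(axis=axis)
    # One entry per True in max_mask; mark one randomly chosen entry per group
    set_mask = np.zeros(L.sum(), dtype=bool)
    select_idx = random_num_per_grp_cumsumed(L)
    set_mask[select_idx] = True
    if axis==0:
        # Transposed view: True entries are visited column by column
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    # Only one True remains per row/column; argmax returns its position
    return max_mask.argmax(axis=axis)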
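The `axis==0` branch relies on two NumPy behaviours: `max_mask.T` is a view, so assigning through `max_mask.T[max_mask.T]` writes into `max_mask` itself, and boolean indexing visits the `True` entries in row-major order of the array being indexed, which for the transpose means column by column. A small illustration with a hand-made mask (the values are arbitrary):

```python
import numpy as np

mask = np.array([[True,  False, True],
                 [True,  True,  False]])

# Row-major order over mask.T == column-major order over mask.
# argwhere on the transpose gives (col, row); reverse to show (row, col) in mask:
# (0,0), (1,0), (1,1), (0,2)
print(np.argwhere(mask.T)[:, ::-1])

# Keep only the 1st and 4th True entries visited in that order
new_vals = np.array([True, False, False, True])
mask.T[mask.T] = new_vals   # writes through the view into `mask`
print(mask)
```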

Sample runs on axis=0 and axis=1 -

In [423]: a
Out[423]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])
In [424]: argmax_randtie_masking_generic(a, axis=1)
Out[424]: array([1, 2, 1])

In [425]: argmax_randtie_masking_generic(a, axis=1)
Out[425]: array([2, 2, 1])

In [426]: a[1,1] = 8

In [427]: a
Out[427]: 
array([[1, 3, 3],
       [4, 8, 6],
       [7, 8, 1]])

In [428]: argmax_randtie_masking_generic(a, axis=0)
Out[428]: array([2, 1, 1])

In [429]: argmax_randtie_masking_generic(a, axis=0)
Out[429]: array([2, 1, 1])

In [430]: argmax_randtie_masking_generic(a, axis=0)
Out[430]: array([2, 2, 1])
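A hedged cross-check for both axes, in the same spirit as the sample runs: whatever index is chosen must hold the maximum along that axis, and on tie-free float data the result must equal `np.argmax`. The helper functions are repeated (with the unused `m, n` line dropped) so the snippet runs standalone; `np.take_along_axis` is used to read back the picked values.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

def argmax_randtie_masking_generic(a, axis=1):
    max_mask = a == a.max(axis=axis, keepdims=True)
    L = max_mask.sum(axis=axis)
    set_mask = np.zeros(L.sum(), dtype=bool)
    set_mask[random_num_per_grp_cumsumed(L)] = True
    if axis == 0:
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    return max_mask.argmax(axis=axis)

a = np.random.randint(0, 3, size=(40, 7))
for axis in (0, 1):
    idx = argmax_randtie_masking_generic(a, axis=axis)
    # Value at the chosen index equals the max along that axis
    picked = np.take_along_axis(a, np.expand_dims(idx, axis), axis=axis)
    assert np.array_equal(picked.squeeze(axis), a.max(axis=axis))

b = np.random.rand(40, 7)
for axis in (0, 1):
    assert np.array_equal(argmax_randtie_masking_generic(b, axis=axis), b.argmax(axis=axis))
```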

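A simple way, for integer arrays, is to add a random number in [0, 1) to every value at the start. Because distinct integers differ by at least 1, the noise can only reorder tied values and never changes which value is largest (for float data this guarantee does not hold). Your data would then look like this: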

a = np.array([[1.1827,3.1734,3.9187],[4.8172,5.7101,6.9182],[7.1834,8.5012,1.9818]])

That can be done by a = a + np.random.random(a.shape).

If you later need the original values back, use np.floor(a).astype(int) to drop the fractional parts. A plain a.astype(int) truncates toward zero and gives wrong results for negative values.
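A minimal sketch of this approach that keeps the original array untouched and only argmaxes a noisy copy, so nothing has to be recovered afterwards. It assumes integer input (the [0, 1) noise is only safe when distinct values differ by at least 1), and the function name `argmax_noise_tiebreak` is hypothetical. The last lines show why `np.floor` is the safer way to strip the noise when negative values are present.

```python
import numpy as np

def argmax_noise_tiebreak(a, axis=1, rng=None):
    """Hypothetical helper: random tie-breaking argmax for INTEGER arrays."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = a + rng.random(a.shape)   # noise in [0, 1) only reorders ties
    return noisy.argmax(axis=axis)

a = np.array([[1, 3, 3],
              [4, 5, 6],
              [7, 8, 1]])
print(argmax_noise_tiebreak(a, axis=1))   # first entry is 1 or 2, then 2, 1

# Stripping the noise: astype(int) truncates toward zero, floor does not
neg = np.array([-3, 2]) + np.array([0.5, 0.5])   # [-2.5, 2.5]
print(neg.astype(int))            # [-2  2]  (wrong for -3)
print(np.floor(neg).astype(int))  # [-3  2]
```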

You could also use an array of random numbers with the same shape as your input, masked so that only the candidates for selection (the positions holding the maximum) keep their random value.

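import numpy as np

def rndArgMax(a, axis):
    a_max = a.max(axis, keepdims=True)
    # Random values at the maxima, 0 everywhere else
    tmp = np.random.random(a.shape) * (a == a_max)
    # The largest random value among the maxima wins
    return tmp.argmax(axis)

a = np.random.randint(0, 3, size=(2, 3, 4))
print(rndArgMax(a, 1))
# example output (random, differs on each run):
# [[1 1 2 1]
#  [0 1 1 1]]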
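One small hardening of `rndArgMax`, offered as an assumption rather than the answerer's code: multiplying by the mask leaves non-candidates at 0, and `np.random.random` can in principle return exactly 0.0, so a tie with a masked-out entry is theoretically possible. Filling non-candidates with -1 via `np.where` rules that out. The name `rnd_argmax_where` is hypothetical, and it uses `np.random.default_rng` for an optional seedable generator.

```python
import numpy as np

def rnd_argmax_where(a, axis, rng=None):
    """Hypothetical variant of rndArgMax: random tie-breaking along `axis`."""
    rng = np.random.default_rng() if rng is None else rng
    is_max = a == a.max(axis=axis, keepdims=True)
    # Candidates get a random key in [0, 1); everything else gets -1,
    # so a non-maximum can never win the argmax.
    keys = np.where(is_max, rng.random(a.shape), -1.0)
    return keys.argmax(axis=axis)

rng = np.random.default_rng(42)
a = rng.integers(0, 3, size=(2, 3, 4))
idx = rnd_argmax_where(a, 1, rng)
print(idx.shape)   # (2, 4)
```

Like the original, this works for any axis of an N-dimensional array and for float input, since it only compares values for equality with the maximum.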