How to vectorize a calculation between a 1D and a 2D numpy array with if conditions

Question

I have a calculation that uses a 1D and a 2D numpy array and involves two levels of if-conditions. I was able to use np.where to avoid one if-statement, but I still rely on a slow list comprehension to iterate over the rows.

Ideally, I would like to vectorize the whole calculation process. Is it possible?

Here is my code:

import numpy as np

r_base = np.linspace(0, 4, 5)
np.random.seed(0)
r_mat = np.array([r_base * np.random.uniform(0.9, 1.1, 5),
                  r_base * np.random.uniform(0.9, 1.1, 5),
                  r_base * np.random.uniform(0.9, 1.1, 5)])

a_array = np.linspace(1, 3, 3)

def func_vetorized_level1(r_row, a):
    if r_row.mean() > 2:
        result = np.where((r_row >= a), r_row - a, np.nan)
    else:
        result = np.where((r_row >= a), r_row + a, 0)
    return result
# try to broadcast this func to every row of r_mat using list comprehension
res_mat = np.array([func_vetorized_level1(this_r_row, this_a) 
                    for this_r_row, this_a in zip(r_mat, a_array)])

The result is:

res_mat =
array([[       nan, 0.04303787, 1.04110535, 2.02692991, 2.93892384],
       [       nan,        nan, 0.1567092 , 1.27819766, 1.90675322],
       [0.        , 0.        , 0.        , 6.25535798, 6.65682885]])

Answer 1:

Your code is more vectorizable than you think. In addition to vectorizing it, you can use the existing functions more appropriately.

To generate an integer range, np.arange works better than np.linspace:

r_base = np.arange(5.)
a_array = np.arange(1., 4.)

The random numbers can be made in a single call with one multiply:

np.random.seed(0)
r_mat = r_base * np.random.uniform(0.9, 1.1, (3, 5))
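As a quick check, the following sketch compares the single-call construction with the three separate draws from the question. It assumes the legacy global generator seeded through np.random.seed, which fills a (3, 5) array in row-major order; the name r_mat_loop is only introduced here for the comparison.

```python
import numpy as np

r_base = np.arange(5.)

# Construction from the question: three separate draws of five numbers each.
np.random.seed(0)
r_mat_loop = np.array([r_base * np.random.uniform(0.9, 1.1, 5)
                       for _ in range(3)])

# Single draw of shape (3, 5); r_base has shape (5,) and broadcasts over the rows.
np.random.seed(0)
r_mat = r_base * np.random.uniform(0.9, 1.1, (3, 5))

# Both constructions should consume the random stream in the same order.
same = np.allclose(r_mat_loop, r_mat)
print(r_mat.shape, same)
```

If the two arrays agree, the rest of the answer operates on exactly the same data as the question.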

I think the simplest thing to do would be to make an output array and fill it based on the different conditions:

out = np.empty_like(r_mat)

It would be helpful to make a_array into a column that matches the number of rows in r_mat:

a = a_array[:, None]

The next step is to make masks for the conditions. The first is a row-wise mask for r_row.mean() > 2. The second is the element-wise r_row >= a condition:

row_mask = (r_mat.mean(axis=1) > 2)[:, None]
elem_mask = r_mat >= a
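The shapes involved can be inspected with a short sketch. It rebuilds the arrays from this answer so that it runs on its own; the names combined and row_mask_keepdims are introduced here only for illustration, and keepdims=True is shown as one possible alternative to indexing with [:, None].

```python
import numpy as np

np.random.seed(0)
r_mat = np.arange(5.) * np.random.uniform(0.9, 1.1, (3, 5))
a = np.arange(1., 4.)[:, None]                 # column of thresholds, shape (3, 1)

row_mask = (r_mat.mean(axis=1) > 2)[:, None]   # one flag per row, shape (3, 1)
elem_mask = r_mat >= a                         # one flag per element, shape (3, 5)

# Combining a (3, 1) mask with a (3, 5) mask broadcasts to (3, 5).
combined = row_mask & elem_mask
print(a.shape, row_mask.shape, elem_mask.shape, combined.shape)

# keepdims=True keeps the reduced axis, giving the same column-shaped mask.
row_mask_keepdims = r_mat.mean(axis=1, keepdims=True) > 2
print(np.array_equal(row_mask, row_mask_keepdims))
```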

The index [:, None] on row_mask turns it into a column vector for broadcasting purposes. Now you can evaluate the selections using the where keyword of the appropriate ufuncs, which writes only the elements where the mask is true, and direct masked assignment for the remaining elements:

np.subtract(r_mat, a, out=out, where=row_mask & elem_mask)
np.add(r_mat, a, out=out, where=~row_mask & elem_mask)
out[row_mask & ~elem_mask] = np.nan
out[~row_mask & ~elem_mask] = 0
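Putting the pieces together, the sketch below runs the vectorized version next to the row-wise function from the question and compares the two. The names expected and matches are introduced here for the comparison only, and the reference function is lightly condensed from the question's version.

```python
import numpy as np

np.random.seed(0)
r_mat = np.arange(5.) * np.random.uniform(0.9, 1.1, (3, 5))
a_array = np.arange(1., 4.)

def func_vetorized_level1(r_row, a):
    # Row-wise reference implementation taken from the question.
    if r_row.mean() > 2:
        return np.where(r_row >= a, r_row - a, np.nan)
    return np.where(r_row >= a, r_row + a, 0)

expected = np.array([func_vetorized_level1(this_r_row, this_a)
                     for this_r_row, this_a in zip(r_mat, a_array)])

# Fully vectorized version following the steps in this answer.
a = a_array[:, None]
row_mask = (r_mat.mean(axis=1) > 2)[:, None]
elem_mask = r_mat >= a

out = np.empty_like(r_mat)  # uninitialized; every element is written below
np.subtract(r_mat, a, out=out, where=row_mask & elem_mask)
np.add(r_mat, a, out=out, where=~row_mask & elem_mask)
out[row_mask & ~elem_mask] = np.nan
out[~row_mask & ~elem_mask] = 0

# equal_nan=True treats NaN entries in matching positions as equal.
matches = np.allclose(out, expected, equal_nan=True)
print(matches)
```

Because the four mask combinations cover every element exactly once, starting from np.empty_like is safe here: no uninitialized value survives in out.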

Source: https://stackoverflow.com/questions/63277363/how-to-have-vectorize-calculation-between-a-1d-and-2d-numpy-array-with-if-condit

Tags

python

numpy

conditional-statements

vectorization