Question
Given an array db (which in practice will be roughly (1e6, 300)) and a mask vector such as mask = [1, 0, 1], I treat the first column of each row as the target.
I want to create an out vector that is 1 where the corresponding row of db matches the mask and target == 1, and 0 everywhere else.
```python
import numpy as np

db = np.array([  # out for mask = [1, 0, 1]
    # target, vector
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 1],  # 0 (fits the mask, but target == 0)
    [0, 0, 1, 0],  # 0
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 0],  # 0
    [1, 0, 0, 0],  # 0
])
```
I have defined a function vline that applies the mask to a single row of the array. It uses np.array_equal(mask, mask & vector) to check whether the vector fits the mask (for mask = [1, 0, 1], both 101 and 111 fit), and then keeps only the indices where target == 1.
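The fit test can be tried on its own. A minimal sketch, assuming 0/1 integer vectors; the helper name `fits` is hypothetical and simply wraps the np.array_equal(mask, mask & vector) expression described above:

```python
import numpy as np

mask = np.array([1, 0, 1])

def fits(vector, mask):
    # Hypothetical helper: a vector "fits" the mask when every position set in
    # the mask is also set in the vector, i.e. mask & vector reproduces mask.
    return np.array_equal(mask, mask & vector)

for v in ([1, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 0]):
    print(v, fits(np.array(v), mask))
# [1, 0, 1] True
# [1, 1, 1] True
# [0, 1, 1] False
# [1, 0, 0] False
```

Positions where the mask is 0 are ignored, which is why 111 fits [1, 0, 1] just as well as 101 does.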
out is initialized to a list of zeros:
```python
out = [0, 0, 0, 0, 0, 0]
```
The vline function is defined as:
```python
def vline(idx, mask):
    # Reads the global db and writes into the global out list
    line = db[idx]
    target, vector = line[0], line[1:]
    if np.array_equal(mask, mask & vector):
        if target == 1:
            out[idx] = 1
```
I get the correct result by applying this function line-by-line in a for loop:
```python
def check_mask(db, out, mask=[1, 0, 1]):
    # idx iterates over the rows of db without enumerate
    for idx in np.arange(db.shape[0]):
        vline(idx, mask=mask)
    return out

assert check_mask(db, out, [1, 0, 1]) == [1, 0, 0, 1, 0, 0]  # it works!
```
Now I want to vectorize vline by creating a ufunc:
```python
ufunc_vline = np.frompyfunc(vline, 2, 1)
out = [0, 0, 0, 0, 0, 0]
ufunc_vline(db, [1, 0, 1])
print(out)
```
But the ufunc complains about broadcasting inputs with those shapes:
```
In [217]: ufunc_vline(db, [1, 0, 1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-217-9008ebeb6aa1> in <module>()
----> 1 ufunc_vline(db, [1, 0, 1])
ValueError: operands could not be broadcast together with shapes (6,4) (3,)
```
Answer 1:
Converting vline to a numpy ufunc fundamentally doesn't make sense, since ufuncs are always applied to numpy arrays in an elementwise fashion. Because of this, the input arguments must either have the same shape or be broadcastable to a common shape. You are passing two arrays with incompatible shapes to your ufunc_vline function: db has shape (6, 4), and the list [1, 0, 1] is converted to an array of shape (3,). Their trailing dimensions (4 and 3) differ, hence the ValueError you are seeing.
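To see the elementwise behaviour concretely, here is a minimal sketch (the function `record` is purely illustrative). A ufunc built with np.frompyfunc calls the Python function once per pair of broadcast scalar elements, not once per row, and raises the same ValueError when the shapes cannot be broadcast:

```python
import numpy as np

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
mask = np.array([1, 0, 1])

calls = []

def record(a, b):
    # Illustrative only: log every call so the calls can be counted.
    calls.append((a, b))
    return a & b

f = np.frompyfunc(record, 2, 1)

# (6, 3) and (3,) are broadcastable: the trailing dimensions are both 3.
res = f(db[:, 1:], mask)
print(res.shape)   # (6, 3)
print(len(calls))  # 18: one call per scalar element, not one per row

# (6, 4) and (3,) are not: the trailing dimensions 4 and 3 differ.
try:
    f(db, mask)
except ValueError as err:
    print("ValueError:", err)
```

The function only ever sees two scalars at a time, so it has no way to look at a whole row and compare it with the whole mask.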
There are a couple of other issues with ufunc_vline:
- np.frompyfunc(vline, 2, 1) specifies that vline should return a single output argument, whereas vline actually returns nothing (it modifies out in place instead).
- You are passing db as the first argument to ufunc_vline, whereas vline expects its first argument to be idx, which is used as an index into the rows of db.
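For comparison, here is one way both issues could be addressed while still using np.frompyfunc. This is a minimal sketch: the hypothetical `vline_scalar` takes only a row index, closes over `db` and `mask`, and returns its result instead of writing into `out`, and the ufunc is fed an array of row indices rather than `db` itself. It works, but it is still a Python-level loop:

```python
import numpy as np

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
mask = np.array([1, 0, 1])

def vline_scalar(idx):
    # Hypothetical rewrite of vline: one row index in, one value (0 or 1) out.
    target, vector = db[idx, 0], db[idx, 1:]
    return int(target == 1 and np.array_equal(mask, mask & vector))

# One input (the row index), one output.
ufunc_vline = np.frompyfunc(vline_scalar, 1, 1)

# frompyfunc always returns an object array; convert it back to integers.
out = ufunc_vline(np.arange(db.shape[0])).astype(int)
print(out)  # [1 0 0 1 0 0]
```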
Also, bear in mind that creating a ufunc from a Python function using np.frompyfunc will not yield any noticeable performance benefit over a standard Python for loop, because the Python function is still called once per element. To see any serious improvement you would probably need to write the ufunc in a low-level language such as C (the NumPy documentation includes an example of this).
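A rough way to check this on your own machine. This is only a sketch: the 10,000 x 31 random 0/1 array stands in for the real data, and the helper names (`row_ok`, `with_loop`, `with_frompyfunc`, `with_vectorized`) are hypothetical. Timings depend on hardware and NumPy version, so no numbers are shown; one would expect the loop and the frompyfunc version to land in the same ballpark and the boolean-array version to be much faster, but measure it yourself:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(10_000, 31))  # column 0 = target, rest = vector
mask = rng.integers(0, 2, size=30)

def row_ok(idx):
    # Per-row test: target == 1 and the vector fits the mask.
    return int(db[idx, 0] == 1 and np.array_equal(mask, mask & db[idx, 1:]))

def with_loop():
    return np.array([row_ok(i) for i in range(db.shape[0])])

ufunc_row_ok = np.frompyfunc(row_ok, 1, 1)

def with_frompyfunc():
    # Still one Python call per row, just dispatched through a ufunc.
    return ufunc_row_ok(np.arange(db.shape[0])).astype(int)

def with_vectorized():
    # Whole-array boolean operations, no per-row Python calls.
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)

for fn in (with_loop, with_frompyfunc, with_vectorized):
    t = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>16}: {t:.3f} s for 3 runs")
```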
Having said that, your vline function can be easily vectorized using standard boolean array operations:
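```python
import numpy as np

def vline_vectorized(db, mask):
    # target column AND "every position set in mask is also set in the row's vector"
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)
```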
For example:
```python
db = np.array([  # out for mask = [1, 0, 1]
    # target, vector
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 1],  # 0 (fits the mask, but target == 0)
    [0, 0, 1, 0],  # 0
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 0],  # 0
    [1, 0, 0, 0],  # 0
])

mask = np.array([1, 0, 1])
print(repr(vline_vectorized(db, mask)))
# array([1, 0, 0, 1, 0, 0])
```
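To see how the one-liner in vline_vectorized works, it can be unpacked step by step on the same example. The second part is a sketch of a variant, assuming db contains only 0s and 1s, that looks only at the columns set in the mask. For an array of roughly (1e6, 300) this keeps the temporaries as narrow as the number of ones in the mask, instead of the full-width temporary that mask & db[:, 1:] creates. The name `vline_selected` is hypothetical:

```python
import numpy as np

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
mask = np.array([1, 0, 1])

target = db[:, 0]                       # first column: 1, 0, 0, 1, 0, 1
masked = mask & db[:, 1:]               # keep only the positions set in the mask
fits = np.all(masked == mask, axis=1)   # a row fits if no masked position was lost
print(fits)                             # True for rows 0, 1 and 3
print(target & fits)                    # [1 0 0 1 0 0]

def vline_selected(db, mask):
    # Hypothetical variant for 0/1 data: only the columns where mask == 1 matter,
    # and each of them must hold a 1 in the row.
    cols = 1 + np.flatnonzero(mask)     # +1 skips the target column
    return (db[:, 0] == 1) & np.all(db[:, cols] == 1, axis=1)

print(vline_selected(db, mask).astype(int))  # [1 0 0 1 0 0]
```

If the mask has no ones at all, cols is empty and np.all over an empty axis is True, so every row with target == 1 is selected, which matches the behaviour of the original expression.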
Source: https://stackoverflow.com/questions/34496409/use-numpy-frompyfunc-to-add-broadcasting-to-a-python-function-with-argument