Is the mask of a structured array supposed to be structured itself?

Submitted by 老子叫甜甜 on 2019-12-24 18:07:50

Question


I was looking into numpy issue 2972 and several related problems. It turns out that all those problems are related to the situation where the array itself is structured, but its mask is not:

In [38]: R = numpy.zeros(10, dtype=[("A", "<f2"), ("B", "<f4")])

In [39]: Rm = numpy.ma.masked_where(R["A"]<5, R)

In [41]: Rm.dtype
Out[41]: dtype([('A', '<f2'), ('B', '<f4')])

In [42]: Rm.mask.dtype
Out[42]: dtype('bool')

# Now, both `__getitem__` and `__repr__` will result in errors — see issue #2972

If I create a masked array differently, the mask dtype is structured like the dtype of the array itself:

In [44]: Q.dtype
Out[44]: dtype([('A', '<f4'), ('B', '<f4')])

In [45]: Q.mask.dtype
Out[45]: dtype([('A', '?'), ('B', '?')])

The former situation exposes several problems. For example, Rm.__repr__() and Rm["A"] both result in IndexError, although it was a ValueError in the past.

By design, is it supposed to be possible for A.dtype to be structured while A.mask.dtype is not?

In other words: is the bug in the __repr__ and __getitem__ methods of numpy.ma.core.MaskedArray, or does the real bug occur earlier, in permitting such a masked structured array to exist in the first place?


Answer 1:


The errors in your first case indicate that the methods expect the mask to have the same number (and names) of fields as the base array:

__getitem__:  dout._mask = _mask[indx]
_recursive_printoption: (curdata, curmask) = (result[name], mask[name])

If the masked array is made with the 'main' constructor, the mask has the same structure:

Rn = np.ma.masked_array(R, mask=R['A']>5)
Rn.mask.dtype: dtype([('A', '?'), ('B', '?')])

In other words, there is a mask value for each field of each element.
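As a minimal sketch of that per-field expansion, the following builds a small structured array (two float32 fields, like Q in the question; the values in "A" are made up for illustration) and checks that the constructor turns a plain boolean condition into one mask value per field. np.ma.make_mask_descr is used only to compute the mask dtype one would expect.

```python
import numpy as np

# Illustrative data: two float32 fields, with distinct values in "A".
data = np.zeros(10, dtype=[("A", "<f4"), ("B", "<f4")])
data["A"] = np.arange(10)

# A plain boolean condition, one value per record.
Rn = np.ma.masked_array(data, mask=data["A"] > 5)

# The mask dtype mirrors the field layout of the data dtype.
expected = np.ma.make_mask_descr(data.dtype)
print(Rn.mask.dtype == expected)

# Each record's condition is repeated for every field.
print(Rn.mask["A"])
print(Rn.mask["B"])
```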

The masked_array docs evidently intend 'same shape' to include the dtype structure. They describe mask as: Must be convertible to an array of booleans with the same shape as 'data'.

If I try to set the mask in the same way that masked_where does:

Rn._mask=R['A']>5

I get the same print error. The structured mask gets overwritten with the new boolean array, changing its dtype. In contrast, if I use

Rn.mask=R['A']<5

Rn prints fine. .mask is a property, whose set method evidently handles the structured mask correctly.
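A short sketch of the safe path, assigning through the public .mask property only (it deliberately avoids writing to the private _mask attribute, whose effect is the problem described above). The array is illustrative: two float32 fields, all zeros.

```python
import numpy as np

data = np.zeros(10, dtype=[("A", "<f4"), ("B", "<f4")])
Rn = np.ma.masked_array(data, mask=data["A"] > 5)
expected = np.ma.make_mask_descr(data.dtype)

# Assigning a plain boolean array through the property keeps the
# structured mask dtype; the values are spread over all fields.
Rn.mask = data["A"] < 5
print(Rn.mask.dtype == expected)

# Field indexing and string conversion both work afterwards.
field_a = Rn["A"]
text = str(Rn)
print(field_a.mask.all())
```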

Without digging into the code history (on GitHub), my guess is that masked_where is a convenience function that wasn't updated when structured dtypes were added to other parts of the ma code. Compared to ma.masked_array it's a simple function that does not look at the dtype at all. Other convenience functions like ma.masked_greater use masked_where. Changing result._mask = cond to result.mask = cond might be all that is needed to correct this issue.


How thoroughly have you tested the consequences of an unstructured mask?

Rm.flatten()

returns an array with a structured mask, even when it started with an unstructured one. That's because it uses Rm.__setmask__, which is sensitive to fields. And that's the set function for the mask property.

Rm.tolist()  # same error as str()

masked_where starts with:

cond = make_mask(condition)

make_mask returns the simple 'bool' dtype. It can also be called with a dtype, producing a structured mask: np.ma.make_mask(R['A']<5, dtype=R.dtype). But such a structured mask gets flattened when used in masked_where. masked_where not only allows an unstructured mask, it forces it to be unstructured.
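The two make_mask call forms can be compared directly. This is a sketch on made-up data (two float32 fields, four records); it does not call masked_where, since what that function does with the mask is the point under discussion and may depend on the NumPy version.

```python
import numpy as np

data = np.zeros(4, dtype=[("A", "<f4"), ("B", "<f4")])
data["A"] = [1, 7, 3, 9]
cond = data["A"] < 5

# Default call: a simple boolean mask, one value per record.
plain = np.ma.make_mask(cond)

# With dtype=: the data dtype is converted to a boolean descriptor,
# giving one mask value per field of each record.
structured = np.ma.make_mask(cond, dtype=data.dtype)

print(plain.dtype)
print(structured.dtype)
```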

Your unstructured mask is already partly implemented, the recordmask property:

recordmask = property(fget=_get_recordmask)

I say partly because it has a get method, but the set method is not yet implemented. See def _set_recordmask(self).
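To illustrate the getter, here is a sketch with a hand-written per-field mask (the data and mask values are invented for the example). recordmask collapses the structured mask to one boolean per record, which is True only where every field of that record is masked.

```python
import numpy as np

data = np.zeros(3, dtype=[("A", "<f4"), ("B", "<f4")])

# Record 0 is fully masked, record 1 is half masked, record 2 is clear.
m = np.ma.masked_array(
    data, mask=[(True, True), (True, False), (False, False)]
)

print(m.mask.dtype)     # structured: one bool per field
rec = m.recordmask      # unstructured: one bool per record
print(rec)
```

A record with only some of its fields masked therefore does not count as masked in recordmask.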

The more I look at this, the more I'm convinced that masked_where is wrong. It could be changed to set a structured mask, but then it's not much different from masked_array. It might be better if it raised an error when the array is structured (has dtype.names). That way masked_where would remain useful for unstructured numeric arrays, while preventing misapplication to structured ones.
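The guard suggested here could look roughly like the wrapper below. masked_where_unstructured is a hypothetical name, not part of NumPy, and the choice of TypeError is an assumption; the sketch only shows the dtype.names check in front of the existing function.

```python
import numpy as np

def masked_where_unstructured(condition, a, copy=True):
    """Hypothetical wrapper: reject structured input, else call masked_where."""
    if np.asanyarray(a).dtype.names is not None:
        raise TypeError(
            "structured arrays are not accepted here; "
            "use np.ma.masked_array(a, mask=...) instead"
        )
    return np.ma.masked_where(condition, a, copy=copy)

# Unstructured numeric input passes straight through.
plain = np.arange(6.0)
result = masked_where_unstructured(plain < 3, plain)
print(result)

# Structured input is refused before any mask is built.
structured = np.zeros(3, dtype=[("A", "<f4"), ("B", "<f4")])
try:
    masked_where_unstructured(structured["A"] < 5, structured)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```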

I should also look at the test code.



Source: https://stackoverflow.com/questions/28182408/is-the-mask-of-a-structured-array-supposed-to-be-structured-itself