boolean and type checking in python vs numpy

I ran into unexpected results in a python if clause today:

import numpy
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5):
    print 'close enough'  # works as expected (prints message)

if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5) is True:
    print 'close enough'  # does NOT work as expected (prints nothing)

After some poking around (i.e., this question, and in particular this answer), I understand the cause: the type returned by numpy.allclose() is numpy.bool_ rather than plain old bool, and apparently if foo = numpy.bool_(1), then if foo will evaluate to True while if foo is True will evaluate to False. This appears to be the work of the is operator.

My questions are: why does numpy have its own boolean type, and what is best practice in light of this situation? I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.

why does numpy have its own boolean type

Space and speed. Numpy stores things in compact arrays; if it can fit a boolean into a single byte it'll try. You can't easily do this with Python objects, as you have to store references which slows calculations down significantly.

I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.

Well, don't do that.

You're doing something which is considered an anti-pattern. Quoting PEP 8:

Don't compare boolean values to True or False using ==.

Yes:   if greeting:
No:    if greeting == True:
Worse: if greeting is True:

The fact that numpy wasn't designed to facilitate your non-pythonic code isn't a bug in numpy. In fact, it's a perfect example of why your personal idiom is an anti-pattern.

As PEP 8 says, using is True is even worse than == True. Why? Because you're checking object identity: not only must the result be truthy in a boolean context (which is usually all you need), and equal to the boolean True value, it has to actually be the constant True. It's hard to imagine any situation in which this is what you want.

And you specifically don't want it here:

>>> np.True_ == True
True
>>> np.True_ is True
False

So, all you're doing is explicitly making your code incompatible with numpy, and various other C extension libraries (conceivably a pure-Python library could return a custom value that's equal to True, but I don't know of any that do so).

In your particular case, there is no reason to exclude 2 and [2]. If you read the docs for numpy.allclose, it clearly isn't going to return them. But consider some other function, like many of those in the standard library that just say they evaluate to true or to false. That means they're explicitly allowed to return one of their truthy arguments, and often will do so. Why would you want to consider that false?

Finally, why would numpy, or any other C extension library, define such bool-compatible-but-not-bool types?

In general, it's because they're wrapping a C int or a C++ bool or some other such type. In numpy's case, it's wrapping a value that may be stored in a fastest-machine-word type or a single byte (maybe even a single bit in some cases) as appropriate for performance, and your code doesn't have to care which, because all representations look the same, including being truthy and equal to the True constant.

来源：https://stackoverflow.com/questions/18922407/boolean-and-type-checking-in-python-vs-numpy

标签

python

numpy

boolean

pep8