In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:
if df1:
    pass  # do something
However, that code fails in this way:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.
Here is one way this can work:
if not isinstance(df1, type(None)):
    pass  # do something
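To show what I mean, here is a minimal sketch of that test wrapped in a function. The name `is_present` and the sample frames are just placeholders I made up for illustration; the only real API used is `pd.DataFrame` and `isinstance`.

```python
import pandas as pd


def is_present(obj):
    """Return True for anything that is not None (hypothetical helper)."""
    return not isinstance(obj, type(None))


# Try the test on None, an empty DataFrame, and a populated DataFrame.
results = [
    is_present(None),
    is_present(pd.DataFrame()),
    is_present(pd.DataFrame({"a": [1]})),
]
```

Note that an empty DataFrame still counts as "present" here, which is what I want: the test is about whether the variable has been assigned, not whether the frame has rows.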
However, testing for type is really slow.
import timeit

t = timeit.Timer('if None: pass')
t.timeit()  # approximately 0.04

t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()  # approximately 0.4
Ouch. Along with being slow, testing for NoneType isn't very flexible, either.
A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
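A rough sketch of the empty-DataFrame idea, assuming a row count is a good enough signal. The helper name `has_data` and the column `"a"` are made up for the example.

```python
import pandas as pd

# Placeholder: start with an empty DataFrame instead of None.
df1 = pd.DataFrame()


def has_data(df):
    """Return True if the DataFrame holds at least one row (hypothetical helper)."""
    return len(df) > 0


before = has_data(df1)  # checked while df1 is still the empty placeholder

# Later on, the real frame replaces the placeholder.
df1 = pd.DataFrame({"a": [1, 2, 3]})
after = has_data(df1)
```

One drawback I can see: a frame that was really created but happens to have zero rows looks exactly like the placeholder, so this only works if "empty" and "not created yet" can be treated as the same thing.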
Another solution would be to have an indicator variable, df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.
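For completeness, this is roughly what the indicator-variable version would look like. `create_df1` is a hypothetical stand-in for whatever code actually builds the frame.

```python
import pandas as pd

df1 = None
df1_exists = False  # stays False until df1 is created


def create_df1():
    """Hypothetical builder: returns the new frame together with its flag."""
    return pd.DataFrame({"a": [1, 2, 3]}), True


checked_before = df1_exists  # flag value before the frame exists

df1, df1_exists = create_df1()

row_count = None
if df1_exists:
    # Safe to use df1 here, because the flag was set alongside it.
    row_count = len(df1)
```

The flag and the frame have to be kept in sync by hand everywhere df1 is assigned, which is the part that feels inelegant to me.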
Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?