I am getting the error when I make a comparison on a single element in a dataframe, but I don't understand why.
I have a dataframe df with timeseries data for a nu
The problem lies in the if statement.
When you code
if this:
    print(that)
the condition is evaluated as bool(this), and that had better come back as True or False.
However, you did:
if pd.isnull(df[[customer_ID]].loc[ts]):
    pass  # idk what you did here because you didn't say... but it doesn't matter
Also, you stated that pd.isnull(df[[customer_ID]].loc[ts]) evaluated to:
8143511 True
Name: 2012-07-01 00:00:00, dtype: bool
Does that look like a True or False?
What about bool(pd.isnull(df[[customer_ID]].loc[ts]))?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So the lesson is: a pd.Series cannot be evaluated as True or False, even when it holds a single element.
It is, however, a pd.Series of Trues and Falses.
And that is why it doesn't work.
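To see this for yourself, here is a minimal sketch. The question's real data isn't shown, so the small frame below is made up; only the column label and timestamp come from the question, and a DatetimeIndex is assumed.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's frame (assumption: DatetimeIndex rows,
# customer IDs as string column labels).
customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(["2012-07-01", "2012-07-02"]),
)

mask = pd.isnull(df[[customer_ID]].loc[ts])  # one-element boolean Series

try:
    if mask:  # `if` calls bool(mask), which pandas refuses to answer
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(mask).__name__)  # Series
print(error_message)

# An explicit reduction turns the Series into a single truth value.
print(bool(mask.any()))
```

Which reduction is right (any, all, item) depends on what you actually mean by the test, which is why pandas makes you choose.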
The second set of [] made df[[customer_ID]] a one-column DataFrame, so .loc[ts] returned a Series, which I mistook for a single value. The simplest solution is to remove the extra []:
if pd.isnull(df[customer_ID].loc[ts]):
    pass
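A quick way to check the difference is to look at the types each form returns. This is only a sketch on a made-up frame (the real data isn't in the question); it assumes a DatetimeIndex with unique timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the label and timestamp are taken from the question.
customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(["2012-07-01", "2012-07-02"]),
)

double = df[[customer_ID]].loc[ts]  # one-column DataFrame, then a row -> Series
single = df[customer_ID].loc[ts]    # Series, then a label -> scalar

print(type(double).__name__)  # Series
print(pd.isnull(single))      # True
```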
The problem is that the if statement needs a scalar (True or False), but df[[customer_ID]].loc[ts] is a one-item Series, so pd.isnull returns a one-item boolean Series.
The solution is to convert it to a scalar, either with Series.item or by taking the first element of values with [0]:
customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
print (df[[customer_ID]].loc[ts].item())
nan
if pd.isnull(df[[customer_ID]].loc[ts]).item():
    print ('super')
print (df[[customer_ID]].loc[ts].values[0])
nan
if pd.isnull(df[[customer_ID]].loc[ts]).values[0]:
    print ('super')
But if you use DataFrame.loc with both the row and the column label, you get a scalar directly (as long as there are no duplicated index or column names):
print (df.loc[ts, customer_ID])
nan
customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
if pd.isnull(df.loc[ts, customer_ID]):
    print ('super')
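The caveat about duplicates can be illustrated with a small sketch. Both frames below are made up for the purpose, and DataFrame.at is shown as an alternative scalar accessor that the answer itself does not mention.

```python
import numpy as np
import pandas as pd

customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")

# Unique labels: both .loc and .at hand back a scalar.
df = pd.DataFrame({customer_ID: [np.nan]}, index=[ts])
print(pd.isnull(df.loc[ts, customer_ID]))  # True
print(pd.isnull(df.at[ts, customer_ID]))   # True

# Duplicated column label (hypothetical): .loc returns a Series again,
# so using it directly in an `if` would raise the same ValueError.
dup = pd.DataFrame(
    [[np.nan, 2.0]], index=[ts], columns=[customer_ID, customer_ID]
)
result = dup.loc[ts, customer_ID]
print(type(result).__name__)  # Series
```

If duplicates are possible in your data, check for them first (for example with df.columns.is_unique and df.index.is_unique) before relying on a scalar coming back.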