Evaluating pandas series values with logical expressions and if-statements

Submitted by 依然范特西╮ on 2019-12-09 02:29:57

Question


I'm having trouble evaluating values from a dictionary using if statements.

Given the following dictionary, which I imported from a dataframe (in case it matters):

>>> pnl[company]
29:   Active Credit       Date   Debit Strike Type
0      1      0 2013-01-08  2.3265  21.15  Put
1      0      0 2012-11-26      40     80  Put
2      0      0 2012-11-26     400     80  Put

I tried to evaluate the following statement to check the last value of Active:

if pnl[company].tail(1)['Active']==1:
    print 'yay'

However, I was confronted by the following error message:

Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    if pnl[company].tail(1)['Active']==1:
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This surprised me, given that I could display the value I wanted using the above command without the if statement:

>>> pnl[company].tail(1)['Active']
30: 2    0
Name: Active, dtype: object

Given that the value is clearly zero and the index is 2, I tried the following for a brief sanity check and found that things weren't happening as I might have expected:

>>> if pnl[company]['Active'][2]==0:
...     print 'woo-hoo'
... else:
...     print 'doh'


doh

My questions are:

1) What might be going on here? I suspect I'm misunderstanding dictionaries on some fundamental level.

2) I noticed that as I bring up any given value of this dictionary, the number on the left increases by 1. What does this represent? For example:

>>> pnl[company].tail(1)['Active']
31: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
32: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
33: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
34: 2    0
Name: Active, dtype: object

Thanks in advance for any help.


Answer 1:


What you get back is a pandas Series object, and it cannot be evaluated in a boolean context the way you are attempting, even though it holds just a single value. You need to change your line to:

if pnl[company].tail(1)['Active'].any()==1:
  print 'yay'

With respect to your second question see my comment.

EDIT

From the comments and the link to your output: calling any() fixed the error message, but your data is actually strings, so the comparison still failed. You could either do:

if pnl[company].tail(1)['Active'].any()=='1':
  print 'yay'

This does a string comparison. Alternatively, fix the data at the point where it was read or generated, or do:

pnl[company]['Active'] = pnl[company]['Active'].astype(int)

This converts the dtype of the column so that the integer comparison works as intended.
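As a rough illustration of the dtype issue, here is a minimal sketch in Python 3 syntax. The frame `df` is a hypothetical stand-in for `pnl[company]`, with the Active values assumed to be strings as described in the comments; `.iloc[-1]` is used here only as a convenient way to pull out the last scalar.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; the 'Active' values are
# strings, which is why the column shows up with dtype object.
df = pd.DataFrame({
    "Active": ["1", "0", "0"],
    "Strike": [21.15, 80.0, 80.0],
})

last_active = df["Active"].iloc[-1]       # the string '0', not the number 0
string_vs_int = last_active == 0          # False: '0' is not equal to 0
string_vs_str = last_active == "0"        # True: string compared to string

# After converting the column, the integer comparison behaves as expected.
df["Active"] = df["Active"].astype(int)
int_vs_int = df["Active"].iloc[-1] == 0   # True
```

This would also explain the 'doh' in the question: the displayed 0 was a string, so comparing it with the integer 0 was False.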




Answer 2:


A Series is a subclass of NDFrame. The NDFrame.__bool__ method (named __nonzero__ in Python 2, as seen in your traceback) always raises a ValueError. Thus, trying to evaluate a Series in a boolean context raises a ValueError, even if the Series has but a single value.

The reason why NDFrames have no boolean value (err, that is, always raise a ValueError) is that there is more than one criterion one might reasonably expect for an NDFrame to be True. It could mean

  1. every item in the NDFrame is True (if so, use .all()), or
  2. any item in the NDFrame is True (if so, use .any()), or
  3. the NDFrame is not empty (if so, use not .empty; note that .empty is a property, not a method)

Since any of these is possible, and since different users have different expectations, the developers refuse to guess and instead require the user of the NDFrame to make explicit which criterion they wish to use.
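The three criteria can be sketched as follows, in Python 3 syntax. The two-element Series `s` is made up for illustration and is not the data from the question.

```python
import pandas as pd

s = pd.Series([0, 2])   # illustrative values only

# Using the Series directly as a condition is ambiguous and raises.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

every_item_true = s.all()    # False, because 0 is falsy
some_item_true = s.any()     # True, because 2 is truthy
not_empty = not s.empty      # True; .empty is a property, so no parentheses
```

Each of the three expressions gives a single, unambiguous truth value, which is what an if statement needs.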

The error message lists the most likely choices:

Use a.empty, a.bool(), a.item(), a.any() or a.all()

Since in your case you know the Series will contain just one value, you could use item:

if pnl[company].tail(1)['Active'].item() == 1:
    print 'yay'
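A small sketch of how item() behaves, in Python 3 syntax, using a made-up numeric `active` Series in place of the column from the question:

```python
import pandas as pd

active = pd.Series([1, 0, 0], name="Active")   # illustrative values only
last = active.tail(1)        # still a Series, holding one element

value = last.item()          # the single element as a plain scalar: 0
is_active = value == 1       # False

# item() refuses to guess when there is more than one element.
try:
    active.item()
    item_failed = False
except ValueError:
    item_failed = True
```

Because item() raises when the Series does not hold exactly one value, it also acts as a check that the "just one value" assumption actually holds.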

Regarding your second question: The numbers on the left seem to be line numbering produced by your Python interpreter (PyShell?) -- but that's just my guess.


WARNING: Presumably,

if pnl[company].tail(1)['Active']==1:

means you would like the condition to be True when the single value in the Series equals 1. The code

if pnl[company].tail(1)['Active'].any()==1:
    print 'yay'

will be True if the dtype of the Series is numeric and the value in the Series is any number other than 0. For example, if we take pnl[company].tail(1)['Active'] to be equal to

In [128]: s = pd.Series([2], index=[2])

then

In [129]: s.any()
Out[129]: True

and therefore,

In [130]: s.any()==1
Out[130]: True

I think s.item() == 1 more faithfully preserves your intended meaning:

In [132]: s.item()==1
Out[132]: False

(s == 1).any() would also work, but using any does not express your intention very plainly, since you know the Series will contain only one value.




Answer 3:


Your question has nothing to do with Python dictionaries, or native Python at all. It's about pandas Series, and the other answers gave you the correct syntax.

Interpreting your question in the wider sense, it's about how pandas Series was shoehorned onto NumPy, and NumPy historically had notoriously poor support for logical values and operators. pandas does the best job it can with what NumPy provides. Having to sometimes manually invoke NumPy logical functions instead of just writing code with arbitrary (Python) operators is annoying and clunky, and sometimes bloats pandas code. Also, you often have to do this for performance (NumPy is faster than thunking to and from native Python). But that's the price we pay.

There are many limitations, quirks and gotchas (examples below) - the best advice is to be distrustful of boolean as a first-class-citizen in pandas due to numpy's limitations:

  • pandas Caveats and Gotchas - Using If/Truth Statements with Pandas

  • a performance example: Python ~ can be used instead of np.invert() - more legible but 3x slower or worse

  • some gotchas and limitations: in the code below, note that a Series mixing boolean values with NaN is stored with dtype object, and that value_counts() ignores NAs by default (compare to R's table, which has the option 'useNA').


import numpy as np
import pandas as pd
s = pd.Series([True, True, False, True, np.nan])
s2  = pd.Series([True, True, False, True, np.nan])
dir(s) # look at .all, .any, .bool, .eq, .equals, .invert, .isnull, .value_counts() ...

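s.astype(bool) # CAUTION: NaN is truthy, so astype(bool) silently turns the missing value into True
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True  # <--- was NaN!
# dtype: bool

s.bool # not a conversion: s.bool is a method, so without parentheses this only shows the bound method (s.bool() works only on a single-element boolean Series)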
# <bound method Series.bool of
# 0     True
# 1     True
# 2    False
# 3     True
# 4      NaN
# dtype: object>

# Limitation: value_counts() excludes NAs by default
s.value_counts()
# True     3
# False    1
# dtype: int64
help(s.value_counts) # "... Excludes NA values(!)"

# Element-wise equality comparison treats NaN as unequal to NaN (again, there's no NA-handling option):
s == s2 # or equivalently, s.eq(s2)
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False  # BUG/LIMITATION: we should be able to choose NA==NA
# dtype: bool

# ...but equals(), which returns a single bool, treats NaNs in the same position as equal!!
s.equals(s2)
# True
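For the NA-related limitations above, one possible workaround is sketched below in Python 3 syntax. It assumes a pandas version where value_counts() accepts the dropna parameter and Series has isna(); the names `counts`, `n_missing` and `same` are just illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])
s2 = s.copy()

# Ask value_counts() to keep the missing values instead of dropping them.
counts = s.value_counts(dropna=False)

# Count the missing values directly.
n_missing = int(s.isna().sum())

# Element-wise equality that also counts NaN in the same position as a match.
same = (s == s2) | (s.isna() & s2.isna())
```

With dropna=False the counts cover all five entries, and `same` is True in every position, which matches what s.equals(s2) reports as a single value.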


Source: https://stackoverflow.com/questions/23461502/evaluating-pandas-series-values-with-logical-expressions-and-if-statements
