FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

Anonymous (unverified), submitted 2019-12-03 02:16:02

Question:

I am using Pandas 0.19.1 on Python 3 and get a warning on the lines of code below. I'm trying to get a list of all the row numbers where the string Peter is present in column Unnamed: 5.

    df = pd.read_excel(xls_path)
    myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

Warning:

    "\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison
      result = getattr(x, name)(y)"

What is this FutureWarning, and should I ignore it since the code seems to work?

Answer 1:

This FutureWarning isn't from Pandas; it comes from Numpy, and the bug also affects matplotlib and others. Here's how to reproduce the warning nearer to the source of the trouble:

    import numpy as np
    print(np.__version__)   # Numpy version '1.12.0'
    'x' in np.arange(5)     # FutureWarning thrown here

    FutureWarning: elementwise comparison failed; returning scalar instead, but in the
    future will perform elementwise comparison
    False

Another way to reproduce this bug using the double equals operator:

    import numpy as np
    np.arange(5) == np.arange(5).astype(str)    # FutureWarning thrown here

An example of Matplotlib affected by this FutureWarning under their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What's going on here?

There is a disagreement between Numpy and native Python on what should happen when you compare a string to Numpy's numeric types. Notice the left operand is Python's turf, a primitive string, and the middle operation is Python's turf, but the right operand is Numpy's turf. Should you return a Python-style scalar or a Numpy-style ndarray of booleans? Numpy says ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be elementwise comparison or Scalar if item exists in the array?

If your code or library uses the in or == operators to compare a Python string to Numpy ndarrays, the two aren't compatible, so when you try it, it returns a scalar, but only for now. The warning indicates that this behavior might change in the future, so your code pukes all over the carpet if Python/Numpy decide to adopt the Numpy style.
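To see which of the two behaviors a given installation shows, the comparison can be wrapped in a small probe. This is a minimal sketch: the helper name probe_comparison is hypothetical, and the outcome depends on the installed Numpy version, so no expected output is shown.

```python
import warnings

import numpy as np


def probe_comparison(arr, value):
    """Compare arr == value and report which behavior this Numpy shows."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = arr == value
    # An ndarray result means elementwise comparison; anything else is a scalar.
    style = "elementwise" if isinstance(result, np.ndarray) else "scalar"
    messages = [str(w.message) for w in caught]
    return style, result, messages


style, result, messages = probe_comparison(np.arange(5), "x")
print(np.__version__, style, messages)
```

Either way no element matches, so np.any(result) is False under both styles; only the type of the result differs.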

Submitted Bug reports:

Numpy and Python are in a standoff: for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784

https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your versions of Python and Numpy and ignore the warnings, or babysit your left and right operands so that they come from a common turf.

Suppress the warning globally:

    import warnings
    import numpy as np

    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(5))   # returns False, without warning

Suppress the warning on a line by line basis.

    import warnings
    import numpy as np

    with warnings.catch_warnings():
        warnings.simplefilter(action='ignore', category=FutureWarning)
        print('x' in np.arange(2))   # returns False, warning is suppressed

    print('x' in np.arange(10))   # returns False, throws FutureWarning
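A narrower option is to filter on the message text, so that unrelated FutureWarnings still get through. The sketch below uses the standard library's warnings.filterwarnings, whose message argument is a regular expression matched against the start of the warning text. The emit helper is hypothetical and only simulates the two warnings, so the example does not depend on a particular Numpy version.

```python
import warnings

MESSAGE_PREFIX = "elementwise comparison failed"


def emit(message):
    """Hypothetical stand-in that raises a FutureWarning with the given text."""
    warnings.warn(message, FutureWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only FutureWarnings whose text starts with the prefix above.
    warnings.filterwarnings("ignore", message=MESSAGE_PREFIX, category=FutureWarning)
    emit("elementwise comparison failed; returning scalar instead")
    emit("some other deprecation")

remaining = [str(w.message) for w in caught]
print(remaining)  # ['some other deprecation']
```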

Just suppress the warning by name, then put a loud comment next to it mentioning the current versions of Python and Numpy, saying that this code is brittle and requires these versions, and put a link to here. Kick the can down the road.
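The second workaround, keeping both operands on a common turf, could look roughly like the sketch below. The variable names are illustrative; the idea is to convert the array before the comparison so that a string is never compared against numeric data.

```python
import numpy as np

numbers = np.arange(5)

# Option 1: move the array onto Python's turf before using `in`.
found_in_list = 'x' in numbers.tolist()

# Option 2: move the array onto string turf before using ==.
as_text = numbers.astype(str)
mask = as_text == '3'

print(found_in_list)  # False
print(mask.tolist())  # [False, False, False, True, False]
```

Both options compare like with like, so the result does not depend on how a mixed string/number comparison is resolved.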



Answer 2:

In my case, the same warning message was caused by a TypeError:

TypeError: invalid type comparison

So you may want to check the data types in the Unnamed: 5 column:

    for x in df['Unnamed: 5']:
        print(type(x))  # are they 'str'?

Here is how I can replicate the warning message:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
    df['num3'] = 3
    df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the warning
    df.loc[df['num3'] == 3, 'num3'] = 4    # No error

Hope it helps.
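A quicker check than looping over the values is to look at the column dtypes and then compare like with like. The small frame below is made up for illustration and is not the asker's spreadsheet.

```python
import pandas as pd

# Made-up data: one numeric column and one text column.
df = pd.DataFrame({"num3": [3, 3, 3], "name": ["Peter", "Ann", "Peter"]})
print(df.dtypes)

# Numeric column compared against a number.
numeric_rows = df[df["num3"] == 3].index.tolist()

# Text column compared against a string.
name_rows = df[df["name"] == "Peter"].index.tolist()

# Numeric column converted element by element before comparing to a string.
text_rows = df[df["num3"].astype(str) == "3"].index.tolist()

print(numeric_rows, name_rows, text_rows)  # [0, 1, 2] [0, 2] [0, 1, 2]
```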



Answer 3:

If your arrays aren't too big or you don't have too many of them, you might be able to get away with forcing the left hand side of == to be a string:

myRows = df[str(df['Unnamed: 5']) == 'Peter'].index.tolist() 

But this is ~1.5 times slower if df['Unnamed: 5'] is a string, 25-30 times slower if df['Unnamed: 5'] is a small numpy array (length = 10), and 150-160 times slower if it's a numpy array with length 100 (times averaged over 500 trials).

    import time
    from numpy import linspace, mean, zeros

    a = linspace(0, 5, 10)      # small array
    b = linspace(0, 50, 100)    # big array
    n = 500                     # number of trials
    string1 = 'Peter'
    string2 = 'blargh'
    times_a = zeros(n)
    times_str_a = zeros(n)
    times_s = zeros(n)
    times_str_s = zeros(n)
    times_b = zeros(n)
    times_str_b = zeros(n)
    for i in range(n):
        t0 = time.time()
        tmp1 = a == string1
        t1 = time.time()
        tmp2 = str(a) == string1
        t2 = time.time()
        tmp3 = string2 == string1
        t3 = time.time()
        tmp4 = str(string2) == string1
        t4 = time.time()
        tmp5 = b == string1
        t5 = time.time()
        tmp6 = str(b) == string1
        t6 = time.time()
        times_a[i] = t1 - t0
        times_str_a[i] = t2 - t1
        times_s[i] = t3 - t2
        times_str_s[i] = t4 - t3
        times_b[i] = t5 - t4
        times_str_b[i] = t6 - t5

    print('Small array:')
    print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
    print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

    print('\nBig array')
    print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
    print(mean(times_str_b)/mean(times_b))

    print('\nString')
    print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
    print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

    Small array:
    Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
    Ratio of time with/without string conversion: 26.3881526541

    Big array
    Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
    159.99474375821288

    String
    Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
    Ratio of time with/without string conversion: 1.40857605178
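One thing to keep in mind when reading these numbers: str() applied to a whole array (or Series) builds a single string of its printed form, so comparing that to 'Peter' yields one boolean, not one per element. If a per-element string comparison is what is wanted, astype(str) is a possible alternative. The sketch below is an assumption about the intent, not part of the benchmark above.

```python
import numpy as np

a = np.linspace(0, 5, 10)

whole = str(a)                 # one string holding the printed array
per_element = a.astype(str)    # array of ten strings, one per value

print(type(whole).__name__, whole == 'Peter')
print(per_element.shape, (per_element == 'Peter').tolist())
```

For a DataFrame column, the analogous per-element form would be df['Unnamed: 5'].astype(str) == 'Peter'.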

