dropping infinite values from dataframes in pandas?

Anonymous (unverified), submitted 2019-12-03 02:03:01

Question:

What is the quickest/simplest way to drop NaN and inf/-inf values from a pandas DataFrame without resetting mode.use_inf_as_null? I'd like to be able to use the subset and how arguments of dropna, except with inf values considered missing, like:

df.dropna(subset=["col1", "col2"], how="all", with_inf=True) 

Is this possible? Is there a way to tell dropna to include inf in its definition of missing values?

Answer 1:

The simplest way is to first replace the infs with NaN:

df.replace([np.inf, -np.inf], np.nan) 

and then call dropna:

df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all") 

For example:

In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

In [12]: df.replace([np.inf, -np.inf], np.nan)
Out[12]:
     0
0    1
1    2
2  NaN
3  NaN

The same method would work for a Series.
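A minimal runnable sketch of this approach, assuming numeric columns named col1, col2 and col3 (col1 and col2 are the question's placeholder names; col3 and the sample values are made up). The helper name dropna_with_inf is hypothetical: pandas has no with_inf argument, so the helper just wraps replace followed by dropna to mimic the call the question asks for.

```python
import numpy as np
import pandas as pd


def dropna_with_inf(df, subset=None, how="any"):
    """Hypothetical helper: treat +/-inf as missing, then call dropna.

    Like the one-liner above, the returned frame has inf replaced by NaN
    in every column, not only in `subset`.
    """
    return df.replace([np.inf, -np.inf], np.nan).dropna(subset=subset, how=how)


df = pd.DataFrame(
    {
        "col1": [1.0, np.inf, np.nan, -np.inf],
        "col2": [np.inf, np.nan, 2.0, -np.inf],
        "col3": [5.0, 6.0, 7.0, np.inf],
    }
)

# Rows 1 and 3 have only inf/NaN in col1 and col2, so how="all" drops them.
result = dropna_with_inf(df, subset=["col1", "col2"], how="all")
print(result)
```

Row 0 survives because col1 is finite, but its col2 value comes back as NaN rather than inf, which is the side effect Answer 4 addresses.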



Answer 2:

Here is another method, using .loc to replace inf with NaN in a Series:

s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan 

So, in response to the original question:

df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3):
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
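The .loc assignment at the top of this answer can be tried in isolation on a small float Series (the sample values are made up). np.isfinite is False for NaN as well as for ±inf, so the & s.notnull() term narrows the mask to the infinite entries only; since assigning NaN over an existing NaN changes nothing, the final Series is the same either way.

```python
import numpy as np
import pandas as pd

# Made-up sample data containing finite values, +/-inf and an existing NaN.
s = pd.Series([1.0, np.inf, np.nan, -np.inf, 3.5])

# ~np.isfinite(s) is True for inf, -inf and NaN; & s.notnull() drops the
# NaN positions, so the mask selects only the infinite values.
mask = (~np.isfinite(s)) & s.notnull()
s.loc[mask] = np.nan

print(s.tolist())
```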


Answer 3:

With pd.option_context, this is possible without permanently setting use_inf_as_null. For example:

with pd.option_context('mode.use_inf_as_null', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

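Of course, inf can also be treated as NaN permanently with pd.set_option('use_inf_as_null', True). Note that later pandas releases deprecated this option in favor of mode.use_inf_as_na (0.21.0) and then deprecated mode.use_inf_as_na as well (2.1.0), so check which option your pandas version supports.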
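If the option is not available in the pandas version you are using, a sketch that avoids global options altogether builds the mask directly with np.isfinite. It assumes the subset columns are numeric (np.isfinite does not accept object or string data) and reuses the question's placeholder names col1 and col2; col3 and the values are made up. Because np.isfinite is False for NaN, inf and -inf alike, keeping rows where any subset value is finite matches dropna(how="all"), and keeping rows where all of them are finite matches how="any". Unlike the replace-based answers, the kept rows retain their original inf values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1.0, np.inf, np.nan, -np.inf, 3.0],
        "col2": [np.inf, np.nan, 2.0, -np.inf, 4.0],
        "col3": ["a", "b", "c", "d", "e"],  # non-numeric, ignored by the mask
    }
)

subset = ["col1", "col2"]

# True where a subset value is an ordinary finite number (not NaN, inf or -inf).
finite = np.isfinite(df[subset])

# how="all" semantics: drop a row only if none of its subset values is finite.
kept_all = df[finite.any(axis=1)]

# how="any" semantics: drop a row if any of its subset values is non-finite.
kept_any = df[finite.all(axis=1)]
```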



Answer 4:

The replace-based solution above also replaces infs in columns outside the target subset. To restrict the replacement to the target columns:

lst = [np.inf, -np.inf]
to_replace = dict((v, lst) for v in ['col1', 'col2'])
df.replace(to_replace, np.nan)
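A runnable version of this snippet, followed by the dropna call from the question. col1 and col2 are the question's placeholder names; col3 and the sample values are made up to show that infinite values outside the subset survive. The dict maps each column name to the list of values to look for in that column, and np.nan is the replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1.0, np.inf, np.nan],
        "col2": [-np.inf, np.nan, 2.0],
        "col3": [np.inf, 3.0, -np.inf],
    }
)

lst = [np.inf, -np.inf]
# {column: values to replace} restricts the replacement to col1 and col2.
to_replace = {col: lst for col in ["col1", "col2"]}
cleaned = df.replace(to_replace, np.nan)

# Rows whose col1 and col2 are both missing (here only row 1) are dropped.
result = cleaned.dropna(subset=["col1", "col2"], how="all")
```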


Answer 5:

Yet another solution is to use the isin method: use it to flag each value that is infinite or missing, then chain the all method to check whether every value in a row is infinite or missing.

Finally, negate that result and use it for boolean indexing to select the rows that are not entirely infinite or missing.

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
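A self-contained sketch of this approach on made-up data. The code above checks every column; selecting the question's placeholder columns col1 and col2 before calling isin gives the subset behaviour of dropna(subset=..., how="all"). It relies on pandas' isin counting NaN in the lookup list as a match for NaN values, which is what makes np.nan usable here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1.0, np.inf, np.nan, -np.inf],
        "col2": [np.inf, np.nan, 2.0, -np.inf],
        "col3": [5.0, 6.0, 7.0, 8.0],
    }
)

missing_like = [np.inf, -np.inf, np.nan]

# Whole-frame version, as in the answer: col3 is always finite, so no row is dropped.
all_inf_or_nan = df.isin(missing_like).all(axis="columns")
kept_whole = df[~all_inf_or_nan]

# Subset version: only col1 and col2 decide which rows are dropped.
subset_mask = df[["col1", "col2"]].isin(missing_like).all(axis="columns")
kept_subset = df[~subset_mask]
```

As with the np.isfinite mask, the rows that are kept still contain their original inf values, since nothing is replaced.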

