Pandas: IndexingError: Unalignable boolean Series provided as indexer

Anonymous (unverified), submitted 2019-12-03 08:46:08

Question:

I'm trying to run what I think is simple code to drop every column that contains only NaNs, but I can't get it to work (axis = 1 works just fine when dropping rows):

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

    df = df[df.notnull().any(axis = 0)]

    print(df)

Full error:

    raise IndexingError('Unalignable boolean Series provided as '
    pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Expected output:

         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN
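To see why axis=1 works for rows while axis=0 fails with df[...], it can help to compare the index of each mask. A minimal sketch, rebuilding the question's frame (the names row_mask, col_mask and rows_kept are illustrative only):

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# any(axis=1) reduces across the columns, giving one flag per ROW,
# so the mask's index is df.index (0..3)
row_mask = df.notnull().any(axis=1)
print(row_mask.index.equals(df.index))    # True

# df[...] with a boolean Series selects rows, so this aligns and works
rows_kept = df[row_mask]
print(rows_kept.index.tolist())           # [0, 1, 2]  (row 3 is all NaN)

# any(axis=0) reduces down the rows, giving one flag per COLUMN,
# so its index is df.columns ('a'..'d'), not df.index
col_mask = df.notnull().any(axis=0)
print(col_mask.index.equals(df.columns))  # True
```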

Answer 1:

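You need loc, because you are filtering columns: df[...] with a boolean Series filters rows, while df.loc[:, mask] applies the mask to the columns: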

    print(df.notnull().any(axis=0))
    a     True
    b     True
    c     True
    d    False
    dtype: bool

    df = df.loc[:, df.notnull().any(axis=0)]
    print(df)
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN

Or filter the column labels first and then select them with []:

    print(df.columns[df.notnull().any(axis=0)])
    Index(['a', 'b', 'c'], dtype='object')

    df = df[df.columns[df.notnull().any(axis=0)]]
    print(df)
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN

Or use dropna with axis=1 and how='all' to remove every column that contains only NaNs:

    print(df.dropna(axis=1, how='all'))
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN
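As a quick sanity check, the three approaches in this answer can be compared on the question's frame. This is only a sketch; the variable names via_loc, via_labels and via_dropna are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

mask = df.notnull().any(axis=0)  # one flag per column

via_loc = df.loc[:, mask]                  # boolean mask on the column axis
via_labels = df[df.columns[mask]]          # list-like of labels -> column selection
via_dropna = df.dropna(axis=1, how='all')  # drop columns that are entirely NaN

print(via_loc.columns.tolist())  # ['a', 'b', 'c']
print(via_loc.equals(via_labels) and via_loc.equals(via_dropna))  # True
```

dropna is usually the most readable of the three when the goal is simply "drop all-NaN columns"; the mask-based forms are handy when the condition is something other than NaN-ness.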


Answer 2:

You can use dropna with axis=1 and thresh=1:

    In[19]: df.dropna(axis=1, thresh=1)
    Out[19]:
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN

This drops any column that doesn't have at least one non-NaN value, which means every all-NaN column is dropped.
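thresh is the minimum number of non-NaN values a column needs in order to be kept, so on this frame thresh=1 behaves like how='all'. A small sketch comparing the per-column non-NaN counts with what dropna keeps for a few thresholds (the thresholds 1, 2, 3 are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Non-NaN count per column
counts = df.notna().sum()
print(counts.tolist())  # [2, 2, 2, 0]

# thresh=k keeps a column only if it has at least k non-NaN values
for k in (1, 2, 3):
    kept = df.dropna(axis=1, thresh=k).columns.tolist()
    expected = counts[counts >= k].index.tolist()
    print(k, kept, kept == expected)
# 1 ['a', 'b', 'c'] True
# 2 ['a', 'b', 'c'] True
# 3 [] True
```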

What you tried failed because the boolean mask

    In[20]: df.notnull().any(axis = 0)
    Out[20]:
    a     True
    b     True
    c     True
    d    False
    dtype: bool

is a mask over the columns (its index is a, b, c, d), so it cannot be aligned with the row index (0 to 3), which is what df[...] uses by default for a boolean Series.
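A minimal sketch reproducing the failure and printing both indexes side by side. Catching a generic Exception and printing its class name is just a way to keep the example independent of where a given pandas version exposes IndexingError, and the warnings filter is there because some pandas versions emit a reindexing warning before raising:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

mask = df.notnull().any(axis=0)
print(mask.index.tolist())  # ['a', 'b', 'c', 'd']  (column labels)
print(df.index.tolist())    # [0, 1, 2, 3]          (row labels)

# df[mask] tries to align mask.index with df.index; none of 'a'..'d'
# is a row label, so the alignment fails
error_name = None
with warnings.catch_warnings():
    # Silence the "Boolean Series key will be reindexed" warning, if any
    warnings.simplefilter("ignore")
    try:
        df[mask]
    except Exception as exc:
        error_name = type(exc).__name__
print(error_name)  # IndexingError
```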


