Question:
I'm trying to run what I think is simple code to drop any columns that are all NaN, but I can't get it to work (`axis = 1` works just fine when dropping rows):
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})
df = df[df.notnull().any(axis=0)]  # raises IndexingError
print(df)
```
Full error:
```
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
```
Expected output:
```
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
```
Answer 1:
You need `loc`, because you are filtering by columns:
```
print(df.notnull().any(axis=0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis=0)]
print(df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
```
Or filter the column names first and then select them with `[]`:
```
print(df.columns[df.notnull().any(axis=0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis=0)]]
print(df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
```
Or use `dropna` with the parameter `how='all'` to remove only the columns filled entirely with `NaN`s:
```
print(df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
```
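As a quick check, the three approaches in this answer should produce the same frame on the question's data. A minimal sketch (the variable names are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Boolean mask indexed by column labels: True where the column has any non-NaN value
mask = df.notnull().any(axis=0)

via_loc = df.loc[:, mask]                  # put the mask in the column slot of .loc
via_labels = df[df.columns[mask]]          # turn the mask into labels, then select
via_dropna = df.dropna(axis=1, how='all')  # drop columns that are entirely NaN

# DataFrame.equals treats NaNs in the same positions as equal
print(via_loc.equals(via_labels) and via_loc.equals(via_dropna))  # True
print(list(via_loc.columns))  # ['a', 'b', 'c']
```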
Answer 2:
You can use `dropna` with `axis=1` and `thresh=1`:
```
In[19]: df.dropna(axis=1, thresh=1)
Out[19]:
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
```
This drops any column that doesn't have at least 1 non-`NaN` value, which means every column that is all `NaN` gets dropped.
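`thresh` is the minimum number of non-`NaN` values a column must have to be kept, so on this data `thresh=1` behaves like `how='all'`. A small sketch on the question's data, counting non-`NaN` values per column and trying a higher threshold (the value 3 is chosen only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Number of non-NaN values in each column
counts = df.notna().sum()
print(counts.tolist())  # [2, 2, 2, 0]

# thresh=1: keep columns with at least 1 non-NaN value
print(list(df.dropna(axis=1, thresh=1).columns))  # ['a', 'b', 'c']

# thresh=3: keep columns with at least 3 non-NaN values; none qualify here
print(list(df.dropna(axis=1, thresh=3).columns))  # []
```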
What you tried failed because the boolean mask:

```
In[20]: df.notnull().any(axis = 0)
Out[20]:
a     True
b     True
c     True
d    False
dtype: bool
```

is indexed by the column labels, so it cannot be aligned with the row index, which is what `df[...]` aligns a boolean Series against by default.
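To see the mismatch directly, compare each mask's index with the DataFrame's axes. A minimal sketch on the question's data: the `axis=1` mask is indexed by rows, which is why it worked for dropping rows, while the `axis=0` mask is indexed by columns and only works in the column slot of `.loc`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

row_mask = df.notnull().any(axis=1)  # one value per row, indexed 0..3
col_mask = df.notnull().any(axis=0)  # one value per column, indexed 'a'..'d'

print(row_mask.index.equals(df.index))    # True: lines up with the rows
print(col_mask.index.equals(df.columns))  # True: lines up with the columns

# df[row_mask] filters rows; row 3 is all NaN, so it is dropped
print(df[row_mask].index.tolist())  # [0, 1, 2]

# df[col_mask] tries to align 'a'..'d' with rows 0..3 and fails
try:
    df[col_mask]
except Exception as exc:  # pandas raises IndexingError here
    print(type(exc).__name__)

# The same mask works when passed as the column selector of .loc
print(df.loc[:, col_mask].columns.tolist())  # ['a', 'b', 'c']
```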