In a Pandas dataframe, I would like to filter out all the rows that have more than 2 NaNs.
Essentially, I have 4 columns and I would like to keep only t
I had a slightly different problem: filtering out columns that have more than a certain number of NaNs:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,np.nan,4,5], 'b':[np.nan,2,np.nan,4,5], 'c':[1,2,np.nan,np.nan,np.nan], 'd':[1,2,3,np.nan,5]})
df
     a    b    c    d
0  1.0  NaN  1.0  1.0
1  2.0  2.0  2.0  2.0
2  NaN  NaN  NaN  3.0
3  4.0  4.0  NaN  NaN
4  5.0  5.0  NaN  5.0
Assume you want to filter out columns with 3 or more NaNs:
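num_rows = df.shape[0]
drop_cols_with_this_amount_of_nans_or_more = 3
# dropna's thresh is the minimum number of non-NaN values a column needs in order to be kept,
# so "3 or more NaNs" translates to "fewer than num_rows - 3 + 1 non-NaNs"
keep_cols_with_at_least_this_number_of_non_nans = num_rows - drop_cols_with_this_amount_of_nans_or_more + 1
df.dropna(axis=1, thresh=keep_cols_with_at_least_this_number_of_non_nans)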
Output (column c has been dropped, as expected):
     a    b    d
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  NaN  3.0
3  4.0  4.0  NaN
4  5.0  5.0  5.0
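For the row-wise case from the question (dropping rows with more than 2 NaNs), the same idea should carry over with axis=0. Below is a minimal sketch on the same example frame. The variable names (max_nans_per_row, rows_kept_mask, and so on) are made up for illustration, and the boolean-mask variant is an alternative I find easier to read, not something the thresh approach requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, 4, 5],
    'b': [np.nan, 2, np.nan, 4, 5],
    'c': [1, 2, np.nan, np.nan, np.nan],
    'd': [1, 2, 3, np.nan, 5],
})

# Hypothetical limit: a row may contain at most this many NaNs
max_nans_per_row = 2

# Variant 1: count NaNs per row and keep rows at or under the limit
nan_counts = df.isna().sum(axis=1)
rows_kept_mask = df[nan_counts <= max_nans_per_row]

# Variant 2: express the same rule through dropna's thresh,
# i.e. the minimum number of non-NaN values a row must have
min_non_nans = df.shape[1] - max_nans_per_row
rows_kept_thresh = df.dropna(axis=0, thresh=min_non_nans)

# The column filter from above, written as a mask instead of thresh:
# keep columns with fewer than 3 NaNs
cols_kept_mask = df.loc[:, df.isna().sum(axis=0) < 3]
```

The mask version states the rule directly in terms of NaN counts, so there is no off-by-one arithmetic to get wrong; the thresh version is shorter once the conversion to "minimum non-NaN count" is done.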