Filter out rows with more than a certain number of NaNs

谎友^ 2020-12-09 06:09

In a Pandas dataframe, I would like to filter out all the rows that have more than 2 NaNs.

Essentially, I have 4 columns and I would like to keep only t

3 answers
  •  刺人心 (OP)
    2020-12-09 06:49

    I had a slightly different problem: filtering out columns (rather than rows) that have more than a certain number of NaNs.

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'a': [1, 2, np.nan, 4, 5],
        'b': [np.nan, 2, np.nan, 4, 5],
        'c': [1, 2, np.nan, np.nan, np.nan],
        'd': [1, 2, 3, np.nan, 5],
    })
    df
    
        a   b   c   d
    0   1.0 NaN 1.0 1.0
    1   2.0 2.0 2.0 2.0
    2   NaN NaN NaN 3.0
    3   4.0 4.0 NaN NaN
    4   5.0 5.0 NaN 5.0
    

    Suppose you want to drop every column that has 3 or more NaNs:

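    num_rows = df.shape[0]
    drop_cols_with_this_amount_of_nans_or_more = 3
    # dropna's thresh is the minimum number of non-NaN values a column must have to be kept,
    # so "3 or more NaNs out of 5 rows" translates to "fewer than 3 non-NaN values".
    keep_cols_with_at_least_this_number_of_non_nans = num_rows - drop_cols_with_this_amount_of_nans_or_more + 1
    
    df.dropna(axis=1, thresh=keep_cols_with_at_least_this_number_of_non_nans)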
    

    Output (column c has been dropped, as expected):

        a   b   d
    0   1.0 NaN 1.0
    1   2.0 2.0 2.0
    2   NaN NaN 3.0
    3   4.0 4.0 NaN
    4   5.0 5.0 5.0
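
The same idea should carry over to the original question about rows. The following is only a sketch under assumptions: it reuses the example frame above, and `max_nans_per_row` is a hypothetical name chosen for illustration. It keeps rows with at most 2 NaNs by switching to `axis=0` and computing `thresh` from the number of columns, then cross-checks the result against a plain boolean mask built from `isna().sum(axis=1)`.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answer above.
df = pd.DataFrame({
    'a': [1, 2, np.nan, 4, 5],
    'b': [np.nan, 2, np.nan, 4, 5],
    'c': [1, 2, np.nan, np.nan, np.nan],
    'd': [1, 2, 3, np.nan, 5],
})

max_nans_per_row = 2          # hypothetical name: drop rows with MORE than this many NaNs
num_cols = df.shape[1]

# thresh is the minimum number of non-NaN values a row needs in order to be kept.
rows_kept = df.dropna(axis=0, thresh=num_cols - max_nans_per_row)

# Equivalent formulation: count NaNs per row and filter with a boolean mask.
rows_kept_mask = df[df.isna().sum(axis=1) <= max_nans_per_row]

print(rows_kept)  # row 2 (three NaNs) is dropped; rows 0, 1, 3 and 4 remain
```

The mask version is a little more verbose, but it states the condition in terms of NaN counts directly, which avoids the off-by-one arithmetic that `thresh` requires.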
    
