How to delete a column in pandas dataframe based on a condition?

猫巷女王i 2020-12-09 19:39

I have a pandas DataFrame with many NaN values in it.

How can I drop every column for which number_of_na_values > 2000?

I tried to

2 Answers
  •  既然无缘
    2020-12-09 20:14

    Same logic, but with everything put on one line.

    import pandas as pd
    import numpy as np
    
    # artificial data
    # ====================================
    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(10,5), columns=list('ABCDE'))
    df[df < 0] = np.nan
    
            A       B       C       D       E
    0  1.7641  0.4002  0.9787  2.2409  1.8676
    1     NaN  0.9501     NaN     NaN  0.4106
    2  0.1440  1.4543  0.7610  0.1217  0.4439
    3  0.3337  1.4941     NaN  0.3131     NaN
    4     NaN  0.6536  0.8644     NaN  2.2698
    5     NaN  0.0458     NaN  1.5328  1.4694
    6  0.1549  0.3782     NaN     NaN     NaN
    7  0.1563  1.2303  1.2024     NaN     NaN
    8     NaN     NaN     NaN  1.9508     NaN
    9     NaN     NaN  0.7775     NaN     NaN
    
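    # processing: drop columns with more than 3 NaN values
    # ====================================
    # df.apply runs the lambda once per column and returns a boolean Series,
    # which selects the column labels passed to drop
    df.drop(df.columns[df.apply(lambda col: col.isnull().sum() > 3)], axis=1)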
    
    
    Out[183]:
            B
    0  0.4002
    1  0.9501
    2  1.4543
    3  1.4941
    4  0.6536
    5  0.0458
    6  0.3782
    7  1.2303
    8     NaN
    9     NaN
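
The same filter can probably be written without `apply`. The following is only a sketch on the artificial data above, not something from the original answer: `max_nan` is a made-up name for the threshold (3 here, 2000 in the question). It shows a boolean mask built from `isna().sum()` next to `dropna` with `thresh`, where `thresh` is the minimum number of non-NaN values a column needs in order to be kept.

```python
import numpy as np
import pandas as pd

# Same artificial data as in the answer above
np.random.seed(0)
df = pd.DataFrame(np.random.randn(10, 5), columns=list("ABCDE"))
df[df < 0] = np.nan

# Hypothetical name for the threshold; the question would use 2000
max_nan = 3

# Option 1: boolean mask, keep columns whose NaN count is at most max_nan
kept_mask = df.loc[:, df.isna().sum() <= max_nan]

# Option 2: dropna with thresh, keep columns that have
# at least len(df) - max_nan non-NaN values
kept_thresh = df.dropna(axis=1, thresh=len(df) - max_nan)

print(kept_mask.columns.tolist())    # ['B']
print(kept_thresh.columns.tolist())  # ['B']
```

Neither option modifies `df` in place, so assign the result back (for example `df = kept_mask`) if the columns should really be gone.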
    
