filter dataframe rows based on length of column values

误落风尘 2021-01-17 18:28

I have a pandas dataframe as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

df
              A  B
0             1  2
1           NaN  1
2  test string1  5

4 Answers
  •  佛祖请我去吃肉
    2021-01-17 19:27

    In [42]: df
    Out[42]:
                  A  B                         C          D
    0             1  2                         2 2017-01-01
    1           NaN  1                       NaN 2017-01-02
    2  test string1  5  test string1test string1 2017-01-03
    
    In [43]: df.dtypes
    Out[43]:
    A            object
    B             int64
    C            object
    D    datetime64[ns]
    dtype: object
    
    In [44]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
    Out[44]:
         A  B    C          D
    0    1  2    2 2017-01-01
    1  NaN  1  NaN 2017-01-02
    

    Explanation:

    df.select_dtypes(['object']) selects only the columns with object dtype (here, the ones holding strings):

    In [45]: df.select_dtypes(['object'])
    Out[45]:
                  A                         C
    0             1                         2
    1           NaN                       NaN
    2  test string1  test string1test string1

    Applying x.str.len().gt(10) to each of these columns marks the values longer than 10 characters; non-string values (the integers and NaN) have no string length and come out as False:
    
    In [46]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10))
    Out[46]:
           A      C
    0  False  False
    1  False  False
    2   True   True
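
To see why rows 0 and 1 come out as False, it can help to look at a single column in isolation. The following is a minimal sketch, assuming a Series built to mirror column A from the question (the names s, lengths and too_long are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

# Object column mixing an int, a missing value and a string,
# mirroring column A from the question.
s = pd.Series([1, np.nan, "test string1"], dtype=object)

# .str.len() returns the length of string entries; the int and the
# missing value have no string length and become NaN.
lengths = s.str.len()

# Comparing NaN with 10 is False, so only the 12-character string is flagged.
too_long = lengths.gt(10)

print(lengths)
print(too_long)
```

Because missing lengths compare as False, rows without a string are never flagged by this check.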
    

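    Now we can collapse this mask row-wise with any(axis=1), which flags a row if any of its object columns holds a string longer than 10 characters: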

    In [47]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)
    Out[47]:
    0    False
    1    False
    2     True
    dtype: bool
    

    Finally, we negate the mask with ~ and select only the rows where it was False:

    In [48]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
    Out[48]:
         A  B    C          D
    0    1  2    2 2017-01-01
    1  NaN  1  NaN 2017-01-02
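
The steps above can be put together as a small runnable script. This is only a sketch under a few assumptions: the helper name drop_rows_with_long_strings and its max_len parameter are hypothetical, and columns C and D are rebuilt here to match the output shown in the answer, since the answer does not show how they were created.

```python
import numpy as np
import pandas as pd


def drop_rows_with_long_strings(df, max_len=10):
    """Drop rows where any object-dtype column holds a string longer than max_len.

    Hypothetical helper wrapping the one-liner from the answer.
    """
    # Step 1: keep only the object-dtype columns.
    obj_cols = df.select_dtypes(include=["object"])
    # Step 2: flag values longer than max_len, then collapse row-wise.
    too_long = obj_cols.apply(lambda col: col.str.len().gt(max_len)).any(axis=1)
    # Step 3: keep the rows that were not flagged.
    return df.loc[~too_long]


# Assumed reconstruction of the answer's example frame.
df = pd.DataFrame(
    {
        "A": pd.Series([1, np.nan, "test string1"], dtype=object),
        "B": [2, 1, 5],
        "C": pd.Series([2, np.nan, "test string1test string1"], dtype=object),
        "D": pd.date_range("2017-01-01", periods=3),
    }
)

result = drop_rows_with_long_strings(df)
print(result)
```

Keeping the mask in a named variable, rather than inlining it in df.loc[...], makes it easy to inspect which rows were flagged before dropping them.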
    
