Logical OR on a subset of columns in a DataFrame

旧时难觅i 2021-01-03 01:45

I want to get all rows where (at least) one of the columns in df[mylist] contains True.

I'm currently doing:

df = df[ df[mylist[0]] | df[mylist[1]] ]


        
2 Answers
  • 2021-01-03 02:41

    Building on LondonRob's answer, you could use

    df.loc[df[mylist].any(axis=1)]
    

    Calling the DataFrame's any method will perform better than using apply to call Python's builtin any function once per row.
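
For a long mylist, the chained `|` from the question can also be folded programmatically. The following is a minimal sketch on a small made-up frame (the data and the names `chained` and `vectorized` are illustrative assumptions, not from the question); it checks that the fold gives the same mask as `any(axis=1)`:

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "A": [False, True, False, False],
    "B": [False, False, True, False],
    "C": [True, False, False, False],
})
mylist = ["A", "B"]

# Fold `|` over the selected columns: df["A"] | df["B"] | ...
chained = reduce(operator.or_, (df[col] for col in mylist))

# Row-wise any over the same columns.
vectorized = df[mylist].any(axis=1)

# Both masks select the same rows.
assert chained.equals(vectorized)
print(df.loc[vectorized])
```

The fold is mostly useful for seeing why the two forms agree; `any(axis=1)` expresses the same thing in one call.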

    Or you could use np.logical_or.reduce:

    df.loc[np.logical_or.reduce(df[mylist], axis=1)]
    

    For large DataFrames, using np.logical_or may be quicker:

    In [30]: df = pd.DataFrame(np.random.binomial(1, 0.1, size=(100,300)).astype(bool))
    
    In [31]: %timeit df.loc[np.logical_or.reduce(df, axis=1)]
    1000 loops, best of 3: 261 µs per loop
    
    In [32]: %timeit df.loc[df.any(axis=1)]
    1000 loops, best of 3: 636 µs per loop
    
    In [33]: %timeit df[df.apply(any, axis=1)]
    100 loops, best of 3: 2.13 ms per loop
    

    Note that df.any has extra features, such as the ability to skip NaNs. In this case, if the columns are boolean-valued, there cannot be any NaNs (since NaNs are float values), so np.logical_or.reduce gives the same result and is quicker.
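
The difference only shows up once a column is not strictly boolean. Here is a small sketch, assuming hypothetical float columns that contain a NaN (the frame and the names `pandas_mask` and `numpy_mask` are made up for illustration): `DataFrame.any` skips the NaN by default, whereas `np.logical_or.reduce` treats it as truthy.

```python
import numpy as np
import pandas as pd

# Hypothetical float-valued frame; row 0 holds a NaN and a zero.
df = pd.DataFrame({"A": [np.nan, 0.0, 1.0], "B": [0.0, 0.0, 0.0]})

# skipna=True is the default, so the NaN in row 0 is ignored.
pandas_mask = df.any(axis=1)

# NumPy treats NaN as a nonzero value, hence truthy.
numpy_mask = np.logical_or.reduce(df.to_numpy(), axis=1)

print(pandas_mask.tolist())  # [False, False, True]
print(numpy_mask.tolist())   # [True, False, True]
```

So the two approaches are interchangeable only when the selected columns really have bool dtype.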


    import numpy as np
    import pandas as pd
    np.random.seed(2014)
    df = pd.DataFrame(np.random.binomial(1, 0.1, size=(10,3)).astype(bool), 
                      columns=list('ABC'))
    print(df)
    #        A      B      C
    # 0  False  False  False
    # 1   True  False  False
    # 2  False  False  False
    # 3   True  False  False
    # 4  False  False  False
    # 5  False  False  False
    # 6  False   True  False
    # 7  False  False  False
    # 8  False  False  False
    # 9  False  False  False
    
    mylist = list('ABC')
    print(df[ df[mylist[0]] | df[mylist[1]] | df[mylist[2]] ])
    print(df.loc[df[mylist].any(axis=1)])
    print(df.loc[np.logical_or.reduce(df[mylist], axis=1)])
    

    yields the rows where at least one of the columns is True:

           A      B      C
    1   True  False  False
    3   True  False  False
    6  False   True  False
    
  • 2021-01-03 02:46

    There's a much simpler way to do this using Python's built-in any function:

    In []: mylist
    Out[]: ['A', 'B']
    
    In []: df
    Out[]: 
           A      B      C
    0  False  False  False
    1   True  False  False
    2  False  False  False
    3   True  False  False
    4  False  False  False
    5  False  False  False
    6  False   True  False
    7  False  False  False
    8  False  False  False
    9  False  False  False
    

    You can apply the function any along the rows of df by using axis=1. In this case I'll only apply any to a subset of the columns:

    In []: df[mylist].apply(any, axis=1)
    Out[]: 
    0    False
    1     True
    2    False
    3     True
    4    False
    5    False
    6     True
    7    False
    8    False
    9    False
    dtype: bool
    

    This gives us the perfect way to select our rows:

    In []: df[df[mylist].apply(any, axis=1)]
    Out[]: 
           A      B      C
    1   True  False  False
    3   True  False  False
    6  False   True  False
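
Note that only the columns named in mylist are tested, while the selected rows still carry every column. A minimal sketch with made-up data (the frame and the names `mask` and `selected` are illustrative assumptions), in which a row that is True only in C is left out:

```python
import pandas as pd

# Hypothetical data: row 2 is True only in column C.
df = pd.DataFrame({
    "A": [False, True, False],
    "B": [False, False, False],
    "C": [False, False, True],
})
mylist = ["A", "B"]

# Build the mask from the subset, then index the full frame with it.
mask = df[mylist].apply(any, axis=1)
selected = df[mask]

print(selected)
```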
    