Drop rows with all zeros in pandas data frame

礼貌的吻别 2020-11-27 11:57

I can use pandas dropna() functionality to remove rows with some or all columns set as NA's. Is there an equivalent function for drop

13 answers
  •  孤独总比滥情好
    2020-11-27 12:39

    A couple of solutions I found helpful while looking this up, especially for larger data sets:

    df[(df.sum(axis=1) != 0)]       # ~30% faster than df[(df.T != 0).any()]
    df[df.values.sum(axis=1) != 0]  # ~3x faster than df[(df.T != 0).any()]
    

    Continuing with the example from @U2EF1:

    In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
    
    In [91]: %timeit df[(df.T != 0).any()]
    1000 loops, best of 3: 686 µs per loop
    
    In [92]: df[(df.sum(axis=1) != 0)]
    Out[92]: 
       a  b
    1  0  1
    2  1  0
    3  1  1
    
    In [95]: %timeit df[(df.sum(axis=1) != 0)]
    1000 loops, best of 3: 495 µs per loop
    
    In [96]: %timeit df[df.values.sum(axis=1) != 0]
    1000 loops, best of 3: 217 µs per loop
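
One caveat worth checking before relying on the sum-based filters: a row sum of zero only means "all zeros" when the data has no negative values. The sketch below uses a hypothetical frame (the -1 is my own addition, not part of the example above) to show a row whose values cancel out being dropped by the sum test but kept by an element-wise test.

```python
import pandas as pd

# Hypothetical data: row 0 is all zeros, row 2 holds values that cancel out.
df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, -1, 1]})

# Sum-based filter: also drops row 2, because 1 + (-1) == 0.
by_sum = df[df.sum(axis=1) != 0]

# Element-wise filter: keeps every row with at least one non-zero value.
by_any = df[(df != 0).any(axis=1)]

print(by_sum.index.tolist())  # [1, 3]
print(by_any.index.tolist())  # [1, 2, 3]
```

For non-negative data such as counts or 0/1 flags, the two filters select the same rows, so the faster sum-based form is a reasonable choice there.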
    

    On a larger dataset:

    In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))
    
    In [120]: %timeit bdf[(bdf.T != 0).any()]
    1000 loops, best of 3: 1.63 ms per loop
    
    In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
    1000 loops, best of 3: 1.09 ms per loop
    
    In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
    1000 loops, best of 3: 517 µs per loop
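
Outside IPython, the comparison can be roughly reproduced with the standard-library timeit module. This is only a sketch: the seed, the loop count of 100, and the dictionary labels are arbitrary choices of mine, and absolute timings will differ by machine and pandas version. It also checks that the three filters select the same rows on this 0/1 data.

```python
import timeit

import numpy as np
import pandas as pd

# Random 0/1 frame of the same shape as above; the seed is arbitrary.
rng = np.random.default_rng(0)
bdf = pd.DataFrame(rng.integers(0, 2, size=(10000, 4)))

# The three filters from the answer, wrapped so they can be timed.
candidates = {
    "transpose + any": lambda: bdf[(bdf.T != 0).any()],
    "DataFrame.sum": lambda: bdf[bdf.sum(axis=1) != 0],
    "ndarray sum": lambda: bdf[bdf.values.sum(axis=1) != 0],
}

# Confirm all three select the same rows before comparing speed.
results = {name: fn() for name, fn in candidates.items()}
reference = results["transpose + any"]
all_equal = all(reference.equals(r) for r in results.values())
print("same result:", all_equal)

# Average time per call over 100 runs.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{name}: {seconds * 1e6:.0f} µs per loop")
```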
    
