How do I delete a column that contains only zeros in Pandas?

误落风尘 2020-12-07 09:06

I currently have a dataframe consisting of columns with 1's and 0's as values. I would like to iterate through the columns and delete the ones that are made up of only 0's.

3 Answers
  • 2020-12-07 09:08

    If you'd like a more explicit way of collecting the zero-column names, so that you can print or log them before dropping them in place by name:

    zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
    df.drop(zero_cols, axis=1, inplace=True)
    

    A breakdown:

    # a pandas Series with {col: is_zero} items
    # is_zero is True when the number of zero items in that column == num_all_rows
    ((df == 0).sum() == df.shape[0])
    
    # a list comprehension of zero_col_names is built from the_series
    [ col for col, is_zero in the_series.items() if is_zero ]
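
    Putting the pieces together, here is a minimal runnable sketch. The sample frame and its column names are made up for illustration, and it uses (df == 0).all() as an equivalent spelling of the sum-versus-row-count test above:

```python
import pandas as pd

# Hypothetical sample data: columns "b" and "d" contain only zeros.
df = pd.DataFrame({"a": [1, 0, 1], "b": [0, 0, 0], "c": [0, 1, 0], "d": [0, 0, 0]})

# Boolean Series indexed by column name: True where every value is 0.
is_zero_col = (df == 0).all()

# Collect the names so they can be printed or logged before dropping.
zero_cols = [col for col, is_zero in is_zero_col.items() if is_zero]
print(zero_cols)  # ['b', 'd']

# Drop by name; reassigning avoids mutating the frame in place.
df = df.drop(columns=zero_cols)
print(list(df.columns))  # ['a', 'c']
```

    Reassigning the result of drop is used here instead of inplace=True; either form should give the same columns.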
    
  • 2020-12-07 09:09
    df.loc[:, (df != 0).any(axis=0)]
    

    Here is a breakdown of how it works:

    In [74]: import pandas as pd
    
    In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])
    
    In [76]: df
    Out[76]: 
       0  1  2  3
    0  1  0  0  0
    1  0  0  1  0
    
    [2 rows x 4 columns]
    

    df != 0 creates a boolean DataFrame which is True where df is nonzero:

    In [77]: df != 0
    Out[77]: 
           0      1      2      3
    0   True  False  False  False
    1  False  False   True  False
    
    [2 rows x 4 columns]
    

    (df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)

    In [78]: (df != 0).any(axis=0)
    Out[78]: 
    0     True
    1    False
    2     True
    3    False
    dtype: bool
    

    And df.loc can be used to select those columns:

    In [79]: df.loc[:, (df != 0).any(axis=0)]
    Out[79]: 
       0  2
    0  1  0
    1  0  1
    
    [2 rows x 2 columns]
    

    To "delete" the zero-columns, reassign df:

    df = df.loc[:, (df != 0).any(axis=0)]
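
    If the same filtering is needed in several places, the expression can be wrapped in a small helper. This is only a sketch: the function name drop_zero_columns is hypothetical, not part of pandas.

```python
import pandas as pd


def drop_zero_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without the columns whose values are all 0."""
    # True for each column that has at least one nonzero entry.
    keep = (frame != 0).any(axis=0)
    return frame.loc[:, keep]


df = pd.DataFrame([[1, 0, 0, 0], [0, 0, 1, 0]])
result = drop_zero_columns(df)

print(result.columns.tolist())  # [0, 2]
print(df.shape)                 # (2, 4): the original frame is left untouched
```

    Because df.loc returns a new object, the caller decides whether to overwrite df or keep both versions.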
    
  • 2020-12-07 09:16

    Here is an alternative approach (it assumes numpy has been imported as np):

    df.replace(0, np.nan).dropna(axis=1, how="all")

    Compared with unutbu's df.loc solution, this approach is noticeably slower:

    %timeit df.loc[:, (df != 0).any(axis=0)]
    652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit df.replace(0,np.nan).dropna(axis=1,how="all")
    1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
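
    Speed aside, the two approaches may not agree once the data contains missing values. The sketch below uses a made-up frame to show the difference: NaN != 0 evaluates to True, so the mask keeps a column that holds only zeros and NaN, while the replace/dropna route drops it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "y" is all zeros, "z" mixes a zero with a missing value.
df = pd.DataFrame({"x": [1.0, 0.0], "y": [0.0, 0.0], "z": [0.0, np.nan]})

# Boolean-mask approach: NaN counts as "nonzero", so "z" survives.
via_mask = df.loc[:, (df != 0).any(axis=0)]

# replace/dropna approach: zeros become NaN, so "z" is all NaN and is dropped.
via_dropna = df.replace(0, np.nan).dropna(axis=1, how="all")

print(via_mask.columns.tolist())    # ['x', 'z']
print(via_dropna.columns.tolist())  # ['x']
```

    For a frame that holds only 1's and 0's, as in the question, both approaches should select the same columns.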
    