How to check whether a pandas DataFrame is empty?

别跟我提以往 2020-12-02 04:01

How to check whether a pandas DataFrame is empty? In my case I want to print a message to the terminal if the DataFrame is empty.

5 answers
  •  星月不相逢
    2020-12-02 04:30

    To see if a dataframe is empty, I argue that one should test for the length of a dataframe's columns index:

    if len(df.columns) == 0:
        print('The DataFrame is empty')
    

    Reason:

    According to the pandas API reference, there is a distinction between:

    • an empty dataframe with 0 rows and 0 columns
    • an empty dataframe with 0 rows but at least 1 column, whose labels persist (any rows added later will hold NaN in those columns)

    Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), and len(df.index) make no distinction: in both cases the index length is 0 and empty is True.
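    A minimal sketch of this distinction as a reusable check, which also covers the question's goal of printing a message. The helper name describe_emptiness and its return strings are hypothetical, not part of pandas:

```python
import pandas as pd


def describe_emptiness(df: pd.DataFrame) -> str:
    """Hypothetical helper: classify a DataFrame by the distinction above."""
    if len(df.columns) == 0:
        # No columns at all (this covers pd.DataFrame())
        return 'no columns'
    if len(df.index) == 0:
        # Column labels survive, but there are no rows; df.empty is True here too
        return 'columns but no rows'
    return 'has rows and columns'


df1 = pd.DataFrame()
df2 = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
df2 = df2[df2['AA'] == 5]  # no row matches, so 0 rows remain

for name, frame in [('df1', df1), ('df2', df2)]:
    # df.empty is True for both frames, so on its own it cannot tell them apart
    print(name, frame.empty, describe_emptiness(frame))
# prints:
# df1 True no columns
# df2 True columns but no rows
```

    Note that a frame built with rows but no columns, e.g. pd.DataFrame(index=[0, 1]), would also be reported as 'no columns' by this sketch.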

    Examples

    Example 1: An empty dataframe with 0 rows and 0 columns

    In [1]: import pandas as pd
            df1 = pd.DataFrame()
            df1
    Out[1]: Empty DataFrame
            Columns: []
            Index: []
    
    In [2]: len(df1.index)  # or len(df1)
    Out[2]: 0
    
    In [3]: df1.empty
    Out[3]: True
    

    Example 2: A dataframe which is emptied to 0 rows but still retains n columns

    In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
            df2
    Out[4]:    AA  BB
            0   1  11
            1   2  22
            2   3  33
    
    In [5]: df2 = df2[df2['AA'] == 5]
            df2
    Out[5]: Empty DataFrame
            Columns: [AA, BB]
            Index: []
    
    In [6]: len(df2.index)  # or len(df2)
    Out[6]: 0
    
    In [7]: df2.empty
    Out[7]: True
    

    In both previous examples the index length is 0 and empty is True. Reading the length of the columns index, however, tells them apart: for the first dataframe df1 it returns 0 columns, showing that it is indeed completely empty.

    In [8]: len(df1.columns)
    Out[8]: 0
    
    In [9]: len(df2.columns)
    Out[9]: 2
    

    Critically, while the second dataframe df2 contains no data, it is not completely empty: its 2 empty columns persist.
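    As a side note, DataFrame.shape reports both lengths at once as (rows, columns), which makes the difference between df1 and df2 visible in a single call. A small sketch reproducing the two frames from the examples:

```python
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
df2 = df2[df2['AA'] == 5]  # filter that matches no rows

# shape is (number of rows, number of columns)
print(df1.shape)             # (0, 0)
print(df2.shape)             # (0, 2)
print(df2.columns.tolist())  # ['AA', 'BB']: the labels that persist
```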

    Why it matters

    Let's add a new column to these dataframes to understand the implications:

    # As expected, df1 now has exactly 1 column
    In [10]: df1['CC'] = [111, 222, 333]
             df1
    Out[10]:    CC
             0 111
             1 222
             2 333
    In [11]: len(df1.columns)
    Out[11]: 1
    
    # Note that the persisting columns AA and BB of df2 reappear, filled with `NaN`
    In [12]: df2['CC'] = [111, 222, 333]
             df2
    Out[12]:    AA  BB   CC
             0 NaN NaN  111
             1 NaN NaN  222
             2 NaN NaN  333
    In [13]: len(df2.columns)
    Out[13]: 3
    

    It is evident that the original columns of df2 have re-surfaced, now filled with NaN. Therefore, it is prudent to check the length of the columns index instead, with len(df.columns), to see whether a dataframe is empty.
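    If you only notice the problem after such an assignment, one way to spot the re-surfaced columns is to list the columns whose values are all NaN. This is a sketch under the assumption that genuine data columns are never entirely NaN; the variable names all_nan and resurfaced are illustrative only:

```python
import pandas as pd

df2 = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
# .copy() makes the filtered frame independent, avoiding SettingWithCopyWarning below
df2 = df2[df2['AA'] == 5].copy()

# Assigning a list to a 0-row frame rebuilds the index; AA and BB are filled with NaN
df2['CC'] = [111, 222, 333]

# Boolean Series indexed by column label: True where the column holds only NaN
all_nan = df2.isna().all()
resurfaced = all_nan[all_nan].index.tolist()
print(resurfaced)  # ['AA', 'BB']
```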

    Practical solution

    # New dataframe df
    In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
            df
    Out[1]:    AA  BB
            0   1  11
            1   2  22
            2   3  33
    
    # This filter results in an empty df
    # because no row has AA == 5
    In [2]: df = df[df['AA'] == 5]
            df
    Out[2]: Empty DataFrame
            Columns: [AA, BB]
            Index: []
    
    # NOTE: the df is empty, BUT the columns are persistent
    In [3]: len(df.columns)
    Out[3]: 2
    
    # whereas the checks from the other answers on this page report it as empty
    In [4]: len(df.index)  # or len(df)
    Out[4]: 0
    
    In [5]: df.empty
    Out[5]: True
    
    # SOLUTION: conditionally check for empty columns
    In [6]: if len(df.columns) != 0:  # <--- here
                # Do something, e.g. drop every column whose values
                # are all `NaN`; with 0 rows every column qualifies,
                # so the df becomes really empty
                df = df.dropna(how='all', axis=1)
            df
    Out[6]: Empty DataFrame
            Columns: []
            Index: []
    
    # Testing shows it is indeed empty now
    In [7]: len(df.columns)
    Out[7]: 0
    

    Adding a new column now works as expected, without the re-surfacing of empty columns (that is, no columns that contain only NaN):

    In [8]: df['CC'] = [111, 222, 333]
            df
    Out[8]:    CC
            0 111
            1 222
            2 333
    In [9]: len(df.columns)
    Out[9]: 1
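    Putting the practical solution together as a plain script, including the terminal message the question asks for, might look like the sketch below. It follows the steps shown above; the added .copy() and the message text are my own choices, not part of the original session:

```python
import pandas as pd

df = pd.DataFrame({'AA': [1, 2, 3], 'BB': [11, 22, 33]})
# No row has AA == 5, so 0 rows remain; .copy() avoids SettingWithCopyWarning later
df = df[df['AA'] == 5].copy()

if df.empty and len(df.columns) != 0:
    # 0 rows but leftover column labels: drop columns whose values are all NaN
    # (with 0 rows every column qualifies, so all of them are dropped)
    df = df.dropna(how='all', axis=1)

if len(df.columns) == 0:
    print('The DataFrame is empty')

# A new column no longer brings back AA and BB filled with NaN
df['CC'] = [111, 222, 333]
print(df.columns.tolist())  # ['CC']
```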
    
