How to check whether a pandas DataFrame is empty?

前端未结

关注

 5  823

别跟我提以往 2020-12-02 04:01

How to check whether a pandas DataFrame is empty? In my case I want to print some message in terminal if the DataFrame is empty.

5条回答

星月不相逢 (楼主)

2020-12-02 04:30

To see if a dataframe is empty, I argue that one should test for the length of a dataframe's columns index:

if len(df.columns) == 0: 1

Reason:

According to the Pandas Reference API, there is a distinction between:

an empty dataframe with 0 rows and 0 columns
an empty dataframe with rows containing NaN hence at least 1 column

Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), or len(df.index) make no distinction and return index is 0 and empty is True in both cases.

Examples

Example 1: An empty dataframe with 0 rows and 0 columns

In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]: Empty DataFrame
        Columns: []
        Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True

Example 2: A dataframe which is emptied to 0 rows but still retains n columns

In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

In [5]: df2 = df2[df2['AA'] == 5]
        df2
Out[5]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True

Now, building on the previous examples, in which the index is 0 and empty is True. When reading the length of the columns index for the first loaded dataframe df1, it returns 0 columns to prove that it is indeed empty.

In [8]: len(df1.columns)
Out[8]: 0

In [9]: len(df2.columns)
Out[9]: 2

Critically, while the second dataframe df2 contains no data, it is not completely empty because it returns the amount of empty columns that persist.

Why it matters

Let's add a new column to these dataframes to understand the implications:

# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
         df1
Out[10]:    CC
         0 111
         1 222
         2 333
In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
         df2
Out[12]:    AA  BB   CC
         0 NaN NaN  111
         1 NaN NaN  222
         2 NaN NaN  333
In [13]: len(df2.columns)
Out[13]: 3

It is evident that the original columns in df2 have re-surfaced. Therefore, it is prudent to instead read the length of the columns index with len(pandas.core.frame.DataFrame.columns) to see if a dataframe is empty.

Practical solution

# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
        df
Out[2]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True

# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g. 
            # drop any columns containing rows with `NaN`
            # to make the df really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]: Empty DataFrame
        Columns: []
        Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0

Adding a new data series works as expected without the re-surfacing of empty columns (factually, without any series that were containing rows with only NaN):

In [8]: df['CC'] = [111, 222, 333]
         df
Out[8]:    CC
         0 111
         1 222
         2 333
In [9]: len(df.columns)
Out[9]: 1

0 讨论(0)

查看其它5个回答