The difference between double-bracket `[[…]]` and single-bracket `[…]` indexing in pandas

梦如初夏 2020-12-03 10:53

I'm confused about the syntax regarding the following line of code:

x_values = dataframe[['Brains']]

The dataframe object consists of 2

3 Answers
  • 2020-12-03 11:02

    There is no special syntax in Python for [[ and ]]. Rather, a list is being created, and then that list is being passed as an argument to the DataFrame indexing function.

    As per @MaxU's answer, if you pass a single string to a DataFrame, a Series representing that one column is returned. If you pass a list of strings, a DataFrame containing the given columns is returned.

    So, when you do the following

    # Print "Brains" column as Series
    print(df['Brains'])
    # Return a DataFrame with only one column called "Brains"
    print(df[['Brains']])
    

    It is equivalent to the following

    # Print "Brains" column as Series
    column_to_get = 'Brains'
    print(df[column_to_get])
    # Return a DataFrame with only one column called "Brains"
    subset_of_columns_to_get = ['Brains']
    print(df[subset_of_columns_to_get])
    

    In both cases, the DataFrame is being indexed with the [] operator.
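
    To see what the indexing call actually receives, here is a minimal sketch using a hypothetical stand-in class (KeyInspector is not part of pandas; it only reports the key passed to it):

```python
class KeyInspector:
    """Hypothetical stand-in for a DataFrame: it only reports the key it receives."""

    def __getitem__(self, key):
        # obj[key] is sugar for obj.__getitem__(key)
        return type(key).__name__, key


obj = KeyInspector()

single = obj['Brains']      # the key is a plain string
double = obj[['Brains']]    # the key is a list containing one string

print(single)  # ('str', 'Brains')
print(double)  # ('list', ['Brains'])
```

    A DataFrame presumably branches on the type of this key in a similar way, which is why a string and a one-element list give different results.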

    Python uses square brackets both for indexing and for constructing list literals, and ultimately I believe this is the source of your confusion. The outer [ and ] in df[['Brains']] perform the indexing, and the inner pair creates a list.

    >>> some_list = ['Brains']
    >>> some_list_of_lists = [['Brains']]
    >>> ['Brains'] == [['Brains']][0]
    True
    >>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
    True
    

    What I am illustrating above is that at no point does Python ever see [[ and interpret it specially. In the last convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator... what happens is

    • A single-element list is created (['Brains'])
    • The first element of that list is indexed (['Brains'][0] => 'Brains')
    • That is placed into another list ([['Brains'][0]] => ['Brains'])
    • And then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')
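
    The same point can be checked with the standard library's tokenizer and parser. This is only a sketch; the name df inside the source string is never evaluated, so no DataFrame is needed:

```python
import ast
import io
import tokenize

source = "df[['Brains']]"

# The tokenizer emits four separate bracket tokens; there is no '[[' token.
brackets = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.OP
]
print(brackets)  # ['[', '[', ']', ']']

# The parser builds a subscript expression whose key is a list literal.
node = ast.parse(source, mode="eval").body
# Python 3.8 and earlier wrap the key in ast.Index; unwrap it if present.
key = getattr(node.slice, "value", node.slice)
print(type(node).__name__, type(key).__name__)  # Subscript List
```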
  • 2020-12-03 11:02

    Other solutions demonstrate the difference between a series and a dataframe. For the mathematically minded, you may wish to consider the dimensions of your input and output. Here's a summary:

    Object                                Series          DataFrame
    Dimensions (obj.ndim)                      1                  2
    Syntax arg dim                             0                  1
    Syntax                             df['col']        df[['col']]
    Max indexing dim                           1                  2
    Label indexing              df['col'].loc[x]   df.loc[x, 'col']
    Label indexing (scalar)      df['col'].at[x]    df.at[x, 'col']
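    Integer indexing           df['col'].iloc[x]      df.iloc[x, j]
    Integer indexing (scalar)   df['col'].iat[x]       df.iat[x, j]
    (x is a row label for .loc/.at and an integer row position for .iloc/.iat; j is the integer position of 'col')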
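
    As a rough check of the accessor rows in the table, the sketch below builds a small hypothetical frame (its values and row labels are made up for illustration) and reads the same cell eight ways. The integer-based accessors need the column's position, obtained here with df.columns.get_loc:

```python
import pandas as pd

# Hypothetical frame with string row labels so that labels and positions differ.
df = pd.DataFrame({"other": [1, 2, 3], "col": [10, 20, 30]}, index=["a", "b", "c"])

i = 1                          # integer position of row "b"
j = df.columns.get_loc("col")  # integer position of column "col"

# Series side: select the column first, then index into it.
series_side = [
    df["col"].loc["b"], df["col"].at["b"],
    df["col"].iloc[i], df["col"].iat[i],
]
# DataFrame side: index rows and columns in one call.
frame_side = [
    df.loc["b", "col"], df.at["b", "col"],
    df.iloc[i, j], df.iat[i, j],
]

print([int(v) for v in series_side])  # [20, 20, 20, 20]
print([int(v) for v in frame_side])   # [20, 20, 20, 20]
```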
    

    When you specify a scalar or list argument to pd.DataFrame.__getitem__, for which [] is syntactic sugar, the dimension of your argument is one less than the dimension of your result. So a scalar (0-dimensional) gives a 1-dimensional series. A list (1-dimensional) gives a 2-dimensional dataframe. This makes sense since the additional dimension is the dataframe index, i.e. rows. This is the case even if your dataframe happens to have no rows.
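
    The dimension rule can be checked directly. In this sketch, np.ndim is used only as a convenient way to measure the dimension of the key, and the frame is a made-up example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})

# Dimension of the key passed to [] next to the dimension of the result.
key_dims = (np.ndim("col"), np.ndim(["col"]))
result_dims = (df["col"].ndim, df[["col"]].ndim)
print(key_dims)     # (0, 1)
print(result_dims)  # (1, 2)

# The rule still holds when the frame has no rows.
empty = df.iloc[0:0]
empty_dims = (empty["col"].ndim, empty[["col"]].ndim)
print(empty_dims)            # (1, 2)
print(empty[["col"]].shape)  # (0, 1)
```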

  • 2020-12-03 11:03

    Consider this:

    Source DF:

    In [79]: df
    Out[79]:
       Brains  Bodies
    0      42      34
    1      32      23
    

    Selecting one column results in a pandas.Series:

    In [80]: df['Brains']
    Out[80]:
    0    42
    1    32
    Name: Brains, dtype: int64
    
    In [81]: type(df['Brains'])
    Out[81]: pandas.core.series.Series
    

    Selecting a subset of columns results in a DataFrame:

    In [82]: df[['Brains']]
    Out[82]:
       Brains
    0      42
    1      32
    
    In [83]: type(df[['Brains']])
    Out[83]: pandas.core.frame.DataFrame
    

    Conclusion: the second approach lets us select multiple columns from the DataFrame, while the first selects only a single column.

    Demo:

    In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))
    
    In [85]: df
    Out[85]:
              a         b         c         d         e         f
    0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
    1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
    2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
    3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
    4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549
    
    In [86]: df[['e','a','c']]
    Out[86]:
              e         a         c
    0  0.487693  0.065196  0.273534
    1  0.117029  0.641677  0.207757
    2  0.946373  0.345314  0.634602
    3  0.907163  0.860276  0.001615
    4  0.012626  0.670969  0.382810
    

    If we specify only one column in the list, we get a DataFrame with one column:

    In [87]: df[['e']]
    Out[87]:
              e
    0  0.487693
    1  0.117029
    2  0.946373
    3  0.907163
    4  0.012626
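
    A related pitfall, sketched under the assumption of plain (non-MultiIndex) column labels: leaving out the inner list turns the key into a tuple, which pandas looks up as a single column label. The seed below is arbitrary and only makes the random frame reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.random((5, 6)), columns=list("abcdef"))

# A list key selects several columns, in the order given in the list.
subset = df[["e", "a", "c"]]
print(list(subset.columns))  # ['e', 'a', 'c']

# Without the inner list the key is the tuple ('e', 'a'), which is treated
# as one column label and is not found among the columns.
try:
    df["e", "a"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```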
    