Speed difference between bracket notation and dot notation for accessing columns in pandas

北荒 2020-12-10 05:42

Let's have a small dataframe: df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})

When I search for membership, the speed is vastly different based on whether I access the column with bracket notation (df['CID']) or dot notation (df.CID).
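A minimal sketch for comparing the two access styles, assuming "search for membership" means checking whether a value appears in the column's values. One pitfall worth knowing: `value in series` tests the Series index labels, not its values, so the check below goes through `.values` or `isin`. Timings vary by machine and pandas version, so none are shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})

# Both access styles return the same column.
print(df['CID'].equals(df.CID))       # True

# `in` on a Series checks the index labels (0..5 here), not the values.
print(12345 in df['CID'])             # False
print(12345 in df['CID'].values)      # True
print(df['CID'].isin([12345]).any())  # True

# Time the same value-membership check with each access style.
n = 10_000
t_bracket = timeit.timeit(lambda: 12345 in df['CID'].values, number=n)
t_attr = timeit.timeit(lambda: 12345 in df.CID.values, number=n)
print(f"bracket: {t_bracket:.4f}s, attribute: {t_attr:.4f}s ({n} runs)")
```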

1 Answer
  • 2020-12-10 06:07

    df['CID'] delegates to DataFrame.__getitem__, which makes it obvious that you are performing an indexing operation.

    On the other hand, df.CID is attribute access. Python first tries normal attribute lookup, and only when that fails does it fall back to NDFrame.__getattr__, which then has to check whether 'CID' is a column label before it can return the column. That extra work is why df.CID does more than df['CID']. Accessing columns as attributes is a convenience, but it is not recommended for production code.
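    To make the lookup order concrete, here is a toy sketch using a hypothetical `TinyFrame` class (not pandas' actual implementation). It shows that `__getattr__` only runs after normal attribute lookup has failed, so a name that matches a method is never treated as a column.

```python
class TinyFrame:
    """Hypothetical toy class illustrating lookup order; not pandas code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Bracket access: a direct lookup of the column label.
        return self._columns[key]

    def __getattr__(self, name):
        # Called only after normal lookup (instance dict, class, bases) fails.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def pop(self, key):
        return self._columns.pop(key)


tf = TinyFrame({"CID": [1, 2, 3], "pop": [4, 5, 6]})
print(tf["CID"])  # [1, 2, 3]
print(tf.CID)     # [1, 2, 3], reached through the __getattr__ fallback
print(tf["pop"])  # [4, 5, 6]
print(tf.pop)     # <bound method TinyFrame.pop of ...>: the method wins
```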


    Now, why is it not recommended? Consider:

    import pandas as pd
    df = pd.DataFrame({'A': [1, 2, 3]})
    df.A
    
    0    1
    1    2
    2    3
    Name: A, dtype: int64
    

    There are no issues referring to column "A" as df.A, because it does not conflict with any attribute or method names in pandas. However, consider the pop method (just as an example).

    df.pop
    # <bound method NDFrame.pop of ...>
    

    df.pop is a bound method of df. Now, I'd like to create a column called "pop" for various reasons.

    df['pop'] = [4, 5, 6]
    df
       A  pop
    0  1    4
    1  2    5
    2  3    6
    

    Great, but,

    df.pop
    # <bound method NDFrame.pop of ...>
    

    I cannot use attribute notation to access this column, because df.pop still resolves to the method. However...

    df['pop']
    
    0    4
    1    5
    2    6
    Name: pop, dtype: int64
    

    Bracket notation still works, whether or not the column name clashes with an attribute or method. That's why it is the better choice.
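    As a rough check, the hypothetical helper `attr_unsafe_columns` below flags column labels that `df.<name>` cannot reach: non-string labels, labels that are not valid Python identifiers, Python keywords, and names already defined on the DataFrame class. It is a sketch under those assumptions, not an exhaustive test of pandas' attribute-access rules.

```python
import keyword

import pandas as pd


def attr_unsafe_columns(df):
    """Hypothetical helper: list column labels that df.<name> cannot reach."""
    reserved = set(dir(type(df)))  # methods and properties defined on DataFrame
    unsafe = []
    for col in df.columns:
        if (not isinstance(col, str)
                or not col.isidentifier()   # e.g. 'my col' has a space
                or keyword.iskeyword(col)   # e.g. df.class is a SyntaxError
                or col in reserved):        # e.g. 'pop' is shadowed by a method
            unsafe.append(col)
    return unsafe


df = pd.DataFrame({'A': [1, 2, 3], 'pop': [4, 5, 6],
                   'my col': [7, 8, 9], 'class': [0, 0, 0]})
print(attr_unsafe_columns(df))  # ['pop', 'my col', 'class']
```

    Bracket notation such as df['pop'] or df['my col'] reaches every one of these columns.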
