Let's have a small dataframe: df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})
When I search for membership the speed is vastly different based on whet
df['CID'] delegates to DataFrame.__getitem__, and it is more obvious that you are performing an indexing operation.
On the other hand, df.CID goes through Python's normal attribute lookup first and, only when that fails, falls back to NDFrame.__getattr__. That path has to do some additional work, mainly to determine whether 'CID' is an attribute, a method, or a column you are accessing via attribute syntax (a convenience, but not recommended for production code).
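The overhead described above can be measured roughly with timeit. This is only a sketch: the repetition count is an arbitrary choice, and the absolute numbers depend on your pandas version and machine, so no expected timings are shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})

# Time each access style; number=10_000 is an arbitrary repetition count.
bracket_time = timeit.timeit(lambda: df['CID'], number=10_000)
attribute_time = timeit.timeit(lambda: df.CID, number=10_000)

print(f"df['CID']: {bracket_time:.4f} s")
print(f"df.CID:    {attribute_time:.4f} s")

# Both forms return the same column; only the lookup path differs.
same_column = df['CID'].equals(df.CID)
```

Whatever the timings turn out to be on your setup, the two expressions produce the same Series, so switching to brackets does not change results.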
Now, why is it not recommended? Consider,
df = pd.DataFrame({'A': [1, 2, 3]})
df.A
0 1
1 2
2 3
Name: A, dtype: int64
There are no issues referring to column "A" as df.A, because it does not conflict with any attribute or method name in pandas. However, consider the pop function (just as an example).
df.pop
# <bound method NDFrame.pop of ...>
df.pop is a bound method of df. Now, I'd like to create a column called "pop" for various reasons.
df['pop'] = [4, 5, 6]
df
   A  pop
0  1    4
1  2    5
2  3    6
Great, but,
df.pop
# <bound method NDFrame.pop of ...>
I cannot use the attribute notation to access this column. However...
df['pop']
0 4
1 5
2 6
Name: pop, dtype: int64
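Bracket notation still works, because it always looks the name up among the columns and never among the DataFrame's own attributes. That's why bracket notation is the safer choice.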
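If you want to know in advance which columns are affected, one option is to check each column name against the attributes of the DataFrame class. The helper below, shadowed_columns, is a hypothetical name used for illustration and is not part of pandas; it is a minimal sketch that only considers string column names and class-level attributes.

```python
import pandas as pd


def shadowed_columns(frame):
    """Return string column names that clash with a DataFrame class attribute.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    return [
        col
        for col in frame.columns
        if isinstance(col, str) and hasattr(pd.DataFrame, col)
    ]


df = pd.DataFrame({'A': [1, 2, 3], 'pop': [4, 5, 6]})

clashes = shadowed_columns(df)
print(clashes)  # ['pop']
```

Any column reported here can only be reached reliably with df['name'], not with df.name.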