How and When to use Chain Indexing in Python Pandas?

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-13 02:51:06

Question


I am taking a Data Science course about data analysis in Python. At one point in the course the professor says:

You can chain operations together. For instance, we could have rewritten the query for all Store 1 costs as df.loc['Store 1']['Cost']. This looks pretty reasonable and gets us the result we wanted. But chaining can come with some costs and is best avoided if you can use another approach. In particular, chaining tends to cause Pandas to return a copy of the DataFrame instead of a view on the DataFrame. For selecting data, this is not a big deal, though it might be slower than necessary. If you are changing data though, this is an important distinction and can be a source of error.
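To make the quoted warning concrete, here is a minimal sketch. The sample data (names, costs, store labels) is invented for illustration and is not taken from the course. It compares the chained form with the single `.loc` call, first for selection and then for assignment. Whether the chained assignment reaches the original frame depends on the pandas version and on how the intermediate object was built, so the sketch only reports that outcome and does not assume it.

```python
import warnings

import pandas as pd

# Hypothetical sample data, loosely modelled on the "Store 1" example in the quote.
df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["Store 1", "Store 1", "Store 2"],
)

# Selection: both spellings return the same values.
chained = df.loc["Store 1"]["Cost"]   # two indexing steps
single = df.loc["Store 1", "Cost"]    # one indexing step

# Assignment through a chain: the write targets the intermediate object
# returned by df.loc["Store 1"], which may be a temporary copy.
chained_df = df.copy()
with warnings.catch_warnings():
    # pandas usually warns about this pattern; silenced here to keep output clean.
    warnings.simplefilter("ignore")
    chained_df.loc["Store 1"]["Cost"] = 0.0

# Assignment through a single .loc call: always writes to the frame itself.
single_df = df.copy()
single_df.loc["Store 1", "Cost"] = 0.0

# Report whether the chained write took effect, without assuming either outcome.
chained_took_effect = bool((chained_df.loc["Store 1", "Cost"] == 0.0).all())
print("chained write reached the frame:", chained_took_effect)
print(single_df)
```

The single `.loc` assignment is the only one of the two whose effect is guaranteed, which is the distinction the professor is pointing at.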

Later on, he describes chain indexing as:

Generally bad, pandas could return a copy of a view depending upon NumPy

So, he suggests using multi-axis indexing (df.loc['a', '1']).
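As a rough illustration of what multi-axis indexing looks like in practice, the sketch below uses a small made-up frame whose row labels ('a', 'b') and column labels ('1', '2') mirror the `df.loc['a', '1']` spelling above. The values are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical frame: rows 'a' and 'b', columns '1' and '2'.
df = pd.DataFrame({"1": [10, 20], "2": [30, 40]}, index=["a", "b"])

# Row label and column label are passed together in one .loc call.
value = df.loc["a", "1"]

# The same form accepts lists, slices and boolean masks on the row axis.
both_columns = df.loc["a", ["1", "2"]]   # one row, two columns
mask = df["2"] > 30                      # True only for row 'b'
df.loc[mask, "1"] = 0                    # set column '1' where the mask holds

print(value)
print(both_columns.tolist())
print(df)
```

Because the row and column selectors arrive in a single call, pandas can resolve the target in one step, and an assignment written this way acts on `df` directly.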

I'm wondering whether it is always advisable to steer clear of chain indexing, or are there specific use cases where it shines?

Also, if it is true that it can return either a copy or a view (depending upon NumPy), what exactly does it depend on, and can I influence it to get the desired outcome?

I've found this answer that states:

When you use df['1']['a'], you are first accessing the series object s = df['1'], and then accessing the series element s['a'], resulting in two __getitem__ calls, both of which are heavily overloaded (handle a lot of scenarios, like slicing, boolean mask indexing, and so on).
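The two calls described in that answer can be spelled out explicitly. This is only a sketch on a made-up frame (the labels match the quote, the values are assumptions). It shows that the chained form is two separate `__getitem__` calls on two different objects, while the `.loc` form is one call that receives both labels as a tuple.

```python
import pandas as pd

# Hypothetical frame with column '1' and row 'a', as in the quoted answer.
df = pd.DataFrame({"1": [10, 20], "2": [30, 40]}, index=["a", "b"])

# Chained form, step by step.
s = df["1"]          # first __getitem__: DataFrame -> Series
chained = s["a"]     # second __getitem__: Series -> scalar

# The same chain written with explicit method calls.
explicit = df.__getitem__("1").__getitem__("a")

# Single-call form: one __getitem__ on the .loc indexer, given a (row, column) tuple.
single = df.loc.__getitem__(("a", "1"))   # same as df.loc["a", "1"]

print(chained, explicit, single)
```

For reading a value all three agree. The difference matters for assignment, where the chained form ends in a `__setitem__` on whatever the first call returned.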

...which makes it seem chain indexing is always bad. Thoughts?

Source: https://stackoverflow.com/questions/56774184/how-and-when-to-use-chain-indexing-in-python-pandas
