Accessing a Pandas index like a regular column

半城伤御伤魂 submitted on 2021-02-18 09:54:28

Question


I have a Pandas DataFrame with a named index. I want to pass it to a piece of code that takes a DataFrame, a column name, and some other arguments, and does a bunch of work involving that column. In this case the column I want to use is the index, but passing the index's name to this code doesn't work because you can't extract an index the way you can a regular column. For example, I can construct a DataFrame like this:

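import numpy as np
import pandas as pd

# list(...) materialises the map object so the column is built the same way on Python 3
df = pd.DataFrame({'name': list(map(chr, range(97, 102))), 'id': range(10000, 10005), 'value': np.random.randn(5)})
df.set_index('name', inplace=True)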

Here's the result:

         id     value
name                 
a     10000  0.659710
b     10001  1.001821
c     10002 -0.197576
d     10003 -0.569181
e     10004 -0.882097

Now how can I access the name column?

print(df.index)  # No problem
print(df['name'])  # KeyError: 'name'

I know there are workarounds, like duplicating the column or changing the index to something else. But is there something cleaner, like some form of column access that treats the index the same way as everything else?


Answer 1:


The index has a special meaning in Pandas. It is used to optimise specific operations and by various methods, such as those that merge or join data. You therefore have to make a choice:

  • If it's "just another column", use reset_index and treat it as another column.
  • If it's genuinely used for indexing, keep it as an index and use df.index.

We can't make this choice for you. It should be dependent on the structure of your underlying data and on how you intend to analyse your data.
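
As a minimal sketch of the two options, assuming a small frame shaped like the one in the question (the variable names flat, names_as_column and names_as_index are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": list("abcde"), "id": range(10000, 10005)}
).set_index("name")

# Option 1: treat the index as "just another column".
# reset_index() returns a new DataFrame; df itself is left unchanged.
flat = df.reset_index()
names_as_column = flat["name"]

# Option 2: keep it as an index and read it through df.index.
names_as_index = df.index
```

With option 1, code that expects a column name can be handed flat and "name" directly. With option 2 the calling code has to know it is dealing with the index.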

For more information on use of a dataframe index, see:

  • What is the performance impact of non-unique indexes in pandas?
  • What is the point of indexing in pandas?



Answer 2:


Instead of using reset_index, you could just copy the index to a normal column, do some work and then drop the column, for example:

df['tmp'] = df.index
# do stuff based on df['tmp']
del df['tmp']
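
A non-mutating variant of the same idea is sketched below. It assumes the downstream code only reads the column. The function count_values is a hypothetical stand-in for "code that takes a DataFrame and a column name" and is not part of pandas.

```python
import pandas as pd

def count_values(frame, column):
    # Hypothetical stand-in for code that only understands column names.
    return frame[column].value_counts()

df = pd.DataFrame(
    {"name": list("abcde"), "id": range(10000, 10005)}
).set_index("name")

# assign() returns a copy with the extra column; df is not modified,
# so there is no temporary column to delete afterwards.
counts = count_values(df.assign(tmp=df.index), "tmp")
```

The trade-off is that assign() builds a new DataFrame, whereas the add-then-delete approach changes df in place.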



Answer 3:


You could also use df.index.get_level_values if you need to access an index level by name. It also works with hierarchical indices (MultiIndex).

>>> df.index.get_level_values('name')
Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='name')
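
For the hierarchical case, here is a short sketch assuming a two-level index built from name and id (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": list("aabbc"),
        "id": range(10000, 10005),
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
).set_index(["name", "id"])

# Each level can be looked up by its name, one value per row.
names = df.index.get_level_values("name")
ids = df.index.get_level_values("id")

# The level values line up with the rows by position,
# so they can be used to build a boolean mask.
subset = df[names == "b"]
```

Because get_level_values returns an Index with one entry per row, it can be used in most places where a column's values would be.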


Source: https://stackoverflow.com/questions/52139506/accessing-a-pandas-index-like-a-regular-column
