pandas access axis by user-defined name

Question

I am wondering whether there is any way to access the axes of pandas containers (DataFrame, Panel, etc.) by a user-defined name instead of by integer or by "index", "columns", "minor_axis", etc.

For example, with the following data container:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

I would like to do this:

df.sum(axis='myaxis1') 
df.xs('c1', axis='myaxis2')  # cross section

Also very useful would be:

df.reshape(['myaxis2','myaxis1'])

(not so relevant in this case, but it could become so as the number of dimensions increases)

The reason is that I work a lot with multi-dimensional arrays of varying dimensions, such as "time", "variable", "percentile", etc., and the same piece of code is often applied to objects that can be a DataFrame, a Panel, or even a Panel4D or a DataFrame with a MultiIndex. For now I often test the shape of the object, or the general settings of the script, to know which axis is the relevant one for computing a sum or mean. But I think it would be much more convenient to forget about how the container is implemented in detail (DataFrame, Panel, etc.) and simply think about the nature of the problem. Say I want to average over time: I do not want to think about whether I am working in "probabilistic" mode with several percentiles or in "deterministic" mode with a single time series.

While writing this post I (re)discovered the very useful axes attribute. The code above can be translated into:

nms = [ax.name for ax in df.axes]
axid1 = nms.index('myaxis1')
axid2 = nms.index('myaxis2')
df.sum(axis=axid1) 
df.xs('c1', axis=axid2)  # cross section
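
The name lookup above can be wrapped in a small helper so that calling code never mentions integer axes. This is only a minimal sketch: `axis_number` is a hypothetical name, not part of pandas, and it assumes every axis carries a distinct name.

```python
import numpy as np
import pandas as pd


def axis_number(obj, name):
    """Return the integer position of the axis whose name is `name`.

    Hypothetical helper (not a pandas API); assumes axis names are unique.
    """
    names = [ax.name for ax in obj.axes]
    if name not in names:
        raise KeyError("no axis named %r; available: %r" % (name, names))
    return names.index(name)


df = pd.DataFrame(np.random.randn(3, 2),
                  columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

col_sums = df.sum(axis=axis_number(df, 'myaxis1'))   # sum along the rows
c1 = df.xs('c1', axis=axis_number(df, 'myaxis2'))    # cross section
```

Raising a `KeyError` that lists the available names makes a misspelled axis name easier to diagnose than the bare `ValueError` from `list.index`.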

and the "reshape" feature (this does not apply to the 3-D case, though):

newshape = ['myaxis2','myaxis1']
axid = [nms.index(nm) for nm in newshape]
df.swapaxes(*axid)
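
For a 2-D DataFrame, the same reordering can be expressed with a transpose, which may be preferable because recent pandas releases deprecate `swapaxes`. The following is a sketch under that assumption; `reorder_axes` is a hypothetical helper, not a pandas function, and it handles only the two-axis case.

```python
import numpy as np
import pandas as pd


def reorder_axes(df, order):
    """Return `df` with its two axes arranged in the named `order`.

    Hypothetical helper (not a pandas API); handles 2-D DataFrames only.
    """
    names = [ax.name for ax in df.axes]
    if len(order) != len(names) or set(order) != set(names):
        raise ValueError("expected a permutation of %r, got %r" % (names, order))
    # With two axes, the only non-trivial permutation is a transpose.
    return df if list(order) == names else df.T


df = pd.DataFrame(np.random.randn(3, 2),
                  columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

flipped = reorder_axes(df, ['myaxis2', 'myaxis1'])   # shape (2, 3)
```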

Well, I have to admit that I found these solutions while writing this post (and they are already very convenient), but the approach could be generalized to handle a DataFrame (or other container) with MultiIndex axes, by searching across all axes and labels.
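
One way the MultiIndex generalization might look is to search the level names of every axis and return both the axis and the level. This is a sketch, not an existing pandas feature: `find_axis_level` and the example names ('time', 'percentile', 'variable') are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def find_axis_level(obj, name):
    """Locate `name` among the level names of every axis.

    Returns (axis_number, level_number); level_number is None when the
    axis is a plain Index. Hypothetical helper, not a pandas API.
    """
    for axis_number, ax in enumerate(obj.axes):
        level_names = list(ax.names)
        if name in level_names:
            if isinstance(ax, pd.MultiIndex):
                return axis_number, level_names.index(name)
            return axis_number, None
    raise KeyError("no axis or level named %r" % (name,))


index = pd.MultiIndex.from_product(
    [['t1', 't2'], ['p5', 'p95']], names=['time', 'percentile'])
df = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                  index=index, columns=['v1', 'v2'])
df.columns.name = 'variable'

axis, level = find_axis_level(df, 'percentile')
p95 = df.xs('p95', axis=axis, level=level)   # rows where percentile == 'p95'
```

The returned pair can be passed straight to `xs`, which accepts both an `axis` and a `level` argument, so the caller does not need to know whether "percentile" is a whole axis or one level of a MultiIndex.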

In my opinion this would be a major improvement to the user-friendliness of pandas. (Admittedly, forgetting about the actual structure could have a performance cost, but users worried about performance can be careful about how they organize the data.)

What do you think?

Answer 1:

This is still experimental, but look at this page:

http://pandas.pydata.org/pandas-docs/dev/dsintro.html#panelnd-experimental

# Note: panelnd is an experimental module from old pandas releases
# (Panel and panelnd were removed in later versions).
import pandas
import numpy as np

from pandas.core import panelnd

MyPanel4D = panelnd.create_nd_panel_factory(
    klass_name='MyPanel4D',
    axis_orders=['axis4', 'axis3', 'axis2', 'axis1'],
    axis_slices={'axis3': 'items',
                 'axis2': 'major_axis',
                 'axis1': 'minor_axis'},
    slicer='Panel',
    stat_axis=2)
mp4d = MyPanel4D(np.random.rand(5, 4, 3, 2))
print(mp4d)

This results in:

<class 'pandas.core.panelnd.MyPanel4D'>
Dimensions: 5 (axis4) x 4 (axis3) x 3 (axis2) x 2 (axis1)
Axis4 axis: 0 to 4
Axis3 axis: 0 to 3
Axis2 axis: 0 to 2
Axis1 axis: 0 to 1

Here's the caveat: when you slice it, as in mp4d[0], you get back a Panel, unless you create a hierarchy of custom objects. (Unfortunately, support for 'renaming' Panel/DataFrame will need to wait for 0.12-dev; it's non-trivial and there haven't been any requests.)

So for higher-dimensional objects you can impose your own name structure. The axis aliasing should work as you are suggesting, but I think there are some bugs there.

Source: https://stackoverflow.com/questions/15533093/pandas-access-axis-by-user-defined-name

Tags

pandas

axis