High-dimensional data structure in Python

…衆ロ難τιáo~ submitted on 2019-12-05 13:38:22
Stefan

MultiIndex is most useful for higher-dimensional data, as explained in the docs and this SO answer, because it lets you work with any number of dimensions in a DataFrame environment.

In addition to the Panel, there is also Panel4D, currently at an experimental stage. Given the advantages of MultiIndex, I wouldn't recommend using either this or the three-dimensional version. I don't think these data structures have gained much traction in comparison, and they will indeed be phased out.

If you need labelled arrays and pandas-like smart indexing, you can use the xarray package, which is essentially an n-dimensional extension of the pandas Panel (Panel was deprecated in pandas 0.20 and removed in 0.25, with xarray recommended as a replacement).

Otherwise, it may sometimes be reasonable to use plain numpy arrays, which can have any number of dimensions; you can also have arbitrarily nested numpy record arrays of any dimension.
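
As a rough illustration of the labelled-array approach, here is a minimal sketch using xarray's DataArray. The dimension names (outer_row, inner_row, outer_col, inner_col) and the data are made up for this example and are not from the original answer.

```python
import numpy as np
import xarray as xr

# Hypothetical 4-dimensional labelled array; names and values are illustrative only.
da = xr.DataArray(
    np.arange(16).reshape(2, 2, 2, 2),
    dims=("outer_row", "inner_row", "outer_col", "inner_col"),
    coords={
        "outer_row": ["one", "two"],
        "inner_row": ["A", "B"],
        "outer_col": ["One", "Two"],
        "inner_col": ["a", "b"],
    },
)

# Label-based selection of a single element along all four dimensions.
value = da.sel(outer_row="two", inner_row="B", outer_col="One", inner_col="a").item()

# Selecting one label drops that dimension; the other three remain.
sub = da.sel(outer_col="Two")

# Reductions are expressed by dimension name rather than axis number.
mean_over_inner_row = da.mean(dim="inner_row")
```

Addressing dimensions by name is the main practical difference from a bare numpy array, where the same operations would need positional axis numbers.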
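
A small sketch of the plain-numpy route, under the assumption that positional indexing is enough. It uses a 4-dimensional array and a nested structured dtype (the mechanism underlying record arrays); the field names id, pos, x and y are invented for illustration.

```python
import numpy as np

# A plain 4-dimensional array; axes are addressed by position only.
cube = np.arange(16).reshape(2, 2, 2, 2)
value = cube[1, 0, 1, 1]

# A nested structured dtype: each element has an 'id' and a 'pos' sub-record.
inner = np.dtype([("x", "f8"), ("y", "f8")])
outer = np.dtype([("id", "i4"), ("pos", inner)])

# A 2 x 3 array of such records, initialised to zeros.
records = np.zeros((2, 3), dtype=outer)

# Field access returns views, so assignments write into the original array.
records["pos"]["x"][0, 1] = 1.5
records["id"][1, 2] = 7
```

The trade-off is that nothing here carries labels, so keeping track of what each axis means is left to the caller.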

I recommend continuing to use DataFrame and taking advantage of its MultiIndex feature. DataFrame is better supported, and the MultiIndex preserves all of your dimensionality.

Example

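import pandas as pd

# Base 2x2 frame
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['A', 'B'])

# Stack two copies vertically; keys become the outer row level
df3 = pd.concat([df for _ in [0, 1]], keys=['one', 'two'])

# Place two copies side by side; keys become the outer column level
df4 = pd.concat([df3 for _ in [0, 1]], axis=1, keys=['One', 'Two'])

print(df4)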

Looks like:

      One    Two   
        a  b   a  b
one A   1  2   1  2
    B   3  4   3  4
two A   1  2   1  2
    B   3  4   3  4

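This is a hyper-cube of data: two row levels and two column levels give four dimensions in a single DataFrame. You'll also be much better served by sticking with DataFrame, with better support, more answered questions, fewer bugs, and many other benefits.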
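
To show how such a frame can be queried along each of its levels, here is a short sketch that rebuilds the same df4 and selects from it. The variable names block, cell, a_only and wide are just illustrative.

```python
import pandas as pd

# Rebuild the example frame so this snippet runs on its own.
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['A', 'B'])
df3 = pd.concat([df, df], keys=['one', 'two'])
df4 = pd.concat([df3, df3], axis=1, keys=['One', 'Two'])

# Selecting an outer row label returns the sub-frame indexed by the inner level.
block = df4.loc['one']

# A single cell is addressed with one tuple per axis.
cell = df4.loc[('two', 'B'), ('One', 'a')]

# Cross-section on the inner column level keeps only the 'a' columns.
a_only = df4.xs('a', axis=1, level=1)

# unstack() moves the inner row level into the columns.
wide = df4.unstack()
```

Here block is a 2x4 frame, cell is a scalar, a_only keeps the outer column labels One and Two, and wide has 2 rows and 8 columns.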
