multiindex selecting in pandas

一个人想着一个人 提交于 2021-02-10 03:15:59

问题


I have problems understanding multiindex selecting in pandas.

                    0  1  2  3
first second third            
C     one    mean   3  4  2  7
             std    4  1  7  7
      two    mean   3  1  4  7
             std    5  6  7  0
      three  mean   7  0  2  5
             std    7  3  7  1
H     one    mean   2  4  3  3
             std    5  5  3  5
      two    mean   5  7  0  6
             std    0  1  0  2
      three  mean   5  2  5  1
             std    9  0  4  6
V     one    mean   3  7  3  9
             std    8  7  9  3
      two    mean   1  9  9  0
             std    1  1  5  1
      three  mean   3  1  0  6
             std    6  2  7  4

I need to create new rows:

- 'CH' : ['CH',:,'mean'] => ['C',:,'mean'] - ['H',:,'mean']
- 'CH' : ['CH',:,'std'] => (['C',:,'std']**2 + ['H',:,'std']**2)**.5

When trying to select rows I get different types of errors: UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

How should be performed this operation?

import pandas as pd
import numpy as np
iterables = [['C', 'H', 'V'],
          ['one','two','three'],
          ['mean','std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second','third'])
chv = pd.DataFrame(np.random.randint(0,high=10,size=(18,4)), index=midx)
print (chv)
idx = pd.IndexSlice
chv.loc[:,idx['C',:,'mean']]

回答1:


You can filter by slicers first, then rename first level and use arithmetic operations, last concat together:

#avoid UnsortedIndexError
df = df.sort_index()

idx = pd.IndexSlice
c1 = chv.loc[idx['C',:,'mean'], :].rename({'C':'CH'}, level=0)
h1 = chv.loc[idx['H',:,'mean'], :].rename({'H':'CH'}, level=0)
ch1 = c1 - h1

c2 = chv.loc[idx['C',:,'std'], :].rename({'C':'CH'}, level=0)**2
h2 = chv.loc[idx['H',:,'std'], :].rename({'H':'CH'}, level=0)**2
ch2 = (c2 + h2)**.5

df = pd.concat([chv, ch1, ch2]).sort_index()

print (df)
                           0         1         2         3
first second third                                        
C     one    mean   7.000000  5.000000  8.000000  3.000000
             std    0.000000  4.000000  4.000000  4.000000
      three  mean   4.000000  2.000000  1.000000  6.000000
             std    8.000000  7.000000  3.000000  3.000000
      two    mean   1.000000  8.000000  2.000000  5.000000
             std    2.000000  2.000000  4.000000  2.000000
CH    one    mean   1.000000  2.000000  1.000000  2.000000
             std    4.000000  7.211103  4.000000  7.211103
      three  mean   1.000000  0.000000 -4.000000  2.000000
             std    8.062258  7.071068  4.242641  3.000000
      two    mean  -1.000000  6.000000 -2.000000  3.000000
             std    9.219544  7.280110  4.123106  2.000000
H     one    mean   6.000000  3.000000  7.000000  1.000000
             std    4.000000  6.000000  0.000000  6.000000
      three  mean   3.000000  2.000000  5.000000  4.000000
             std    1.000000  1.000000  3.000000  0.000000
      two    mean   2.000000  2.000000  4.000000  2.000000
             std    9.000000  7.000000  1.000000  0.000000
V     one    mean   9.000000  5.000000  0.000000  5.000000
             std    7.000000  9.000000  1.000000  1.000000
      three  mean   3.000000  0.000000  3.000000  4.000000
             std    1.000000  4.000000  9.000000  2.000000
      two    mean   3.000000  6.000000  3.000000  2.000000
             std    1.000000  3.000000  1.000000  4.000000


来源:https://stackoverflow.com/questions/49875793/multiindex-selecting-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!