Multi-dimensional/Nested DataFrame/Dataset/Panel in Pandas

星月不相逢 2021-01-14 11:42

I would like to store some multidimensional data in a pandas DataFrame or Panel so that I can retrieve, for example:

  1. All the times for Run
2 answers
  •  深忆病人
    2021-01-14 12:19

    I think you can use a MultiIndex and then select data with slicers (pd.IndexSlice):

    import pandas as pd
    
    # Each key is a tuple (runner, gender, age, race, year); the tuples become a MultiIndex
    df = pd.DataFrame({'Time': {
        ('Runner A', 'Male', 35, 'Race A', 2014): '2:47:34',
        ('Runner C', 'Female', 32, 'Race B', 1998): '1:29:43',
        ('Runner B', 'Male', 29, 'Race A', 2015): '3:05:56',
        ('Runner A', 'Male', 35, 'Race A', 2013): '2:50:12',
        ('Runner A', 'Male', 35, 'Race B', 2013): '1:32:07',
        ('Runner A', 'Male', 35, 'Race A', 2015): '2:35:09',
    }})
    print (df)
                                       Time
    Runner A Male   35 Race A 2013  2:50:12
                              2014  2:47:34
                              2015  2:35:09
                       Race B 2013  1:32:07
    Runner B Male   29 Race A 2015  3:05:56
    Runner C Female 32 Race B 1998  1:29:43
    
    # the index has to be fully lexsorted before slicing on inner levels
    df.sort_index(inplace=True)
    print (df)
                                       Time
    Runner A Male   35 Race A 2013  2:50:12
                              2014  2:47:34
                              2015  2:35:09
                       Race B 2013  1:32:07
    Runner B Male   29 Race A 2015  3:05:56
    Runner C Female 32 Race B 1998  1:29:43
    
    idx = pd.IndexSlice
    print (df.loc[idx['Runner A',:,:,'Race A',:],:])
                                     Time
    Runner A Male 35 Race A 2013  2:50:12
                            2014  2:47:34
                            2015  2:35:09
    
    print (df.loc[idx[:,:,:,'Race A',2015],:])
                                     Time
    Runner A Male 35 Race A 2015  2:35:09
    Runner B Male 29 Race A 2015  3:05:56
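
If positional slicers get hard to read, one alternative is to name the index levels and select with `DataFrame.xs`. This is only a minimal sketch on the same sample data: the level names (`Runner`, `Gender`, `Age`, `Race`, `Year`) are my assumption for illustration, since the index above is unnamed.

```python
import pandas as pd

times = {
    ('Runner A', 'Male', 35, 'Race A', 2014): '2:47:34',
    ('Runner C', 'Female', 32, 'Race B', 1998): '1:29:43',
    ('Runner B', 'Male', 29, 'Race A', 2015): '3:05:56',
    ('Runner A', 'Male', 35, 'Race A', 2013): '2:50:12',
    ('Runner A', 'Male', 35, 'Race B', 2013): '1:32:07',
    ('Runner A', 'Male', 35, 'Race A', 2015): '2:35:09',
}
df = pd.DataFrame({'Time': times})

# Assumed level names (not part of the original data); they let us
# select by name instead of by position in the index tuple.
df.index.names = ['Runner', 'Gender', 'Age', 'Race', 'Year']
df = df.sort_index()

# All times for Runner A in Race A. By default xs drops the levels
# that were selected on, leaving Gender, Age and Year in the index.
runner_a_race_a = df.xs(('Runner A', 'Race A'), level=['Runner', 'Race'])
print(runner_a_race_a)

# All times for Race A in 2015, keeping the full index for context.
race_a_2015 = df.xs(('Race A', 2015), level=['Race', 'Year'], drop_level=False)
print(race_a_2015)
```

Both selections return the same rows as the `pd.IndexSlice` version; the difference is only that levels are addressed by name, so the query does not depend on the order of the levels.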
    
