How do I create a DataFrame with multi-level columns?

人走茶凉 提交于 2021-01-29 09:25:26

问题


An existing question, Creating a Pandas Dataframe with Multi Column Index, deals with a very "regular" DataFrame where all columns and rows are products and all data is present.

My situation is, alas, different. I have this kind of data:

[{"street": "Euclid", "house":42, "area":123, (1,"bedrooms"):1, (1,"bathrooms"):4},
 {"street": "Euclid", "house":19, "area":234, (2,"bedrooms"):3, (2,"bathrooms"):3},
 {"street": "Riemann", "house":42, "area":345, (1,"bedrooms"):5,
  (1,"bathrooms"):2, (2,"bedrooms"):12, (2, "bathrooms"):17},
 {"street": "Riemann", "house":19, "area":456, (1,"bedrooms"):7, (1,"bathrooms"):1}]

and I want this sort of DataFrame with both rows and columns having multi-level indexes:

              area          1                  2
street house        bedrooms bathrooms bedrooms bathrooms
Euclid  42    123     1         4
Euclid  19    234                         3         3
Riemann 42    345     5         2        12        17
Riemann 19    456     7         1

So, the row index should be

MultiIndex([("Euclid",42),("Euclid",19),("Riemann",42),("Riemann",19)],
           names=["street","house"])

and the columns index should be

MultiIndex([("area",None),(1,"bedrooms"),(1,"bathrooms"),(2,"bedrooms"),(2,"bathrooms")],
           names=["floor","entity"])

and I see no way to generate these indexes from the list of dictionaries I have.


回答1:


i feel there should be something better than this; hopefully someone on SO puts out sth much better:

Create a function to process each entry in the dictionary:

def process(entry):
    #read in data and get the keys to be the column names
    m = pd.DataFrame.from_dict(entry,orient='index').T
    #set index
    m = m.set_index(['street','house'])
    #create multi-index columns
    col1 = [ent[0] if isinstance(ent,tuple) else ent for ent in m.columns ]
    col2 = [ent[-1] if isinstance(ent,tuple) else None for ent in m.columns ]
    #assign multi-index column to m
    m.columns=[col1,col2]
    return m

Apply function above to data(i wrapped the dictionary into the data variable):

res = [process(entry) for entry in data]

concatenate to get final output

pd.concat(res)

                area               1                  2
                NaN    bedrooms bathrooms   bedrooms    bathrooms
street  house                   
Euclid    42    123     1        4           NaN         NaN
          19    234     NaN      NaN         3           3
Riemann   42    345     5        2           12          17
          19    456     7        1           NaN         NaN


来源:https://stackoverflow.com/questions/61181247/how-do-i-create-a-dataframe-with-multi-level-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!