How to split a list of dictionaries into multiple columns keeping the same index?

拥有回忆 提交于 2021-02-04 21:31:10

问题


I have a data frame that has as index a timestamp and a column that has list of dictionaries:

    index                   var_A

    2019-08-21 09:05:49    [{"Date1": "Aug 21, 2017 9:09:51 AM","Date2": "Aug 21, 2017 9:09:54 AM","Id": "d5e665e5","num_ins": 108,"num_del": 0, "time": 356} , {"Date1": "Aug 21, 2017 9:09:57 AM","Date2": "Aug 21, 2017 9:09:59 AM","Id": "d5e665e5","num_ins": 218,"num_del": 5, "time": 166}]
    2019-08-21 09:05:59    [{"Date1": "Aug 21, 2017 9:10:01 AM","Date2": "Aug 21, 2017 9:11:54 AM","Id": "d5e665e5","num_ins": 348,"num_del": 72, "time": 3356} , {"Date1": "Aug 21, 2017 9:19:57 AM","Date2": "Aug 21, 2017 9:19:59 AM","Id": "d5e665e5","num_ins": 69,"num_del": 5, "time": 125}, {"Date1": "Aug 21, 2017 9:20:01 AM","Date2": "Aug 21, 2017 9:21:54 AM","Id": "f9e775f9","num_ins": 470,"num_del": 0, "time": 290} ]
    2019-08-21 09:06:04    []

What I wish to achieve is a dataframe like:

    index              Date1                      Date2                    Id      num_ins       num_del    time
2019-08-21 09:05:49   Aug 21, 2017 9:09:51AM   Aug 21, 2017 9:09:54AM   d5e665e5      0           108        356
2019-08-21 09:05:49   Aug 21, 2017 9:09:57AM   Aug 21, 2017 9:09:59AM   d5e665e5      218           5        166
2019-08-21 09:05:59   Aug 21, 2017 9:10:01AM   Aug 21, 2017 9:11:54AM   d5e665e5      348          72       3356
2019-08-21 09:05:59   Aug 21, 2017 9:19:57AM   Aug 21, 2017 9:19:59AM   d5e665e5      69            5        125
2019-08-21 09:05:59   Aug 21, 2017 9:20:01AM   Aug 21, 2017 9:21:54AM   f9e775f9      470           0        290
2019-08-21 09:06:04     NAN                         NAN                    NAN        NAN         NAN        NAN

回答1:


Loop by each value with enumerate, because duplicated inex values and create DataFrames, then create DataFrame for empty lists and last concat together:

import ast

out = {}
for i, (k, v) in enumerate(df['var_A'].items()):
    df = pd.DataFrame(v)
    if df.empty:
        out[(i, k)] = pd.DataFrame(index=[0], columns=['Id'])
    else:
        out[(i, k)] = df

df = pd.concat(out, sort=True).reset_index(level=[0,2], drop=True)
print (df)
                                       Date1                    Date2  \
2019-08-21 09:05:49  Aug 21, 2017 9:09:51 AM  Aug 21, 2017 9:09:54 AM   
2019-08-21 09:05:49  Aug 21, 2017 9:09:57 AM  Aug 21, 2017 9:09:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:10:01 AM  Aug 21, 2017 9:11:54 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:19:57 AM  Aug 21, 2017 9:19:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:20:01 AM  Aug 21, 2017 9:21:54 AM   
2019-08-21 09:05:59                      NaN                      NaN   

                           Id  num_del  num_ins    time  
2019-08-21 09:05:49  d5e665e5      0.0    108.0   356.0  
2019-08-21 09:05:49  d5e665e5      5.0    218.0   166.0  
2019-08-21 09:05:59  d5e665e5     72.0    348.0  3356.0  
2019-08-21 09:05:59  d5e665e5      5.0     69.0   125.0  
2019-08-21 09:05:59  f9e775f9      0.0    470.0   290.0  
2019-08-21 09:05:59       NaN      NaN      NaN     NaN  



回答2:


You can use pandas functions stack and concat to do this.

  1. First use stack to unlist the list the column var_A
  2. Then use concat to unnest the dictionary and put it into seperate columns

You can use the following code to do the same. Assuming your dictionary to be df.

Unlist:

df = df.apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

Unnest:

df = pd.concat([df.drop('var_A', axis=1), df['var_A'].apply(pd.Series)], axis=1).drop(0,1)



来源:https://stackoverflow.com/questions/57604212/how-to-split-a-list-of-dictionaries-into-multiple-columns-keeping-the-same-index

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!