How to flatten a pandas dataframe with some columns as json?

自作多情 提交于 2019-11-28 18:46:56

Here's a solution using json_normalize() again by using a custom function to get the data in the correct format understood by json_normalize function.

import ast
from pandas.io.json import json_normalize

def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return ast.literal_eval(d)

def list_of_dicts(ld):
    '''
    Create a mapping of the tuples formed after 
    converting json strings of list to a python list   
    '''
    return dict([(list(d.values())[1], list(d.values())[0]) for d in ast.literal_eval(ld)])

A = json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.') 

Finally, join the DFs on the common index to get:

df[['id', 'name']].join([A, B])


EDIT:- As per the comment by @MartijnPieters, the recommended way of decoding the json strings would be to use json.loads() which is much faster when compared to using ast.literal_eval() if you know that the data source is JSON.

create a custom function to flatten columnB then use pd.concat

def flatten(js):
    return pd.DataFrame(js).set_index('pos').squeeze()

pd.concat([df.drop(['columnA', 'columnB'], axis=1),
           df.columnA.apply(pd.Series),
           df.columnB.apply(flatten)], axis=1)

The quickest seems to be:

json_struct = json.loads(df.to_json(orient="records"))
df_flat = pf.io.json.json_normalize(json_struct)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!