How to flatten a pandas dataframe with some columns as json?

日久生厌 2020-12-04 18:32

I have a dataframe df that loads data from a database. Most of the columns are JSON strings, and some are even lists of JSON objects. For example:

id           


        
4 answers
  •  春和景丽
    2020-12-04 18:35

    Here's another solution that uses json_normalize(), with custom functions that first convert each column into a format json_normalize() understands.

    import ast
    # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
    from pandas import json_normalize
    
    def only_dict(d):
        '''
        Convert the string representation of a dictionary to a Python dict
        '''
        return ast.literal_eval(d)
    
    def list_of_dicts(ld):
        '''
        Convert the string representation of a list of dicts to a single dict,
        mapping each dict's second value to its first value
        (this relies on the key order inside each dict)
        '''
        return {list(d.values())[1]: list(d.values())[0] for d in ast.literal_eval(ld)}
    
    # Flatten each column into its own DataFrame and prefix the new column names
    A = json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
    B = json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.')
    

    Finally, join the DataFrames on their common index to get:

    df[['id', 'name']].join([A, B])
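
To make the steps above concrete, here is a minimal end-to-end sketch. The sample rows and the key names "dist", "time", "pos" and "lap" are assumptions made up for illustration, since the original example data is not shown here. The helper pos_to_lap is also hypothetical: it looks values up by key name rather than by position.

```python
import ast

import pandas as pd

# Hypothetical sample data (not from the original question).
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mike"],
    "columnA": ['{"dist": "600", "time": "0:12.10"}',
                '{"dist": "600"}'],
    "columnB": ['[{"lap": "52.26", "pos": "1st"}, {"lap": "52.30", "pos": "2nd"}]',
                '[{"lap": "53.40", "pos": "1st"}]'],
})

def pos_to_lap(ld):
    """Turn a string holding a list of dicts into one dict: pos -> lap."""
    return {d["pos"]: d["lap"] for d in ast.literal_eval(ld)}

# One flattened DataFrame per column, with prefixed column names.
A = pd.json_normalize(df["columnA"].apply(ast.literal_eval).tolist()).add_prefix("columnA.")
B = pd.json_normalize(df["columnB"].apply(pos_to_lap).tolist()).add_prefix("columnB.pos.")

# Join everything back on the shared default integer index.
flat = df[["id", "name"]].join([A, B])
print(flat.columns.tolist())
```

Rows that lack a key (the second row has no "time" and no "2nd" entry) end up with NaN in the corresponding column.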
    


    EDIT: As per the comment by @MartijnPieters, the recommended way to decode the JSON strings is json.loads(), which is much faster than ast.literal_eval() if you know that the data source is JSON.
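
Besides speed, the two decoders differ in what they accept. The sketch below uses a made-up JSON string to show that json.loads() handles the JSON literals true and null, while ast.literal_eval() rejects them because they are not Python literals. The variable names and the sample string are assumptions for illustration only.

```python
import ast
import json

# Hypothetical JSON string containing JSON-only literals.
raw = '{"dist": 600, "finished": true, "note": null}'

# json.loads maps JSON true/null to Python True/None.
decoded = json.loads(raw)

# ast.literal_eval only understands Python literals, so true/null fail.
try:
    ast.literal_eval(raw)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

print(decoded, literal_eval_ok)
```

If the columns really hold JSON, swapping the decoder should be a one-line change, for example df['columnA'].apply(json.loads) in place of df['columnA'].apply(only_dict).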
