I want to flatten a JSON column in a Pandas DataFrame

时光取名叫无心 2020-12-16 06:39

I have an input dataframe df which is as follows:

id  e
1   {"k1":"v1","k2":"v2"}
2   {"k1":"v3","k2":"v4"}
3   {"k1":"v5","k2":"v6"}


        
2 Answers
  • 2020-12-16 07:02

    If your column is not already a dictionary, you could use map(json.loads) and apply pd.Series:

    s = df['e'].map(json.loads).apply(pd.Series).add_prefix('e.')
    

    Or if it is already a dictionary, you can apply pd.Series directly:

    s = df['e'].apply(pd.Series).add_prefix('e.')
    

    Finally use pd.concat to join back the other columns:

    >>> pd.concat([df.drop(['e'], axis=1), s], axis=1).set_index('id')    
    id e.k1 e.k2
    1    v1   v2
    2    v3   v4
    3    v5   v6
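
For reference, the steps above can be put together as a self-contained sketch. The sample frame is an assumption that mirrors the question's input (JSON stored as strings in column e), and the name flat is only illustrative:

```python
import json

import pandas as pd

# Assumed sample input mirroring the question: "e" holds JSON strings.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "e": [
        '{"k1":"v1","k2":"v2"}',
        '{"k1":"v3","k2":"v4"}',
        '{"k1":"v5","k2":"v6"}',
    ],
})

# Parse each JSON string into a dict, then expand the dict keys into columns.
s = df["e"].map(json.loads).apply(pd.Series).add_prefix("e.")

# Reattach the remaining columns and use "id" as the index.
flat = pd.concat([df.drop(columns=["e"]), s], axis=1).set_index("id")
print(flat)
```

If the column already holds dicts, dropping the map(json.loads) step should be the only change needed.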
    
  • 2020-12-16 07:18

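    Here is a way to use json_normalize(). It has been available as pandas.json_normalize since pandas 1.0; the older pandas.io.json.json_normalize import path is deprecated:

    from pandas import json_normalize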
    df = df.join(json_normalize(df["e"].tolist()).add_prefix("e.")).drop(["e"], axis=1)
    print(df)
    #  e.k1 e.k2
    #0   v1   v2
    #1   v3   v4
    #2   v5   v6
    

    However, if your column is actually a str and not a dict, then you'd first have to map it using json.loads():

    import json
    df = df.join(json_normalize(df['e'].map(json.loads).tolist()).add_prefix('e.'))\
        .drop(['e'], axis=1)
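
Both cases can be folded into one function. The following is only a sketch: flatten_json_column is a hypothetical helper name, not part of pandas, and the sample frame is assumed from the question. It parses values only when they are strings, and it copies the source index onto the normalized frame so that the join still lines up when df does not have the default 0..n-1 index:

```python
import json

import pandas as pd


def flatten_json_column(df, col):
    """Hypothetical helper: expand a column of JSON strings or dicts."""
    # Parse strings with json.loads; leave values that are already dicts alone.
    parsed = df[col].map(lambda v: json.loads(v) if isinstance(v, str) else v)
    expanded = pd.json_normalize(parsed.tolist()).add_prefix(f"{col}.")
    # json_normalize returns a fresh 0..n-1 index; align it with the source rows.
    expanded.index = df.index
    return df.drop(columns=[col]).join(expanded)


# Assumed sample input mirroring the question, with a non-default index.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "e": [
            '{"k1":"v1","k2":"v2"}',
            '{"k1":"v3","k2":"v4"}',
            '{"k1":"v5","k2":"v6"}',
        ],
    },
    index=[10, 20, 30],
)

result = flatten_json_column(df, "e")
print(result)
```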
    