Convert a column of json strings into columns of data

前端 未结 3 888
夕颜
夕颜 2020-12-12 01:06

I have a big dataframe of around 30000 rows and a single column containing a json string. Each json string contains a number of variables and its value I want to break this

3条回答
  •  -上瘾入骨i
    2020-12-12 01:49

    Your column values seem to have an extra number before the actual json string. So you might want strip that out first (skip to Method if that isn't the case)

    One way to do that is to apply a function to the column

    # constructing the df
    df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])
    
    # print(df)
                             json
    # 0  0 {"a":"1","b":"2","c":"3"}
    # 1  1 {"a" :"4","b":"5","c":"6"}
    
    # function to remove the number
    import re
    
    def split_num(val):
        p = re.compile("({.*)")
        return p.search(val).group(1)
    
    # applying the function
    df['json'] = df['json'].map(lambda x: split_num(x))
    print(df)
    
    #                          json
    # 0   {"a":"1","b":"2","c":"3"}
    # 1  {"a" :"4","b":"5","c":"6"}
    

    Method:

    Once the df is in the above format, the below will convert each row entry to a dictionary:

    df['json'] = df['json'].map(lambda x: dict(eval(x)))
    

    Then, applying pd.Series to the column will do the job

    d = df['json'].apply(pd.Series)
    print(d)
    #   a  b  c
    # 0  1  2  3
    # 1  4  5  6
    

提交回复
热议问题