Convert a column of JSON strings into columns of data

夕颜 2020-12-12 01:06

I have a large DataFrame of around 30,000 rows and a single column containing a JSON string. Each JSON string contains a number of variables and their values. I want to break this

3 answers
  • 2020-12-12 01:32
    import json
    import pandas as pd
    with open(json_file) as f:
        df = pd.DataFrame(json.loads(line) for line in f)
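The snippet above assumes the data lives in a file with one JSON object per line. A minimal, self-contained sketch of that case follows; the temporary file and its two sample rows are made up for illustration and are not from the question.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample input: one JSON object per line (an assumption about the file).
lines = ['{"a": "1", "b": "2", "c": "3"}', '{"a": "4", "b": "5", "c": "6"}']
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write("\n".join(lines) + "\n")
    json_file = tmp.name

try:
    with open(json_file) as f:
        # Skip blank lines so json.loads never receives an empty string.
        df = pd.DataFrame([json.loads(line) for line in f if line.strip()])
finally:
    os.remove(json_file)

print(df)
```

Each parsed dictionary becomes one row, and the dictionary keys become the column names.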
    
  • 2020-12-12 01:46

    If you are using pandas DataFrames, you can use the library function DataFrame.from_dict, which creates a DataFrame from a dictionary.

    If your data is JSON, you can convert it into a dict quite easily using the json library.

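    import json
    import pandas
    
    # json.loads expects a string, so the JSON text must be quoted
    my_dict = json.loads('{"a": "4", "b": "5", "c": "6"}')
    # orient="index" turns the dict into one row; all-scalar values otherwise need an index
    pandas.DataFrame.from_dict({0: my_dict}, orient="index")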
    

    You can apply this logic to your rows.
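One way to apply this to every row, sketched under the assumption that the column is named `json` and already holds clean JSON strings (both are assumptions, as is the sample data):

```python
import json

import pandas as pd

# Hypothetical two-row frame standing in for the real 30,000-row one.
df = pd.DataFrame(
    {"json": ['{"a": "1", "b": "2", "c": "3"}', '{"a": "4", "b": "5", "c": "6"}']}
)

# Parse each string into a dict, then build a frame from the list of dicts.
records = df["json"].map(json.loads)
expanded = pd.DataFrame(records.tolist(), index=df.index)

print(expanded)
```

Passing `index=df.index` keeps the new columns aligned with the original rows, which matters if the original index is not a plain 0..n-1 range.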

  • 2020-12-12 01:49

    Your column values seem to have an extra number before the actual JSON string, so you might want to strip that out first (skip to Method if that isn't the case).

    One way to do that is to apply a function to the column:

    import pandas as pd
    
    # constructing the df
    df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])
    
    # print(df)
    #                            json
    # 0   0 {"a":"1","b":"2","c":"3"}
    # 1  1 {"a" :"4","b":"5","c":"6"}
    
    # function to remove the number
    import re
    
    def split_num(val):
        # keep everything from the first "{" onward
        p = re.compile(r"(\{.*)")
        return p.search(val).group(1)
    
    # applying the function
    df['json'] = df['json'].map(split_num)
    print(df)
    
    #                          json
    # 0   {"a":"1","b":"2","c":"3"}
    # 1  {"a" :"4","b":"5","c":"6"}
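As a possible alternative to the per-row function above, the same stripping can be sketched with the vectorized `Series.str.extract`. The pattern below is an assumption: it keeps the text from the first `{` to the last `}` in each value, and a row with no match becomes NaN instead of raising an error.

```python
import pandas as pd

# Same hypothetical sample rows as in the answer above.
df = pd.DataFrame(
    [['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']],
    columns=["json"],
)

# expand=False returns a Series because the pattern has a single capture group.
df["json"] = df["json"].str.extract(r"(\{.*\})", expand=False)

print(df)
```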
    

    Method:

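    Once the df is in the above format, the code below converts each row entry to a dictionary. It uses json.loads, which only parses the string, rather than eval, which would execute any code the string contains:

    import json
    df['json'] = df['json'].map(json.loads)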
    

    Then, applying pd.Series to the column expands each dictionary into its own set of columns:

    d = df['json'].apply(pd.Series)
    print(d)
    #    a  b  c
    # 0  1  2  3
    # 1  4  5  6
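On a frame of around 30,000 rows, `apply(pd.Series)` can be slow because it builds one Series per row. A sketch of an alternative using `pd.json_normalize` is below; the `id` column and the sample data are hypothetical, and the column is assumed to already contain clean JSON strings.

```python
import json

import pandas as pd

# Hypothetical frame: an extra "id" column plus the JSON string column.
df = pd.DataFrame(
    {
        "id": [10, 11],
        "json": ['{"a":"1","b":"2","c":"3"}', '{"a" :"4","b":"5","c":"6"}'],
    }
)

# Parse every string once, then let pandas build all columns in one call.
parsed = df["json"].map(json.loads)
expanded = pd.json_normalize(parsed.tolist())
expanded.index = df.index  # keep rows aligned with the original frame

# Replace the raw string column with the expanded columns.
result = df.drop(columns="json").join(expanded)

print(result)
```

Joining on the index keeps any other columns of the original frame next to the newly extracted ones.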
    