Unfold a nested dictionary with lists into a pandas DataFrame

陌清茗 2021-01-18 18:27

I have a nested dictionary in which the sub-dictionaries hold lists as values:

nested_dict = {'string1': {69: [1231, 232], 67:[682, 12], 65: [1, 1]},
    'string2' :{2


        
2 Answers
  •  终归单人心
    2021-01-18 19:01

    This should give you the result you are looking for, although it's probably not the most elegant solution. There's probably a better (more pandas-idiomatic) way to do it.

    I parsed your nested dict and built a list of dictionaries (one for each row).

    import pandas as pd
    
    # some sample input
    nested_dict = {
        'string1': {69: [1231, 232], 67:[682, 12], 65: [1, 1]}, 
        'string2' :{28672: [82, 23], 22736:[82, 93, 1102, 102], 19423: [64, 23]},
        'string3' :{28673: [83, 24], 22737:[83, 94, 1103, 103], 19424: [65, 24]}
    }
    
    # new_list holds one dict per output row
    new_list = []
    for k1, curr_dict in nested_dict.items():
        for k2, values in curr_dict.items():
            new_dict = {'col1': k1, 'col2': k2}
            # list elements become col3, col4, ... in order
            new_dict.update({'col%d' % (i + 3): v for i, v in enumerate(values)})
            new_list.append(new_dict)
    
    # create a DataFrame from new list
    df = pd.DataFrame(new_list)
    

    The output (row order follows dictionary iteration order, so on Python 3.7+ the string1 rows come first):

          col1   col2  col3  col4    col5   col6
    0  string2  28672    82    23     NaN    NaN
    1  string2  22736    82    93  1102.0  102.0
    2  string2  19423    64    23     NaN    NaN
    3  string3  19424    65    24     NaN    NaN
    4  string3  28673    83    24     NaN    NaN
    5  string3  22737    83    94  1103.0  103.0
    6  string1     65     1     1     NaN    NaN
    7  string1     67   682    12     NaN    NaN
    8  string1     69  1231   232     NaN    NaN
    

    The code assumes that the input always contains enough data to create col1 and col2.
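
One way to make that assumption explicit is to check the input before building any rows. The sketch below takes the expected shape to be a dict of dicts of lists, which is my reading of the sample input. `check_nested` is a hypothetical helper name used only for illustration; it is not part of pandas.

```python
def check_nested(nested):
    """Raise TypeError if `nested` is not a dict of dicts of lists.

    Hypothetical helper: the name and the exact checks are assumptions
    made for this example.
    """
    for k1, inner in nested.items():
        if not isinstance(inner, dict):
            raise TypeError("value for %r is not a dict" % (k1,))
        for k2, values in inner.items():
            if not isinstance(values, list):
                raise TypeError("value for %r -> %r is not a list" % (k1, k2))


# well-formed input passes silently
check_nested({'string1': {69: [1231, 232], 67: [682, 12]}})
```

Calling it right before the loop means a malformed entry fails with a message naming the offending key, rather than somewhere inside the row-building code.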

    I loop through nested_dict, assuming that each of its values is itself a dictionary, and then loop through that inner dictionary (curr_dict) as well. The keys k1 and k2 populate col1 and col2. For the remaining columns, we iterate through the list contents and add one column per element.
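
The same walk over both levels of keys can be written more compactly. This is only a sketch of an alternative, not necessarily the "more pandas" way hinted at earlier. It relies on `pd.DataFrame` padding shorter rows with missing values when it is given a list of lists of unequal length, and it reuses the col1, col2, ... naming. The sample data is a shortened copy of the input above.

```python
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}

# one row per (outer key, inner key) pair, followed by the list elements
rows = [
    [k1, k2, *values]
    for k1, inner in nested_dict.items()
    for k2, values in inner.items()
]

# rows of unequal length are padded with missing values
df = pd.DataFrame(rows)
# rename the default integer column labels to col1, col2, ...
df.columns = ['col%d' % (i + 1) for i in range(df.shape[1])]
print(df)
```

Building plain lists instead of dicts avoids computing a column name for every element. The trade-off is that the columns have to be renamed once at the end.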
