Build dict from list of tuples combining two multi index dfs and column index

问题

I have two multi-index dataframes: mean and std

arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]

mean=pd.DataFrame(data={0.0:[np.nan,2.0,3.0,4.0], 60.0: [5.0,np.nan,7.0,8.0], 120.0:[9.0,10.0,np.nan,12.0]}, 
         index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name='Times'

std=pd.DataFrame(data={0.0:[10.0,10.0,10.0,10.0], 60.0: [10.0,10.0,10.0,10.0], 120.0:[10.0,10.0,10.0,10.0]}, 
         index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name='Times'

My task is to combine them in a dictionary with '{id:' as first level, followed by second level dictionary with '{comp:' and then for each comp a list of tuples, which combines the (time-points, mean, std). So, the result should look like that:

{'A': {
     'Z': [(60.0,5.0,10.0),
            (120.0,9.0,10.0)],
      'Y': [(0.0,2.0,10.0),
            (120.0,10.0,10.0)]
       },
  'B': {
     'X': [(0.0,3.0,10.0),
            (60.0,7.0,10.0)],
      'W': [(0.0,4.0,10.0),
            (60.0,8.0,10.0),
            (120.0,12.0,10.0)]
       }
 }

Additionally, when there is NaN in data, the triplets are left out, so value A,Z at time 0, A,Y at time 60 B,X at time 120.

How do I get there? I constructed already a dict of dict of list of tuples for a single line:

iter=0
{mean.index[iter][0]:{mean.index[iter][1]:list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}
>{'A': {'Z': [(0.0, 1.0, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}

Now, I need to extend to a dictionary with a loop over each line {inner dict) and adding the ids each {outer dict}. I started with iterrows and dic comprehension, but here I have problems, indexing with the iter ('A','Z') which i get from iterrows(), and building the whole dict, iteratively.

{mean.index[iter[1]]:list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter,row) in mean.iterrows()}

creates errors, and I would only have the inner loop

KeyError: 'the label [Z] is not in the [index]'

Thanks!

EDIT: I exchanged the numbers to float in this example, because here integers were generated before which was not consistent with my real data, and which would fail in following json dump.

回答1:

Here is a solution using a defaultdict:

from collections import defaultdict

mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')

mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}

sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])] for k, v in mean_clean_sorted.items()}

solution = defaultdict(dict)

for k, v in sol.items():
    solution[k[0]][k[1]] = v

Resulting dict will be defaultdict object that you can change to dict easily:

solution = dict(solution)

回答2:

con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        primary[i[0]][i[1]].append((x, tuple(con.loc[i[0]].loc[i[1][0].values)))

Here is sample output

回答3:

I found a very comprehensive way of putting up this nested dict:

mean_dict_items=mean.to_dict(orient='index').items()
{k[0]:{u[1]:list(zip(mean.columns, mean.loc[u], std.loc[u]))
      for u,v in mean_dict_items if (k[0],u[1]) == u} for k,l in mean_dict_items}

creates:

{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
  'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
 'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
  'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}

来源：https://stackoverflow.com/questions/48864096/build-dict-from-list-of-tuples-combining-two-multi-index-dfs-and-column-index

标签

python

list

pandas

dictionary

tuples