Pandas DataFrame.groupby() to dictionary with multiple columns for value

执笔经年 2020-12-31 12:25
type(Table)
pandas.core.frame.DataFrame

Table
======= ======= =======
Column1 Column2 Column3
======= ======= =======
0       23      1
1       5       2
1       2       3
1       19      5
======= ======= =======


        
3 Answers
  • 2020-12-31 13:03

    Customize the function you use in apply so it returns a list of lists for each group:

    df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
    # {0: [[23, 1]], 
    #  1: [[5, 2], [2, 3], [19, 5]], 
    #  2: [[56, 1], [22, 2]], 
    #  3: [[2, 4], [14, 5]], 
    #  4: [[59, 1]], 
    #  5: [[44, 1], [1, 2], [87, 3]]}
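Iterating a groupby yields (key, sub-DataFrame) pairs, so the same list-of-lists dictionary can also be built with a plain dict comprehension and no apply(). Here is a minimal sketch. It assumes the sample rows implied by the outputs above. The question's table is cut off, so the row order is a guess.

```python
import pandas as pd

# Sample data rebuilt from the outputs shown; the original row order is assumed.
df = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Each iteration yields a group key and the matching sub-DataFrame; selecting
# the value columns inside the loop and calling to_numpy().tolist() turns
# that group's rows into a list of [Column2, Column3] lists.
result = {
    key: group[['Column2', 'Column3']].to_numpy().tolist()
    for key, group in df.groupby('Column1')
}
print(result)
```

groupby sorts keys by default, so the dictionary keys come out in ascending order, just as they do with the apply() version.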
    

    If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:

    df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
    # {0: [(23, 1)], 
    #  1: [(5, 2), (2, 3), (19, 5)], 
    #  2: [(56, 1), (22, 2)], 
    #  3: [(2, 4), (14, 5)], 
    #  4: [(59, 1)], 
    #  5: [(44, 1), (1, 2), (87, 3)]}
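For the tuple version, DataFrame.itertuples(index=False, name=None) already yields plain tuples, which makes the list(map(tuple, ...)) conversion unnecessary. Here is a sketch on sample rows reconstructed from the outputs shown. The data is an assumption, since the question's table is incomplete.

```python
import pandas as pd

# Sample data rebuilt from the outputs shown; the original row order is assumed.
df = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# index=False drops the row label from each tuple, and name=None returns
# plain tuples instead of namedtuples.
result = (
    df.groupby('Column1')[['Column2', 'Column3']]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
      .to_dict()
)
print(result)
```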
    
  • 2020-12-31 13:07

    I'd rather use a defaultdict:

    from collections import defaultdict
    
    d = defaultdict(list)
    
    for row in df.values.tolist():
        # row is [Column1, Column2, Column3]; group on the first value
        d[row[0]].append(tuple(row[1:]))
    
    dict(d)
    # {0: [(23, 1)],
    #  1: [(5, 2), (2, 3), (19, 5)],
    #  2: [(56, 1), (22, 2)],
    #  3: [(2, 4), (14, 5)],
    #  4: [(59, 1)],
    #  5: [(44, 1), (1, 2), (87, 3)]}
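There is one caveat with df.values.tolist(). It takes every column in order and first converts the whole frame to a single NumPy dtype. As a result, any extra columns end up in the tuples, and an int column next to a float column comes back as floats. Below is a sketch of the same defaultdict loop that selects the columns by name and iterates with itertuples(index=False, name=None), which keeps each column's own type. The sample rows are reconstructed from the outputs shown and are an assumption.

```python
from collections import defaultdict

import pandas as pd

# Sample data rebuilt from the outputs shown; the original row order is assumed.
df = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

d = defaultdict(list)
# Selecting the key and value columns by name means any other columns are ignored.
for key, c2, c3 in df[['Column1', 'Column2', 'Column3']].itertuples(index=False, name=None):
    d[key].append((c2, c3))

result = dict(d)
print(result)
```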
    
  • 2020-12-31 13:13

    One way is to create a new tup column of (Column2, Column3) tuples and then group it into a dictionary:

    df['tup'] = list(zip(df['Column2'], df['Column3']))
    df.groupby('Column1')['tup'].apply(list).to_dict()
    
    # {0: [(23, 1)],
    #  1: [(5, 2), (2, 3), (19, 5)],
    #  2: [(56, 1), (22, 2)],
    #  3: [(2, 4), (14, 5)],
    #  4: [(59, 1)],
    #  5: [(44, 1), (1, 2), (87, 3)]}
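Note that df['tup'] = ... adds a permanent column to df. If you'd rather leave the original frame untouched, DataFrame.assign returns a new frame with the extra column. Here is a minimal sketch on sample rows reconstructed from the outputs shown. The row order is an assumption.

```python
import pandas as pd

# Sample data rebuilt from the outputs shown; the original row order is assumed.
df = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# assign() returns a copy carrying the extra 'tup' column; df itself is not modified.
result = (
    df.assign(tup=list(zip(df['Column2'], df['Column3'])))
      .groupby('Column1')['tup']
      .apply(list)
      .to_dict()
)
print(result)
print('tup' in df.columns)  # False: the original frame is unchanged
```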
    

    @Psidom's solution is faster, but if performance isn't an issue, use whichever makes more sense to you:

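    import pandas as pd
    
    df = pd.concat([df]*10000)  # 10,000 copies of the sample rows
    
    def jp(df):
        # note: adds a 'tup' column to the frame passed in
        df['tup'] = list(zip(df['Column2'], df['Column3']))
        return df.groupby('Column1')['tup'].apply(list).to_dict()
    
    def psi(df):
        return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
    
    # %timeit is an IPython/Jupyter magic, not plain Python
    %timeit jp(df)   # 110ms
    %timeit psi(df)  # 80ms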
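%timeit is an IPython/Jupyter magic, so the benchmark above won't run as a plain script. Below is a rough equivalent using the standard-library timeit module. It reuses the two functions from the answer on sample rows reconstructed from the outputs shown, which is an assumption. Absolute timings depend on the machine and the pandas version, so expect numbers different from the ones quoted.

```python
import timeit

import pandas as pd

# Sample data rebuilt from the outputs shown; the original row order is assumed.
base = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
df = pd.concat([base] * 10000, ignore_index=True)


def jp(df):
    # Adds a 'tup' column to the frame it is given, as in the answer.
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()


def psi(df):
    return (df.groupby('Column1')[['Column2', 'Column3']]
              .apply(lambda g: list(map(tuple, g.values.tolist())))
              .to_dict())


# Both approaches should build the same dictionary.
assert jp(df) == psi(df)

for fn in (jp, psi):
    # Best of 3 repeats of 3 calls each, reported as milliseconds per call.
    best = min(timeit.repeat(lambda: fn(df), number=3, repeat=3)) / 3
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```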
    