Applying a custom groupby aggregate function to find average of Numpy Array

面向向阳花 2020-12-12 03:46

I have a pandas DataFrame in which column B contains NumPy arrays of a fixed size.

|------|---------------|-------|
|  A   |       B       |   C   |
|------|---------------|-------|

2 Answers
  • 2020-12-12 04:18

    What you need is possible by converting the values of each group to a 2D array and then using np.mean along axis 0:

    import numpy as np

    # Stack each group's arrays into a 2D array, then average column-wise
    f = lambda x: np.mean(np.array(x.tolist()), axis=0)
    df2 = df.groupby('C')['B'].apply(f).reset_index()
    print(df2)
       C                     B
    0  X  [1.5, 2.5, 4.0, 5.0]
    1  Y  [2.0, 3.0, 4.0, 4.0]
    2  Z  [2.0, 3.0, 5.0, 6.0]
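
For a self-contained check, here is a minimal sketch of the same approach on made-up data. The values in columns A and B and the labels in column C are assumptions for illustration only, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each cell of B holds a fixed-size NumPy array.
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [np.array([1, 2, 3, 4]),
          np.array([2, 3, 5, 6]),
          np.array([4, 6, 8, 10]),
          np.array([0, 2, 4, 6])],
    'C': ['X', 'X', 'Y', 'Y'],
})

# Stack each group's arrays into a 2D array and average down the rows,
# which yields one element-wise mean array per group.
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print(df2)
```

With axis=0 the mean is taken across the rows of each group, so every group keeps an array of the original fixed size.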
    

    Another solution is possible, but it is less efficient (thank you @Abhik Sarkar for the test):

    # Expand B into one column per element, then take the per-group mean
    df1 = pd.DataFrame(df.B.tolist()).groupby(df['C']).mean()
    df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
    print(df2)
                          B  C
    0  [1.5, 2.5, 4.0, 5.0]  X
    1  [2.0, 3.0, 4.0, 4.0]  Y
    2  [2.0, 3.0, 5.0, 6.0]  Z
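
To see that the column-expansion variant agrees with the apply-based one, the two results can be compared on a small example. This is only a sketch; the data below is made up for illustration and the names via_apply, via_columns and expanded are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the data from the question).
df = pd.DataFrame({
    'B': [np.array([1, 2, 3, 4]), np.array([2, 3, 5, 6]), np.array([4, 6, 8, 10])],
    'C': ['X', 'X', 'Y'],
})

# Variant 1: stack each group's arrays and take the column-wise mean.
via_apply = df.groupby('C')['B'].apply(lambda x: np.mean(np.array(x.tolist()), axis=0))

# Variant 2: expand B into one column per element, then use the built-in mean.
expanded = pd.DataFrame(df['B'].tolist(), index=df.index)
via_columns = expanded.groupby(df['C']).mean()

# Compare the per-group averages of both variants.
same = all(np.allclose(via_apply[key], via_columns.loc[key]) for key in via_columns.index)
print(same)
```

Passing index=df.index when expanding keeps the rows aligned with df['C'] even if df does not use the default integer index.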
    
  • 2020-12-12 04:43

    Dummy data

    import random
    import numpy as np
    import pandas as pd

    size, list_size = 10, 5
    data = [{'C': random.randint(95, 100),
             'B': [random.randint(0, 10) for i in range(list_size)]} for j in range(size)]
    df = pd.DataFrame(data)
    

    Custom Aggregation Using numpy

    unique_C = df.C.unique()
    data_calculated = []
    axis = 0

    for c in unique_C:
        # Stack the lists of this group into an (n_rows, list_size) array
        arr = np.reshape(np.hstack(df[df.C == c]['B']), (-1, list_size))
        mean, std = arr.mean(axis=axis), arr.std(axis=axis)  # other aggregations can also be added
        data_calculated.append(dict(C=c, B_mean=mean, B_std=std))
    new_df = pd.DataFrame(data_calculated)
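
The same custom aggregation can also be sketched by iterating over the groups that groupby produces, instead of filtering the frame once per unique value of C. This is one possible variant, not the method of the answer above; the fixed random seed and the name rows are assumptions added so the example is reproducible.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # assumption: fixed seed so the dummy data is reproducible
size, list_size = 10, 5
df = pd.DataFrame([{'C': random.randint(95, 100),
                    'B': [random.randint(0, 10) for _ in range(list_size)]}
                   for _ in range(size)])

rows = []
# Iterating a grouped column yields (group key, Series of that group's lists).
for c, group in df.groupby('C')['B']:
    arr = np.array(group.tolist())  # shape: (rows in group, list_size)
    rows.append({'C': c, 'B_mean': arr.mean(axis=0), 'B_std': arr.std(axis=0)})
new_df = pd.DataFrame(rows)
print(new_df)
```

Further statistics such as arr.min(axis=0) or arr.max(axis=0) can be added as extra keys in the dictionary in the same way.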
    