I have a pandas DataFrame where column B contains NumPy arrays of fixed size.
| A | B | C |
|---|---|---|
What you need is possible by converting the values to a 2D array and then using np.mean:
import numpy as np

# stack each group's lists into a 2D array and average down the rows
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print (df2)
C B
0 X [1.5, 2.5, 4.0, 5.0]
1 Y [2.0, 3.0, 4.0, 4.0]
2 Z [2.0, 3.0, 5.0, 6.0]
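A minimal, self-contained sketch of this approach. The sample data here is hypothetical (the question's full df is not shown), but it is chosen so the groups reproduce the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data consistent with the output shown above
df = pd.DataFrame({
    'C': ['X', 'X', 'Y', 'Z'],
    'B': [[1, 2, 3, 4], [2, 3, 5, 6], [2, 3, 4, 4], [2, 3, 5, 6]],
})

# tolist() gives a list of lists, np.array makes it 2D,
# and axis=0 averages element-wise across the group's rows
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print(df2)
```

The key point is `axis=0`: without it, `np.mean` would collapse each group to a single scalar instead of an element-wise mean.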
One last solution is possible, but it is less efficient (thanks to @Abhik Sarkar for the test):
df1 = pd.DataFrame(df.B.tolist()).groupby(df['C']).mean()
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print (df2)
B C
0 [1.5, 2.5, 4.0, 5.0] X
1 [2.0, 3.0, 4.0, 4.0] Y
2 [2.0, 3.0, 5.0, 6.0] Z
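To see why this second variant works, it helps to look at the intermediate wide frame: `pd.DataFrame(df.B.tolist())` expands each list into its own column, so a plain column-wise `mean` per group is the element-wise mean. A sketch with hypothetical sample data (assumed, matching the outputs above):

```python
import pandas as pd

# Hypothetical sample data (the question's full df is not shown)
df = pd.DataFrame({
    'C': ['X', 'X', 'Y', 'Z'],
    'B': [[1, 2, 3, 4], [2, 3, 5, 6], [2, 3, 4, 4], [2, 3, 5, 6]],
})

# each list element becomes its own column (0, 1, 2, 3),
# index-aligned with df, so grouping by df['C'] still works
wide = pd.DataFrame(df.B.tolist())
df1 = wide.groupby(df['C']).mean()

# collapse the per-column means back into lists
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print(df2)
```

The extra cost comes from materializing the wide frame and rebuilding the lists afterwards, which is why the `apply` version above is preferred.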
Dummy data
import random
import pandas as pd

size, list_size = 10, 5
data = [{'C': random.randint(95, 100),
         'B': [random.randint(0, 10) for i in range(list_size)]} for j in range(size)]
df = pd.DataFrame(data)
Custom Aggregation Using numpy
import numpy as np

unique_C = df.C.unique()
data_calculated = []
axis = 0
for c in unique_C:
    # concatenate the group's lists and reshape back to (n_rows, list_size)
    arr = np.reshape(np.hstack(df[df.C == c]['B']), (-1, list_size))
    mean, std = arr.mean(axis=axis), arr.std(axis=axis)  # other aggregations can also be added
    data_calculated.append(dict(C=c, B_mean=mean, B_std=std))
new_df = pd.DataFrame(data_calculated)
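Putting the dummy data and the custom aggregation together, here is a runnable sketch (the seed is an assumption added for reproducibility; the result can be cross-checked against the groupby/apply approach from the first answer):

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # hypothetical seed, only for reproducible dummy data
size, list_size = 10, 5
data = [{'C': random.randint(95, 100),
         'B': [random.randint(0, 10) for i in range(list_size)]} for j in range(size)]
df = pd.DataFrame(data)

data_calculated = []
for c in df.C.unique():
    # flatten the group's lists and reshape to (n_rows_in_group, list_size)
    arr = np.reshape(np.hstack(df[df.C == c]['B']), (-1, list_size))
    data_calculated.append(dict(C=c, B_mean=arr.mean(axis=0), B_std=arr.std(axis=0)))
new_df = pd.DataFrame(data_calculated)
print(new_df)
```

An advantage of this loop is that it computes several aggregations (here mean and std) in one pass over each group's 2D array.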