Faster way to transform group with mean value in Pandas

刺人心 2020-12-03 20:00

I have a Pandas dataframe where I am trying to replace the values in each group by the mean of the group. On my machine, the line df["signal"].groupby(g).transform(n

2 Answers
  • 2020-12-03 20:28

    Current method, using transform

    In [44]: grp = df["signal"].groupby(g)
    
    In [45]: result2 = df["signal"].groupby(g).transform(np.mean)
    
    In [47]: %timeit df["signal"].groupby(g).transform(np.mean)
    1 loops, best of 3: 535 ms per loop
    

    Using 'broadcasting' of the results

    In [43]: result = pd.concat([ pd.Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
    
    In [42]: %timeit pd.concat([ pd.Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
    10 loops, best of 3: 119 ms per loop
    
    In [46]: result.equals(result2)
    Out[46]: True
    

    I think you might need to set the index on the broadcast result (it happens to work here because df has a default index):

    result = pd.concat([ pd.Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
    result.index = df.index
    
  • 2020-12-03 20:38

    Inspired by Jeff's answer. This is the fastest method on my machine:

    pd.Series(np.repeat(grp.mean().values, grp.count().values))
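
The one-liner above relies on the rows of each group being contiguous and in sorted key order, and it returns a default index. Here is a minimal sketch of that idea with the index restored, plus a variant that should also work when the labels are in arbitrary order. The helper names `group_mean_repeat` and `group_mean_codes` and the sample data are hypothetical (the thread does not show the original df or g), and `size()` is swapped in for `count()` on the assumption that NaN rows should still receive a value.

```python
import numpy as np
import pandas as pd


def group_mean_repeat(s: pd.Series, g) -> pd.Series:
    """Broadcast group means with np.repeat.

    Assumes the rows of each group are contiguous and the groups appear
    in sorted key order; otherwise values land on the wrong rows.
    """
    grp = s.groupby(g)
    # size() counts all rows; count() would skip NaNs and shorten the result.
    values = np.repeat(grp.mean().to_numpy(), grp.size().to_numpy())
    return pd.Series(values, index=s.index, name=s.name)


def group_mean_codes(s: pd.Series, g) -> pd.Series:
    """Broadcast group means through integer group codes (any row order)."""
    codes, _ = pd.factorize(g, sort=True)   # codes follow sorted key order
    means = s.groupby(g).mean().to_numpy()  # groupby also sorts keys by default
    return pd.Series(means[codes], index=s.index, name=s.name)


# Hypothetical data; the thread does not show the original df or g.
rng = np.random.default_rng(0)
df = pd.DataFrame({"signal": rng.normal(size=12)}, index=list("abcdefghijkl"))
g = np.repeat([0, 1, 2], [5, 3, 4])  # sorted, contiguous labels
g_shuffled = rng.permutation(g)      # same labels in arbitrary order

# Reference results from transform, for comparison.
expected = df["signal"].groupby(g).transform("mean")
expected_shuffled = df["signal"].groupby(g_shuffled).transform("mean")
```

Comparing `group_mean_repeat(df["signal"], g)` against `expected` with `np.allclose` is a quick sanity check. For `g_shuffled`, only `group_mean_codes` is expected to line up with `transform`, since np.repeat lays the means out group by group regardless of where the rows actually sit.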
    