Pandas aggregate count distinct

前端 未结 3 925
再見小時候
再見小時候 2020-12-04 07:34

Let\'s say I have a log of user activity and I want to generate a report of total duration and the number of unique users per day.

import numpy as np
import          


        
3条回答
  •  醉话见心
    2020-12-04 08:12

    Just adding to the answers already given, the solution using the string "nunique" seems much faster, tested here on ~21M rows dataframe, then grouped to ~2M

    %time _=g.agg({"id": lambda x: x.nunique()})
    CPU times: user 3min 3s, sys: 2.94 s, total: 3min 6s
    Wall time: 3min 20s
    
    %time _=g.agg({"id": pd.Series.nunique})
    CPU times: user 3min 2s, sys: 2.44 s, total: 3min 4s
    Wall time: 3min 18s
    
    %time _=g.agg({"id": "nunique"})
    CPU times: user 14 s, sys: 4.76 s, total: 18.8 s
    Wall time: 24.4 s
    

提交回复
热议问题