Python: Random selection per group

面向向阳花 2020-12-01 05:08

Say that I have a dataframe that looks like:

Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3 
YYH  3

How could I randomly select one (or m

9 Answers
  •  天涯浪人
    2020-12-01 05:37

    There are two very simple ways to do this. The first uses nothing but basic pandas syntax (here `x` is the grouping column and `y` is the column to sample from):

    df[['x','y']].groupby('x').agg(pd.DataFrame.sample)
    

    This takes 14.4 ms on a 50k-row dataset.
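
Applied to the example frame from the question, a pandas-only version might look like the sketch below. It assumes pandas 1.1 or later, where `DataFrameGroupBy.sample` is available, which is a different call from the `agg(pd.DataFrame.sample)` form above. The `random_state` value and the name `picked` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Draw one random row from each Group_Id; raise n to draw more per group.
# random_state is an assumption added here only to make the draw repeatable.
picked = df.groupby("Group_Id").sample(n=1, random_state=0)
print(picked)
```

Setting `n` above the size of the smallest group raises an error unless `replace=True` is passed as well.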

    The other, slightly faster method uses numpy:

    df[['x','y']].groupby('x').agg(np.random.choice)
    

    This takes 10.9 ms on the same 50k-row dataset.
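
For the question's columns, the numpy variant could be sketched as follows. It swaps the global `np.random.choice` for a seeded `numpy.random.Generator` so the result is repeatable; the seed, the lambda wrapper, and the name `picked` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Seeded generator (assumption: the seed is only for reproducibility).
rng = np.random.default_rng(0)

# For each group, pick one Name uniformly at random.
picked = df.groupby("Group_Id")["Name"].agg(lambda s: rng.choice(s.to_numpy()))
print(picked)
```

The result is a Series indexed by `Group_Id`, holding one sampled `Name` per group.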

    Generally speaking, when using pandas it is preferable to stick with its native syntax, especially for beginners.
