Say that I have a dataframe that looks like:
Name Group_Id
AAA 1
ABC 1
CCC 2
XYZ 2
DEF 3
YYH 3
How could I randomly select one (or m
There are two simple ways to do this (in the snippets below, x is the grouping column and y is the column to sample from). The first uses nothing but basic pandas syntax:
df[['x','y']].groupby('x').agg(pd.DataFrame.sample)
This takes 14.4 ms on a 50k-row dataset.
The other, slightly faster, method uses numpy:
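For the dataframe in the question, a pandas-only version might look like the sketch below. It swaps in groupby(...).sample rather than the agg call above, and it assumes pandas 1.1 or newer, where that method is available. The random_state value is only there to make the draw repeatable.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Draw one random row from each Group_Id.
# Use n=m to draw m rows per group (each group needs at least m rows
# unless replace=True is passed).
one_per_group = df.groupby("Group_Id").sample(n=1, random_state=0)
print(one_per_group)
```

Each sampled row keeps its original index and all of its columns, so Name and Group_Id stay together.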
df[['x','y']].groupby('x').agg(np.random.choice)
This takes 10.9 ms on the same 50k-row dataset.
Generally speaking, when using pandas, it's preferable to stick with its native syntax, especially for beginners.
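Applied to the dataframe in the question, the numpy approach could be sketched as follows. This is an illustration rather than the exact call above: it uses a seeded numpy Generator (the rng name and the seed are my own choices) so that the result is repeatable, and it samples only the Name column.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Seeded generator, assumed here only for reproducibility.
rng = np.random.default_rng(0)

# Pick one Name at random within each Group_Id.
picked = df.groupby("Group_Id")["Name"].agg(lambda s: rng.choice(s.to_numpy()))
print(picked)
```

Note that agg applies the function to each column separately, so with several value columns the picks could come from different rows of the same group. If whole rows are needed, sampling rows rather than column values is the safer choice.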