Sample with different sample sizes per customer

Posted by 人走茶凉 on 2021-02-08 07:20:59

Question


I have a data frame like this:

    Customer   Day
0.    A         1
1.    A         1
2.    A         1
3.    A         2
4.    B         3
5.    B         4

I want to sample from it, but with a different sample size for each customer. The sample sizes are stored in another data frame, where the Day column holds the number of rows to draw for that customer. For example:

    Customer   Day
0.    A         2
1.    B         1

Suppose I want to sample per customer per day. So far I have this function:

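import numpy as np

def sampling(frame, a):
    # Draw `a` values from the group's Id column (np.random.choice samples with replacement by default)
    return np.random.choice(frame.Id, size=a)

# a=?? marks the per-group size that I don't know how to supply
grouped = frame.groupby(['Customer', 'Day'])
sampled = grouped.apply(sampling, a=??).reset_index()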

If I set the size parameter to a global constant, it runs without problems. But I don't know how to set it when the per-customer values are stored in a separate data frame.


Answer 1:


You can build a mapper (a dict from customer to sample size) from df1, the data frame that holds the sample sizes, and look up each group's size in it:

mapper = df1.set_index('Customer')['Day'].to_dict()

df.groupby('Customer', as_index=False).apply(lambda x: x.sample(n = mapper[x.name]))


       Customer Day
0   3   A       2
    2   A       1
1   4   B       3

The apply call with as_index=False returns a MultiIndex; you can flatten it with reset_index(drop=True):

df.groupby('Customer').apply(lambda x: x.sample(n = mapper[x.name])).reset_index(drop = True)

    Customer    Day
0   A           1
1   A           1
2   B           3
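
For reference, here is a self-contained sketch of the mapper idea that can be run as is. It rebuilds the two data frames from the question and loops over the groups instead of relying on `x.name` inside `apply`. The helper name `sample_per_customer` and its `keys`, `replace`, and `seed` parameters are hypothetical and only added for illustration. Grouping by both Customer and Day is one possible reading of "per customer per day" in the question; it assumes the per-customer size applies to every day of that customer.

```python
import pandas as pd

# Data copied from the question; in df1 the "Day" column holds the sample size.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Day": [1, 1, 1, 2, 3, 4],
})
df1 = pd.DataFrame({"Customer": ["A", "B"], "Day": [2, 1]})

# Customer -> number of rows to draw, here {"A": 2, "B": 1}
mapper = df1.set_index("Customer")["Day"].to_dict()


def sample_per_customer(frame, sizes, keys, replace=False, seed=None):
    """Hypothetical helper: group `frame` by `keys` (a list whose first
    entry is "Customer") and draw sizes[customer] rows from each group."""
    parts = []
    for key, group in frame.groupby(keys):
        # Depending on the pandas version, the group key is a tuple or a scalar.
        customer = key[0] if isinstance(key, tuple) else key
        parts.append(
            group.sample(n=sizes[customer], replace=replace, random_state=seed)
        )
    # Stitch the per-group samples together and drop the original row labels.
    return pd.concat(parts).reset_index(drop=True)


# Same grouping as the answer: one draw per customer, without replacement.
per_customer = sample_per_customer(df, mapper, ["Customer"], seed=0)

# Assumed reading of the question: one draw per (customer, day) group.
# replace=True mirrors the default of np.random.choice and is needed here
# because the (A, 2) group has one row but customer A asks for two.
per_day = sample_per_customer(df, mapper, ["Customer", "Day"], replace=True, seed=0)
```

Without `replace=True`, `DataFrame.sample` raises a ValueError whenever a group has fewer rows than the requested size, so the per-day variant only works with replacement on this data.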


Source: https://stackoverflow.com/questions/58794340/sample-with-different-sample-sizes-per-customer
