Question
Piggybacking off my own previous question, "python pandas: assign control vs. treatment groupings randomly based on %".
Thanks to @MaxU, I know how to randomly assign control/treatment groupings when I have 2 groups, but what if I have 3 groups or more?
For example:
df.head()
customer_id | Group | many other columns
ABC 1
CDE 3
BHF 2
NID 1
WKL 3
SDI 2
JSK 1
OSM 3
MPA 2
MAD 1
pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))
Group 1 : 270
Group 2 : 180
Group 3 : 330
I have a great answer for when I only have two groups:
df['Flag'] = df.groupby('Group')['customer_id']\
.transform(lambda x: np.random.choice(['Control','Test'], len(x),
p=[.5,.5] if x.name==1 else [.4,.6]))
But what if I want to split it this way:
- Group 1: 50% Control & 50% Test
- Group 2: 40% Control & 60% Test
- Group 3: 20% Control & 80% Test
@MaxU's answer is great, but unfortunately the split is not exact:
d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
df['Flag'] = df.groupby('Group')['customer_id'] \
.transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
When I test it, I don't get exact splits:
pd.pivot_table(df,index=['Group'],values=["customer_id"],columns=['Flag'], aggfunc=lambda x: len(x.unique()))
         Control  Test
Group 1:     138   132
Group 2:      78   102
Group 3:      79   251
Group 1 should be 135/135.
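For reference, the exact counts implied by the requested split follow directly from the group sizes above. A small sketch of that arithmetic (the names sizes, control_share and targets are made up for illustration, and rounding to the nearest whole customer is an assumption for cases where the product is not an integer):

```python
# Group sizes taken from the pivot table above
sizes = {1: 270, 2: 180, 3: 330}
# Requested share of Control in each group
control_share = {1: 0.5, 2: 0.4, 3: 0.2}

targets = {}
for group, n in sizes.items():
    # Assumption: round to the nearest whole customer
    n_control = round(n * control_share[group])
    targets[group] = (n_control, n - n_control)
    print(f"Group {group}: {n_control} Control / {n - n_control} Test")
```

With these sizes the targets are 135/135, 72/108 and 66/264.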
Answer 1:
It sounds like you're looking for a way to split your customer_ids into exact proportions rather than relying on chance. Here's one way to do that using pandas.qcut and np.random.permutation.
In [228]: df = pd.DataFrame({'customer_id': np.random.normal(size=10000),
'group': np.random.choice(['a', 'b', 'c'], size=10000)})
In [229]: proportions = {'a':[.5,.5], 'b':[.4,.6], 'c':[.2,.8]}
In [230]: df.head()
Out[230]:
customer_id group
0 0.6547 c
1 1.4190 a
2 0.4205 a
3 2.3266 a
4 -0.5691 b
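In [231]: def assigner(gp):
     ...:     # gp.name is the group key; unlike gp['group'], it is still available
     ...:     # when newer pandas drops the grouping column inside apply
     ...:     group = gp.name
     ...:     cut = pd.qcut(
     ...:         np.arange(gp.shape[0]),
     ...:         q=np.cumsum([0] + proportions[group]),
     ...:         labels=range(len(proportions[group]))
     ...:     )
     ...:     # np.asarray replaces Categorical.get_values(), which was removed in pandas 1.0
     ...:     cut = np.asarray(cut)
     ...:     return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='assignment')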
...:
In [232]: df['assignment'] = df.groupby('group', group_keys=False).apply(assigner)
In [233]: df.head()
Out[233]:
customer_id group assignment
0 0.6547 c 1
1 1.4190 a 1
2 0.4205 a 0
3 2.3266 a 1
4 -0.5691 b 0
In [234]: (df.groupby(['group', 'assignment'])
.size()
.unstack()
.assign(proportion=lambda x: x[0] / (x[0] + x[1])))
Out[234]:
assignment 0 1 proportion
group
a 1659 1658 0.5002
b 1335 2003 0.3999
c 669 2676 0.2000
What's going on here?
- Within each group we call the function assigner
- assigner grabs the group name and proportions from the predefined dictionary and calls pd.qcut to split into 0 (control) and 1 (treatment)
- np.random.permutation then shuffles the assignments
- Create this as a new column in the original dataframe
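The same idea can also be sketched without pd.qcut, using the 'Control'/'Test' labels from the question: build the exact number of each label per group, then shuffle. This is only a minimal sketch, not the code from the answer above. The helper name assign_exact and its parameters (group_col, labels, seed) are hypothetical, and rounding to the nearest whole customer is an assumption for groups whose size times the control share is not an integer.

```python
import numpy as np
import pandas as pd


def assign_exact(df, proportions, group_col="Group",
                 labels=("Control", "Test"), seed=None):
    """Hypothetical helper: return labels with exact per-group counts."""
    rng = np.random.default_rng(seed)
    flags = np.empty(len(df), dtype=object)
    # .indices maps each group key to the integer positions of its rows
    for name, pos in df.groupby(group_col).indices.items():
        n = len(pos)
        # Assumption: round when n * share is not a whole number
        n_control = int(round(n * proportions[name][0]))
        group_flags = np.array(
            [labels[0]] * n_control + [labels[1]] * (n - n_control),
            dtype=object,
        )
        # Shuffle so that which customers land in Control is random
        flags[pos] = rng.permutation(group_flags)
    return pd.Series(flags, index=df.index, name="Flag")


# Example data with the group sizes from the question (customer ids are made up)
df = pd.DataFrame({
    "customer_id": [f"C{i:03d}" for i in range(780)],
    "Group": [1] * 270 + [2] * 180 + [3] * 330,
})
d = {1: [.5, .5], 2: [.4, .6], 3: [.2, .8]}
df["Flag"] = assign_exact(df, d, seed=0)
print(pd.crosstab(df["Group"], df["Flag"]))
```

Because the counts are fixed before shuffling, only the membership of each bucket is random, not its size. Passing a seed makes the assignment reproducible.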
Answer 2:
In [13]: df
Out[13]:
customer_id Group
0 ABC 1
1 CDE 3
2 BHF 2
3 NID 1
4 WKL 3
5 SDI 2
6 JSK 1
7 OSM 3
8 MPA 2
9 MAD 1
In [14]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
In [15]: df['Flag'] = \
...: df.groupby('Group')['customer_id'] \
...: .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
...:
In [16]: df
Out[16]:
customer_id Group Flag
0 ABC 1 Control
1 CDE 3 Test
2 BHF 2 Test
3 NID 1 Control
4 WKL 3 Control
5 SDI 2 Test
6 JSK 1 Test
7 OSM 3 Test
8 MPA 2 Control
9 MAD 1 Test
Source: https://stackoverflow.com/questions/46552395/assign-control-vs-treatment-groupings-randomly-based-on-for-more-than-2-group