divide a column based on groupby or looping conditions in pandas

你说的曾经没有我的故事 提交于 2020-04-15 06:48:29

问题


I have a data frame as shown below

   B_ID   No_Show   Session  slot_num   Patient_count
    1     0.2       S1        1          1
    2     0.3       S1        2          1
    3     0.8       S1        3          1
    4     0.3       S1        3          2
    5     0.6       S1        4          1
    6     0.8       S1        5          1
    7     0.9       S1        5          2
    8     0.4       S1        5          3
    9     0.6       S1        5          4
    12    0.9       S2        1          1
    13    0.5       S2        1          2
    14    0.3       S2        2          1
    15    0.7       S2        3          1
    20    0.7       S2        4          1
    16    0.6       S2        5          1
    17    0.8       S2        5          2
    19    0.3       S2        5          3

where

No_Show = Probability of no show

Assume that

threshold probability = 0.2 Duration for each slot = 30 (minutes)

From the above I would like calculate below data frame

Step1

sort the dataframe based on Session, slot_number and Patient_count

df = df.sort_values(['Session', 'slot_num', 'Patient_count'], ascending=False)

step 2 Calculate the cut off by using below conditions

if patient_count = 1 Divide No_show by threshold probability if patient_count = 1

Example for B_ID = 3, Patient_count = 1, cut_off = 0.8/0.2 = 4

else if patient_count = 2 multiply previously 1 No_Show with current No_show and divide with threshold)

Example for B_ID = 4, Patient_count = 2, cut_off = (0.3*0.8)/0.2 = 1.2

else if patient_count = 3 multiply previously 2 No_Show with current No_show and divide with threshold

Example for B_ID = 8, Patient_count = 3, cut_off = (0.4*0.9*0.8)/0.2 = 1.44

And so on

The Expected Output:

      B_ID   No_Show   Session  slot_num   Patient_count  Cut_off
        1     0.2       S1        1          1             1
        2     0.3       S1        2          1             1.5
        3     0.8       S1        3          1             4
        4     0.3       S1        3          2             1.2
        5     0.6       S1        4          1             3
        6     0.8       S1        5          1             4
        7     0.9       S1        5          2             3.6
        8     0.4       S1        5          3             1.44
        9     0.6       S1        5          4             0.864
        12    0.9       S2        1          1             4.5
        13    0.5       S2        1          2             2.25
        14    0.3       S2        2          1             1.5
        15    0.7       S2        3          1             3.5
        20    0.7       S2        4          1             3.5
        16    0.6       S2        5          1             3
        17    0.8       S2        5          2             2.4
        19    0.3       S2        5          3             0.72

回答1:


Use GroupBy.cumprod and divide by probability by Series.div:

probability = 0.2
df['new'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(probability)
print (df)
    B_ID  No_Show Session  slot_num  Patient_count    new
0      1      0.2      S1         1              1  1.000
1      2      0.3      S1         2              1  1.500
2      3      0.8      S1         3              1  4.000
3      4      0.3      S1         3              2  1.200
4      5      0.6      S1         4              1  3.000
5      6      0.8      S1         5              1  4.000
6      7      0.9      S1         5              2  3.600
7      8      0.4      S1         5              3  1.440
8      9      0.6      S1         5              4  0.864
9     12      0.9      S2         1              1  4.500
10    13      0.5      S2         1              2  2.250
11    14      0.3      S2         2              1  1.500
12    15      0.7      S2         3              1  3.500
13    20      0.7      S2         4              1  3.500
14    16      0.6      S2         5              1  3.000
15    17      0.8      S2         5              2  2.400
16    19      0.3      S2         5              3  0.720


来源:https://stackoverflow.com/questions/61079735/divide-a-column-based-on-groupby-or-looping-conditions-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!