Python - Grouping and Assigning Exception Rules

Submitted by 孤人 on 2020-08-10 03:38:21

Question


I would like to group the rows as follows: assign Group 1 if the negative diff closest to 0 is at Location 86, and Group 2 if the negative diff closest to 0 is at Location 90. Group 3 applies if Locations 86 and 90 are both the closest. After this pass, I would rerun the code so that any row not yet assigned a group gets a new group starting from Group 4, without overriding the earlier assignments.

The grouping is based on ID, Location, and proximity to the anchor (the row where Anchor is 'Y').

Note that in the example below, Location 66 is skipped as an exception; for that I would use df['Diff'].where(df['Diff'].le(0)&df['Anchor'].ne('Y')&df['Location'].ne(66))
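The rule in the question can be sketched in plain Python one block at a time, where a block is the run of rows of one ID that precede a single anchor ('Y') row. This is an illustration under assumptions, not a definitive implementation: closest_locations, assign_all, EXCLUDED and EXCEPTION_GROUPS are hypothetical names, each block is given as a list of (location, diff) pairs for its non-anchor rows, and blocks that match no rule are numbered from 4 upward by their set of closest locations (the question does not say how those later groups should be keyed).

```python
# Hypothetical sketch of the question's rule. A block is the list of
# (location, diff) pairs for the non-anchor rows that precede one 'Y' row.
EXCLUDED = {66}                                       # locations skipped as exceptions
EXCEPTION_GROUPS = {(86,): 1, (90,): 2, (86, 90): 3}  # sorted locations -> group


def closest_locations(rows, excluded=EXCLUDED):
    """Sorted tuple of locations whose diff is the non-positive value closest to 0."""
    candidates = [(loc, d) for loc, d in rows if d <= 0 and loc not in excluded]
    if not candidates:
        return ()
    best = max(d for _, d in candidates)              # closest to 0 from below
    return tuple(sorted({loc for loc, d in candidates if d == best}))


def assign_all(blocks):
    """Exception groups first; blocks matching no rule get new numbers from 4 up."""
    new_groups = {}
    next_group = max(EXCEPTION_GROUPS.values()) + 1
    result = []
    for rows in blocks:
        key = closest_locations(rows)
        group = EXCEPTION_GROUPS.get(key)
        if group is None and key:                     # no exception rule matched
            if key not in new_groups:
                new_groups[key] = next_group
                next_group += 1
            group = new_groups[key]
        result.append(group)
    return result


# The four blocks of the sample input, in order.
print(assign_all([[(86, -1)], [(90, -2)], [(86, -1), (90, -1)], [(64, -2), (66, -1)]]))
# [1, 2, 3, 4]
```

Because new numbers start above the largest exception group, the second pass cannot override Groups 1 to 3.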

Input:

ID  Location Anchor Date       Diff
111 86       N      5/2/2020  -1
111 87       Y      5/3/2020   0
111 90       N      5/4/2020  -2
111 90       Y      5/6/2020   0
123 86       N      1/4/2020  -1
123 90       N      1/4/2020  -1
123 91       Y      1/5/2020   0
456 64       N      2/3/2020  -2
456 66       N      2/4/2020  -1
456 91       Y      2/5/2020   0
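To try the answer's code, the input table can be loaded into a DataFrame. This sketch assumes the header "Anchor Date" is really two columns, Anchor and Date, and keeps Date as plain strings; it then applies the exclusion filter above, so only non-positive diffs on non-anchor rows outside Location 66 survive and every other row becomes NaN.

```python
import pandas as pd

# Sample input from the question; Anchor and Date are separate columns.
df = pd.DataFrame({
    'ID':       [111, 111, 111, 111, 123, 123, 123, 456, 456, 456],
    'Location': [86, 87, 90, 90, 86, 90, 91, 64, 66, 91],
    'Anchor':   ['N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N', 'Y'],
    'Date':     ['5/2/2020', '5/3/2020', '5/4/2020', '5/6/2020', '1/4/2020',
                 '1/4/2020', '1/5/2020', '2/3/2020', '2/4/2020', '2/5/2020'],
    'Diff':     [-1, 0, -2, 0, -1, -1, 0, -2, -1, 0],
})

# Keep non-positive diffs on non-anchor rows, skipping Location 66;
# rows failing the condition become NaN (so the result is float).
candidate_diff = df['Diff'].where(
    df['Diff'].le(0) & df['Anchor'].ne('Y') & df['Location'].ne(66)
)
print(candidate_diff)
```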

Output:

ID  Location Anchor Date       Diff  Group
111 86       N      5/2/2020  -1     1
111 87       Y      5/3/2020   0
111 90       N      5/4/2020  -2     2
111 90       Y      5/6/2020   0
123 86       N      1/4/2020  -1     3
123 90       N      1/4/2020  -1     3
123 91       Y      1/5/2020   0     
456 64       N      2/3/2020  -2     4
456 66       N      2/4/2020  -1     
456 91       Y      2/5/2020   0

Answer 1:


Among your exception rules, the one with both 86 and 90 adds some complexity to the code, as you need a single value for a group composed of two locations. More generally, catching several locations that share the same diff is the harder part. Here is one way: create Series holding the different group values and masks.

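import numpy as np
import pandas as pd
# catch each block per ID and up until an anchor: a new block starts at each new ID and right after each 'Y' row
gr = (df['ID'].ne(df['ID'].shift())|df['Anchor'].shift().eq('Y')).cumsum()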
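# rows that can be picked: non-positive diff, not an anchor, not the excluded Location 66
valid = df['Diff'].le(0)&df['Anchor'].ne('Y')&df['Location'].ne(66)
# where the diff per group is equal to the last value possible before anchor
# (& valid keeps anchor or excluded rows from matching that value by coincidence)
mask_last = (df['Diff'].where(valid)
                       .groupby(gr).transform('last')
                       .eq(df['Diff'])) & valid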
# base for the key encoding: one more than the largest Location, so each location fits in one "digit"
loc_max = df['Location'].max()+1
# one key per block from its selected locations: sorted locations weighted by powers of loc_max
gr2 = (df['Location'].where(mask_last).groupby(gr)
                     .transform(lambda x:(x.dropna().sort_values()
                                          *loc_max**np.arange(len(x.dropna()))).sum()))

Now you can create the groups:

#now create the column group
d_exception = {86:1, 90:2, 86 + 90*loc_max:3} #you can add more
df['group'] = ''
#exception
for key, val in d_exception.items():
    df.loc[mask_last&gr2.eq(key), 'group'] = val
# the remaining blocks get new group numbers after the exception groups
idx = df.index[mask_last&~gr2.isin(list(d_exception))]
df.loc[idx, 'group'] = pd.factorize(df.loc[idx, 'Location'])[0]+len(d_exception)+1
print(df)
    ID  Location Anchor      Date  Diff group
0  111        86      N  5/2/2020    -1     1
1  111        87      Y  5/3/2020     0      
2  111        90      N  5/4/2020    -2     2
3  111        90      Y  5/6/2020     0      
4  123        86      N  1/4/2020    -1     3
5  123        90      N  1/4/2020    -1     3
6  123        91      Y  1/5/2020     0      
7  456        64      N  2/3/2020    -2     4
8  456        66      N  2/4/2020    -1      
9  456        91      Y  2/5/2020     0      
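The steps above can be wrapped into one reusable function. This is a sketch under assumptions, not the answer's code: assign_groups is a hypothetical name; location sets are kept as sorted tuples instead of the numeric encoding (a swapped-in technique that avoids int64 overflow for large sets); blocks not covered by an exception are numbered after the largest exception group, in order of appearance and keyed by their location set (the answer instead factorizes Location alone); and pick='max' is offered as a variant that follows the question's "closest to 0" wording literally, whereas 'last' (the answer's choice) takes the last candidate row before the anchor.

```python
import pandas as pd

# Sample input from the question (Anchor and Date as separate columns; Date omitted).
df = pd.DataFrame({
    'ID':       [111, 111, 111, 111, 123, 123, 123, 456, 456, 456],
    'Location': [86, 87, 90, 90, 86, 90, 91, 64, 66, 91],
    'Anchor':   ['N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N', 'Y'],
    'Diff':     [-1, 0, -2, 0, -1, -1, 0, -2, -1, 0],
})

EXCEPTIONS = {(86,): 1, (90,): 2, (86, 90): 3}   # sorted location tuple -> group


def assign_groups(df, exceptions, excluded_locations=(66,), pick='last'):
    """Hypothetical wrapper: return a copy of df with a nullable 'group' column."""
    out = df.copy()
    # Block id: a new block starts at each new ID and right after each anchor row.
    block = (out['ID'].ne(out['ID'].shift())
             | out['Anchor'].shift().eq('Y')).cumsum()
    # Candidate rows: non-positive diff, not an anchor, not an excluded location.
    valid = (out['Diff'].le(0) & out['Anchor'].ne('Y')
             & ~out['Location'].isin(list(excluded_locations)))
    # Per block, the target diff: 'last' candidate row, or 'max' (closest to 0).
    target = out['Diff'].where(valid).groupby(block).transform(pick)
    chosen = valid & out['Diff'].eq(target)

    # Collect the chosen locations of each block.
    block_locs = {}
    for b, loc in zip(block[chosen], out.loc[chosen, 'Location']):
        block_locs.setdefault(int(b), []).append(int(loc))

    key_to_group = dict(exceptions)
    next_group = max(exceptions.values(), default=0) + 1
    out['group'] = pd.Series(pd.NA, index=out.index, dtype='Int64')
    for b in sorted(block_locs):
        key = tuple(sorted(block_locs[b]))
        if key not in key_to_group:          # not an exception: next free number
            key_to_group[key] = next_group
            next_group += 1
        out.loc[chosen & block.eq(b), 'group'] = key_to_group[key]
    return out


print(assign_groups(df, EXCEPTIONS))
```

The two pick options only differ when an earlier row in a block has a diff closer to 0 than the last candidate row: 'last' keeps the last row, 'max' keeps the closer one. For the sample data both give Groups 1, 2, 3, 3 and 4 as in the output above.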


Source: https://stackoverflow.com/questions/63142055/python-grouping-and-assigning-exception-rules
