Question
I would like to assign groups as follows: Group 1 if the negative Diff closest to 0 belongs to Location 86, Group 2 if it belongs to Location 90, and Group 3 if Locations 86 and 90 are tied for closest. After this pass, I would rerun the code so that any row without a group is numbered from Group 4 onwards, without overriding the earlier assignments.
The grouping is based on ID, Location, and which row is closest to the anchor row (Anchor = Y).
Note that in the example below, Location 66 is skipped as an exception; to exclude it I would use df['Diff'].where(df['Diff'].le(0) & df['Anchor'].ne('Y') & df['Location'].ne(66))
Input:
ID   Location  Anchor  Date      Diff
111  86        N       5/2/2020  -1
111  87        Y       5/3/2020  0
111  90        N       5/4/2020  -2
111  90        Y       5/6/2020  0
123  86        N       1/4/2020  -1
123  90        N       1/4/2020  -1
123  91        Y       1/5/2020  0
456  64        N       2/3/2020  -2
456  66        N       2/4/2020  -1
456  91        Y       2/5/2020  0
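To make the example reproducible, the input above can be built as a DataFrame. This sketch assumes that "Anchor Date" in the header is really two columns, Anchor and Date (Date is kept as plain strings), and applies the exclusion filter from the question to show which Diff values stay eligible; every other row becomes NaN.

```python
import pandas as pd

# Sample input from the question (assumption: Anchor and Date are separate columns)
df = pd.DataFrame({
    'ID':       [111, 111, 111, 111, 123, 123, 123, 456, 456, 456],
    'Location': [86, 87, 90, 90, 86, 90, 91, 64, 66, 91],
    'Anchor':   ['N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N', 'Y'],
    'Date':     ['5/2/2020', '5/3/2020', '5/4/2020', '5/6/2020', '1/4/2020',
                 '1/4/2020', '1/5/2020', '2/3/2020', '2/4/2020', '2/5/2020'],
    'Diff':     [-1, 0, -2, 0, -1, -1, 0, -2, -1, 0],
})

# Keep Diff only where it is <= 0, the row is not an anchor, and Location is not 66
eligible = df['Diff'].where(df['Diff'].le(0)
                            & df['Anchor'].ne('Y')
                            & df['Location'].ne(66))
print(eligible)
```

Row 8 (Location 66) drops out even though its Diff of -1 is closer to 0 than row 7's -2, which is why row 7 ends up as the closest row for ID 456.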
Output:
ID   Location  Anchor  Date      Diff  Group
111  86        N       5/2/2020  -1    1
111  87        Y       5/3/2020  0
111  90        N       5/4/2020  -2    2
111  90        Y       5/6/2020  0
123  86        N       1/4/2020  -1    3
123  90        N       1/4/2020  -1    3
123  91        Y       1/5/2020  0
456  64        N       2/3/2020  -2    4
456  66        N       2/4/2020  -1
456  91        Y       2/5/2020  0
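The rerun step described above (number the rows that still have no group, starting after the groups already assigned, without overriding them) could be sketched as follows. fill_new_groups is a hypothetical helper; it assumes unassigned rows hold '' in a group column and that mask selects the candidate rows (in the answer's code that would be mask_last). Like the answer's pd.factorize step, it gives one new number per distinct Location.

```python
import pandas as pd

def fill_new_groups(df, mask, col='group'):
    """Hypothetical helper: number the rows selected by mask that have no group yet.

    Unassigned rows are assumed to hold '' (or NaN) in col. Numbering starts
    right after the largest existing group, one number per distinct Location,
    and rows that already have a group are left untouched.
    """
    assigned = pd.to_numeric(df[col], errors='coerce')  # '' becomes NaN
    start = int(assigned.max()) + 1 if assigned.notna().any() else 1
    todo = df.index[mask & assigned.isna()]
    if len(todo) == 0:
        return df
    codes, _ = pd.factorize(df.loc[todo, 'Location'])
    df.loc[todo, col] = codes + start
    return df

# A first pass already produced groups 1 and 2; three rows are still empty.
sample = pd.DataFrame({'Location': [86, 90, 64, 70, 64],
                       'group': [1, 2, '', '', '']})
everything = pd.Series(True, index=sample.index)
fill_new_groups(sample, everything)
fill_new_groups(sample, everything)  # a second run finds nothing left to assign
print(sample)
```

Taking the start value from the existing maximum, rather than from a fixed count of exception groups, is what keeps a rerun from colliding with numbers already handed out.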
Answer 1:
Among your exception rules, the one with both 86 and 90 adds some complexity, because you need a single value that identifies a group made of two locations. More generally, catching several locations that share the same diff is the harder part. Here is one way: create Series holding the group values and the masks.
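The "single value for several locations" idea can be isolated in a small function. This is a sketch with a hypothetical helper name, encode_locations; it mirrors the encoding used for gr2 in the code below: sort the locations and treat them as digits in a base larger than any location (loc_max = max Location + 1, which is 92 for the sample). The result is unique as long as every location is positive and smaller than the base.

```python
import numpy as np

def encode_locations(locations, base):
    """Hypothetical helper: map a collection of locations to one integer.

    Computes sorted[0] * base**0 + sorted[1] * base**1 + ...
    Unique when every location is positive and smaller than base.
    """
    locs = np.sort(np.asarray(locations, dtype=np.int64))
    weights = base ** np.arange(len(locs), dtype=np.int64)
    return int((locs * weights).sum())

base = 92  # max Location in the sample (91) + 1
print(encode_locations([86], base))      # 86
print(encode_locations([90, 86], base))  # 86 + 90*92 = 8366
```

The value grows like base**n, so with many locations per group it can overflow int64; a tuple of the sorted locations would be a safer key in that case.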
import numpy as np
import pandas as pd

# label each block of rows: a new block starts at every new ID and right after each anchor row (Anchor == 'Y')
gr = (df['ID'].ne(df['ID'].shift()) | df['Anchor'].shift().eq('Y')).cumsum()
# True where Diff equals the last eligible Diff (<= 0, not an anchor, not Location 66) before the anchor in its block
mask_last = (df['Diff'].where(df['Diff'].le(0) & df['Anchor'].ne('Y') & df['Location'].ne(66))
             .groupby(gr).transform('last')
             .eq(df['Diff']))
# base for encoding one or several locations as a single unique number
loc_max = df['Location'].max() + 1
# per block, encode the selected locations as sum(sorted_loc[i] * loc_max**i)
gr2 = (df['Location'].where(mask_last).groupby(gr)
       .transform(lambda x: (x.dropna().sort_values()
                             * loc_max**np.arange(len(x.dropna()))).sum()))
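On the sample data, the three intermediate Series can be checked directly. This sketch rebuilds the input (without the Date column, which the logic does not use) and runs the steps above, with the ID test written as df['ID'].ne(df['ID'].shift()) so that a new block starts at each new ID.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':       [111, 111, 111, 111, 123, 123, 123, 456, 456, 456],
    'Location': [86, 87, 90, 90, 86, 90, 91, 64, 66, 91],
    'Anchor':   ['N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N', 'Y'],
    'Diff':     [-1, 0, -2, 0, -1, -1, 0, -2, -1, 0],
})

# Block label: increments at every new ID and on the row after an anchor
gr = (df['ID'].ne(df['ID'].shift()) | df['Anchor'].shift().eq('Y')).cumsum()

# 'last' skips NaN, so it returns the last eligible Diff of each block
eligible = df['Diff'].where(df['Diff'].le(0) & df['Anchor'].ne('Y') & df['Location'].ne(66))
mask_last = eligible.groupby(gr).transform('last').eq(df['Diff'])

# One encoded key per block, broadcast back to every row of the block
loc_max = df['Location'].max() + 1
gr2 = (df['Location'].where(mask_last).groupby(gr)
       .transform(lambda x: (x.dropna().sort_values()
                             * loc_max ** np.arange(len(x.dropna()))).sum()))

print(pd.DataFrame({'gr': gr, 'mask_last': mask_last, 'gr2': gr2}))
```

For the sample, gr is [1, 1, 2, 2, 3, 3, 3, 4, 4, 4], mask_last is True on rows 0, 2, 4, 5 and 7, and gr2 takes the values 86, 90, 8366 (86 + 90*92) and 64 for the four blocks, which are exactly the keys the exception dictionary looks up.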
Now you can create the groups:
# now create the group column
d_exception = {86: 1, 90: 2, 86 + 90*loc_max: 3}  # you can add more
# object dtype so the column can hold both '' and integers
df['group'] = pd.Series('', index=df.index, dtype=object)
# exceptions
for key, val in d_exception.items():
    df.loc[mask_last & gr2.eq(key), 'group'] = val
# the rest of the groups, numbered from len(d_exception) + 1 onwards
idx = df.index[mask_last & ~gr2.isin(list(d_exception))]
df.loc[idx, 'group'] = pd.factorize(df.loc[idx, 'Location'])[0] + len(d_exception) + 1
print(df)
    ID  Location Anchor      Date  Diff group
0  111        86      N  5/2/2020    -1     1
1  111        87      Y  5/3/2020     0
2  111        90      N  5/4/2020    -2     2
3  111        90      Y  5/6/2020     0
4  123        86      N  1/4/2020    -1     3
5  123        90      N  1/4/2020    -1     3
6  123        91      Y  1/5/2020     0
7  456        64      N  2/3/2020    -2     4
8  456        66      N  2/4/2020    -1
9  456        91      Y  2/5/2020     0
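For reuse, the answer's steps can be wrapped in one function. This is a sketch under a few assumptions: the hypothetical assign_groups takes the exceptions as tuples of locations (encoded inside, so callers do not need to know loc_max), skip_locations is an assumed parameter generalising the Location 66 exclusion, and the group column is created with object dtype so it can hold both '' and integers.

```python
import numpy as np
import pandas as pd

def assign_groups(df, exceptions, skip_locations=(66,)):
    """Sketch: the answer's steps in one function (hypothetical name and signature).

    exceptions maps a tuple of locations to a fixed group number,
    e.g. {(86,): 1, (90,): 2, (86, 90): 3}.
    """
    out = df.copy()
    # block = rows of one ID up to and including an anchor row
    gr = (out['ID'].ne(out['ID'].shift()) | out['Anchor'].shift().eq('Y')).cumsum()
    eligible = out['Diff'].where(out['Diff'].le(0)
                                 & out['Anchor'].ne('Y')
                                 & ~out['Location'].isin(list(skip_locations)))
    mask_last = eligible.groupby(gr).transform('last').eq(out['Diff'])

    loc_max = int(out['Location'].max()) + 1

    def encode(locs):
        # sorted locations as digits in base loc_max
        arr = np.sort(np.asarray(locs, dtype=np.int64))
        return int((arr * loc_max ** np.arange(len(arr), dtype=np.int64)).sum())

    gr2 = (out['Location'].where(mask_last).groupby(gr)
           .transform(lambda x: encode(x.dropna())))
    keys = {encode(locs): grp for locs, grp in exceptions.items()}

    out['group'] = pd.Series('', index=out.index, dtype=object)
    for key, grp in keys.items():
        out.loc[mask_last & gr2.eq(key), 'group'] = grp
    # remaining closest rows: one new number per distinct Location
    rest = out.index[mask_last & ~gr2.isin(list(keys))]
    out.loc[rest, 'group'] = pd.factorize(out.loc[rest, 'Location'])[0] + len(keys) + 1
    return out

df = pd.DataFrame({
    'ID':       [111, 111, 111, 111, 123, 123, 123, 456, 456, 456],
    'Location': [86, 87, 90, 90, 86, 90, 91, 64, 66, 91],
    'Anchor':   ['N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N', 'Y'],
    'Diff':     [-1, 0, -2, 0, -1, -1, 0, -2, -1, 0],
})
result = assign_groups(df, {(86,): 1, (90,): 2, (86, 90): 3})
print(result)
```

On the sample input this gives 1, 2, 3, 3 and 4 on rows 0, 2, 4, 5 and 7 and '' elsewhere, matching the expected output; the input frame itself is left unchanged because the function works on a copy.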
Source: https://stackoverflow.com/questions/63142055/python-grouping-and-assigning-exception-rules