Python pandas dataframe backfill based on two conditions

Submitted by 萝らか妹 on 2020-01-13 16:31:50

Question


I have a dataframe like this:

   Bool   Hour
0  False  12
1  False  24
2  False  12
3  False  24
4  True   12
5  False  24
6  False  12
7  False  24
8  False  12
9  False  24
10 False  12
11 True   24

and I would like to backfill each True value in the 'Bool' column upward (to earlier rows) until 'Hour' first reaches 12. The result would look like this:

   Bool   Hour  Result
0  False  12    False
1  False  24    False
2  False  12    True      <- desired backfill
3  False  24    True      <- desired backfill
4  True   12    True
5  False  24    False
6  False  12    False
7  False  24    False
8  False  12    False
9  False  24    False
10 False  12    True      <- desired backfill
11 True   24    True

Any help is greatly appreciated! Thank you very much!
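To pin down the rule, here is a plain-loop sketch of one reading of it, inferred from the sample output (an assumption, since the question states the rule only loosely): for each True row, start at the row just above it and mark every row upward until the first row with Hour == 12, inclusive. The helper name backfill_reference is hypothetical. A Python loop like this is slow on large frames, but it is handy as a reference for checking the vectorized answers.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Bool": [False, False, False, False, True, False,
             False, False, False, False, False, True],
    "Hour": [12, 24] * 6,
})


def backfill_reference(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: loop-based statement of the inferred rule."""
    bools = df["Bool"].tolist()
    hours = df["Hour"].tolist()
    result = list(bools)  # every original True stays True
    for i, flag in enumerate(bools):
        if not flag:
            continue
        # Walk upward from the row just above the True to the nearest Hour == 12.
        j = i - 1
        while j >= 0 and hours[j] != 12:
            j -= 1
        if j >= 0:  # only backfill when such a row exists
            for k in range(j, i):
                result[k] = True
    return pd.Series(result, index=df.index)


df["Result"] = backfill_reference(df)
print(df)
```

Note that row 4 is itself an Hour == 12 row, yet the desired output fills from row 2; that is why the search starts at the row above the True rather than at the True row.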


Answer 1:


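This is a little tricky to achieve. Here we can use groupby with idxmax: read the frame bottom-up, start a new group at every True, and take the first row in each group with Bool False and Hour == 12 as the point where the backfill starts.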

s = (~df.Bool & df.Hour.eq(12)).iloc[::-1].groupby(df.Bool.iloc[::-1].cumsum()).transform('idxmax')
df['result'] = df.index >= s.iloc[::-1]
df
Out[375]: 
     Bool  Hour  result
0   False    12   False
1   False    24   False
2   False    12    True
3   False    24    True
4    True    12    True
5   False    24   False
6   False    12   False
7   False    24   False
8   False    12   False
9   False    24   False
10  False    12    True
11   True    24    True
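A step-by-step sketch of the same idea, with each intermediate named. It assumes the sample frame and an increasing index (e.g. the default RangeIndex), because the last step compares index labels by size, as the one-liner does; the function name backfill_idxmax is hypothetical. One caveat worth knowing: rows below the last True fall into a group with no True, so the one-liner as written marks them True if one of them has Hour == 12. The final guard in this sketch masks those rows out; it is an addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Bool": [False, False, False, False, True, False,
             False, False, False, False, False, True],
    "Hour": [12, 24] * 6,
})


def backfill_idxmax(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Answer 1's groupby/idxmax idea, spelled out."""
    rev = df.iloc[::-1]  # read the frame bottom-up
    # Possible start points: Bool is False and Hour == 12 (as 0/1 for idxmax).
    candidate = (~rev["Bool"] & rev["Hour"].eq(12)).astype(int)
    # Bottom-up, every True opens a new block: the True row plus the rows
    # above it, up to (not including) the previous True.
    block = rev["Bool"].cumsum()
    # idxmax gives the label of the first candidate met bottom-up in each
    # block, i.e. the nearest Hour == 12 row above that block's True.
    start = candidate.groupby(block).transform("idxmax").reindex(df.index)
    result = df.index.to_series() >= start
    # Guard (not in the original answer): block 0 holds the rows below the
    # last True, which have nothing to backfill from.
    has_true_below = block.gt(0).reindex(df.index)
    return result & has_true_below


df["result"] = backfill_idxmax(df)
```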



Answer 2:


IIUC, you can do:

s = df['Bool'].shift(-1)
df['Result'] = df['Bool'] | s.where(s).groupby(df['Hour'].eq(12).cumsum()).bfill()

Output:

     Bool  Hour  Result
0   False    12   False
1   False    24   False
2   False    12    True
3   False    24    True
4    True    12    True
5   False    24   False
6   False    12   False
7   False    24   False
8   False    12   False
9   False    24   False
10  False    12    True
11   True    24    True
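The same steps with names and comments, as a sketch. The function name backfill_shift is hypothetical, and two small changes from the answer are deliberate: shift(-1, fill_value=False) keeps the shifted column boolean instead of putting a NaN in the last row, and notna() replaces OR-ing Bool directly with a column that still contains NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Bool": [False, False, False, False, True, False,
             False, False, False, False, False, True],
    "Hour": [12, 24] * 6,
})


def backfill_shift(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Answer 2's shift + groupby-bfill idea, spelled out."""
    # Move every True up one row so it sits on the row just above the original.
    moved = df["Bool"].shift(-1, fill_value=False)
    # A new block starts at every row where Hour == 12.
    block = df["Hour"].eq(12).cumsum()
    # Keep only the moved Trues (other rows become NaN) and backfill them
    # within each block, up to the block's opening Hour == 12 row.
    filled = moved.where(moved).groupby(block).bfill()
    # Non-NaN means a moved True sits at or below this row in its block.
    return df["Bool"] | filled.notna()


df["Result"] = backfill_shift(df)
```

Moving the Trues up one row first is what makes a True on an Hour == 12 row (row 4) fill from the previous 12 (row 2) instead of stopping at itself, matching the desired output.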



Answer 3:


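Create a group ID s that labels each run of consecutive False values, keeping runs of True in separate groups. Then group the Hour == 12 mask by s. Subtracting the group cumsum from the group sum (transform('sum')) gives, for each row, the number of Hour == 12 rows below it in the same group; rows where that count is 0 are candidates for True. s1 backfills the True values of Bool, so only rows that have a True somewhere below them qualify. Finally, OR the result with Bool.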

s = df.Bool.ne(df.Bool.shift()).cumsum()
s1 = df.where(df.Bool).Bool.bfill()
g = df.Hour.eq(12).groupby(s)
df['bfill_Bool'] = (g.transform('sum') - g.cumsum()).eq(0) & s1 | df.Bool

Out[905]:
     Bool  Hour  bfill_Bool
0   False    12       False
1   False    24       False
2   False    12        True
3   False    24        True
4    True    12        True
5   False    24       False
6   False    12       False
7   False    24       False
8   False    12       False
9   False    24       False
10  False    12        True
11   True    24        True
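A commented sketch of the same run-ID approach. The function name backfill_runs is hypothetical, and one step is swapped: instead of the answer's where/bfill for s1, it marks "a True exists at or below this row" with a reversed cumulative sum, which gives the same mask without an intermediate object column. The Hour == 12 mask is cast to int so the group sum and cumsum are plainly integer counts.

```python
import pandas as pd

df = pd.DataFrame({
    "Bool": [False, False, False, False, True, False,
             False, False, False, False, False, True],
    "Hour": [12, 24] * 6,
})


def backfill_runs(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Answer 3's run-ID idea, spelled out."""
    b = df["Bool"]
    # Run ID: increments whenever Bool changes, so each run of consecutive
    # Falses (and each run of Trues) gets its own label.
    run = b.ne(b.shift()).cumsum()
    # True where a True exists at or below the row (stands in for s1).
    true_below = b.iloc[::-1].cumsum().iloc[::-1].gt(0)
    is12 = df["Hour"].eq(12).astype(int)
    g = is12.groupby(run)
    # Number of Hour == 12 rows strictly below the current row in its run.
    twelves_below = g.transform("sum") - g.cumsum()
    return (twelves_below.eq(0) & true_below) | b


df["bfill_Bool"] = backfill_runs(df)
```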


Source: https://stackoverflow.com/questions/58104114/python-pandas-dataframe-backfill-based-on-two-conditions
