How to get the minimum of each group for each day based on an hour criterion

北海茫月 2020-12-22 01:29

I have provided two dataframes below for you to test with.

df = pd.DataFrame({
    'subject_id':[1,1,1,1,1,1,1,1,1,1,1],
    'time_1' :['2173-04-03 12:35:00',

4 Answers
  •  予麋鹿 (OP)
    2020-12-22 02:25

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        'time_1': ['2173-04-03 12:35:00', '2173-04-03 17:00:00', '2173-04-03 20:00:00',
                   '2173-04-04 11:00:00', '2173-04-04 11:30:00', '2173-04-04 12:00:00',
                   '2173-04-04 16:00:00', '2173-04-04 22:00:00', '2173-04-05 04:00:00',
                   '2173-04-05 06:30:00'],
        'val': [5, 5, 5, 10, 5, 10, 5, 8, 8, 10]
    })

    # Parse the timestamps and split out the date and time parts
    df['time_1'] = pd.to_datetime(df['time_1'])
    df['new_date'] = df['time_1'].dt.date
    df['new_time'] = df['time_1'].dt.time

    # Previous row's value, used to check that the value did not change
    df['shift_val'] = df['val'].shift()

    # Time elapsed since the first reading of the same subject on the same day
    first_time = df.groupby(['subject_id', 'new_date'])['time_1'].transform('first')
    df1 = df.assign(time_diff=df['time_1'] - first_time)
    hours = df1['time_diff'] / np.timedelta64(1, 'h')

    # >= 1 hour after the day's first reading and equal to the first value of
    # the earliest date (5 here; groupby sorts the dates by default)
    first_val = df1.groupby('new_date')['val'].first().iloc[0]
    df2 = df1.loc[(hours >= 1) & (df1['val'] == first_val)]
    # <= 1 hour after the day's first reading and unchanged from the previous row
    df3 = df1.loc[(hours <= 1) & (df1['val'] == df1['shift_val'])]

    # Combine both sets of rows, drop the helper columns,
    # then take the column-wise minimum within each date
    df4 = (pd.concat([df2, df3])
             .drop(columns=['new_time', 'shift_val', 'time_diff'])
             .groupby('new_date', sort=False)
             .min())

    df4

    Output

                subject_id               time_1  val
    new_date
    2173-04-03           1  2173-04-03 17:00:00    5
    2173-04-04           1  2173-04-04 16:00:00    5
    2173-04-05           1  2173-04-05 04:00:00    8
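
`groupby(...).min()` takes the minimum of each column independently, so the `time_1` and `val` shown for a date could in principle come from different input rows (in this sample they happen to agree). Below is a minimal sketch of a row-preserving variant. It assumes the same filtering criteria as the answer, combines them into one boolean mask, and then uses `idxmin` on `time_1` to pick the single earliest matching row per date. The names `hours`, `first_val`, `mask` and `candidates` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    'subject_id': [1] * 10,
    'time_1': pd.to_datetime([
        '2173-04-03 12:35:00', '2173-04-03 17:00:00', '2173-04-03 20:00:00',
        '2173-04-04 11:00:00', '2173-04-04 11:30:00', '2173-04-04 12:00:00',
        '2173-04-04 16:00:00', '2173-04-04 22:00:00', '2173-04-05 04:00:00',
        '2173-04-05 06:30:00']),
    'val': [5, 5, 5, 10, 5, 10, 5, 8, 8, 10],
})
df['new_date'] = df['time_1'].dt.date

# Hours since the first reading of the same subject on the same day
first_time = df.groupby(['subject_id', 'new_date'])['time_1'].transform('first')
hours = (df['time_1'] - first_time) / np.timedelta64(1, 'h')

# Assumed criteria, mirroring the answer:
# (a) >= 1 h after the day's first reading and equal to the earliest date's first value
# (b) <= 1 h after the day's first reading and unchanged from the previous row
first_val = df.groupby('new_date')['val'].first().iloc[0]
mask = (((hours >= 1) & (df['val'] == first_val))
        | ((hours <= 1) & (df['val'] == df['val'].shift())))
candidates = df[mask]

# idxmin gives the index label of the earliest matching time per date,
# so .loc returns whole rows instead of mixing column-wise minimums
result = candidates.loc[candidates.groupby('new_date')['time_1'].idxmin()]
print(result[['new_date', 'subject_id', 'time_1', 'val']])
```

Using `idxmin` plus `.loc` guarantees that every column in an output row comes from the same input row, which matters once different columns could reach their minimum on different rows.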
    
