Question
I have a dataframe that looks like this:
   Words  Start_time(in sec)  End_time(in secs)  Total_Time_words
0    let                 0.1                2.5               2.6
1     me                 2.5                2.6               5.1
2   tell                 2.6                2.9               5.5
3    you                 2.9                3.0               5.9
4  about                 3.0                3.2               6.2
For each row (index), I want to take the "Start_time(in sec)" and "End_time(in secs)" values and use them as a time window into a second dataframe, select the top 5 amplitudes that fall inside that window, and take their mean. For example, for index 0 the window is between 0.1 and 2.5 in the dataframe below; the same should be done for every other row of the dataframe above:
          Time  Amplitudes
1220673   5.36    0.000155
1220674   1.36    0.000936
1220675   0.18    0.001319
1220676   2.36    0.001513
1220677   0.45    0.001666
1220678   1.06    0.001476
1220679   0.17    0.000820
1220680  55.36    0.000409
1220681  55.36    0.000227
1220682   0.09    0.000847
1220683   0.46    0.001333
1220684   1.26    0.001595
1220685   0.30    0.001481
1220686  55.36    0.001312
1220687  55.36    0.002050
The expected output is the first dataframe with a new column holding the result of the above:
   Words  Start_time(in sec)  End_time(in secs)  Total_Time_words  Amplitude
0    let                 0.1                2.5               2.6       0.23
1     me                 2.5                2.6               5.1       0.12
2   tell                 2.6                2.9               5.5       0.09
3    you                 2.9                3.0               5.9       1.20
4  about                 3.0                3.2               6.2       0.67
Answer 1:
You can use pd.cut and groupby():
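import pandas as pd

# Bin edges: the first start time followed by every end time
bins = [df['Start_time(in sec)'].iloc[0]] + list(df['End_time(in secs)'])
# Label each row of df2 with the index of the word whose window contains its Time
s = pd.cut(df2.Time, bins=bins, labels=df.index)
# Sort by amplitude so head(5) picks the five largest values in each window
df['Amplitudes'] = (df2.sort_values('Amplitudes', ascending=False)
                       .groupby(s)['Amplitudes']
                       .apply(lambda x: x.head(5).mean())
                   )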
Output:
   Words  Start_time(in sec)  End_time(in secs)  Total_Time_words  Amplitudes
0    let                 0.1                2.5               2.6    0.001546
1     me                 2.5                2.6               5.1         NaN
2   tell                 2.6                2.9               5.5         NaN
3    you                 2.9                3.0               5.9         NaN
4  about                 3.0                3.2               6.2         NaN
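Since the question asks about a "between" style lookup, here is a minimal sketch of a row-by-row alternative that uses Series.between and nlargest instead of pd.cut. The helper name mean_top_amplitudes, the variable names words and signal, and the shortened sample data are assumptions made for illustration. Note that between is inclusive on both ends by default, so a sample sitting exactly on a boundary counts for both neighbouring words, whereas the pd.cut bins above are open on the left.

```python
import pandas as pd

def mean_top_amplitudes(words, signal, n=5):
    """For each row of `words`, average the n largest amplitudes in its time window."""
    def window_mean(row):
        # Boolean mask of samples whose Time lies inside [start, end]
        in_window = signal["Time"].between(row["Start_time(in sec)"],
                                           row["End_time(in secs)"])
        # nlargest on an empty selection yields an empty Series, whose mean is NaN
        return signal.loc[in_window, "Amplitudes"].nlargest(n).mean()
    # Return a copy with the new column rather than modifying `words` in place
    return words.assign(Amplitude=words.apply(window_mean, axis=1))

# Shortened sample data taken from the question
words = pd.DataFrame({
    "Words": ["let", "me", "tell"],
    "Start_time(in sec)": [0.1, 2.5, 2.6],
    "End_time(in secs)": [2.5, 2.6, 2.9],
})
signal = pd.DataFrame({
    "Time": [5.36, 1.36, 0.18, 2.36, 0.45, 1.06, 0.17],
    "Amplitudes": [0.000155, 0.000936, 0.001319, 0.001513,
                   0.001666, 0.001476, 0.000820],
})

result = mean_top_amplitudes(words, signal)
print(result)
```

This version is easier to read but scans the second dataframe once per word, so the pd.cut and groupby approach is likely the better choice when the signal is large.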
Source: https://stackoverflow.com/questions/62350781/how-do-i-compare-two-dataframe-using-between-function-on-the-other-dataframe