问题
This is part of a larger project, but I've broken my problem down into steps, so here's the first step. Take a Pandas dataframe, like this:
index | user time
---------------------
0 F 0
1 T 0
2 T 0
3 T 1
4 B 1
5 K 2
6 J 2
7 T 3
8 J 4
9 B 4
For each unique user, can I extract the difference between the values in column "time," but with some conditions?
So, for example, there are two instances of user J, and the "time" difference between these two instances is 2. Can I extract the difference, 2, between these two rows? Then if that user appears again, extract the difference between that row and the previous appearance of that user in the dataframe?
回答1:
I believe need DataFrameGroupBy.diff:
df['new'] = df.groupby('user')['time'].diff()
print (df)
user time new
0 F 0 NaN
1 T 0 NaN
2 T 0 0.0
3 T 1 1.0
4 B 1 NaN
5 K 2 NaN
6 J 2 NaN
7 T 3 2.0
8 J 4 2.0
9 B 4 3.0
回答2:
I think np.where
and pandas shifts
does this
This subtract between two consecutive Time, only if the users are same
df1 = np.where (df['users'] == df['users'].shifts(-1), df['time'] - df['time'].shifts(-1), 'NaN')
来源:https://stackoverflow.com/questions/50637942/in-a-pandas-dataframe-how-can-i-extract-the-difference-between-the-values-on-se