In a Pandas dataframe, how can I extract the difference between the values on separate rows within the same column, conditional on a second column? [duplicate]

Posted by 和自甴很熟 on 2020-01-05 05:37:10

Question


This is part of a larger project, but I've broken my problem down into steps, so here's the first step. Take a Pandas dataframe, like this:

index | user   time     
---------------------
 0      F       0   
 1      T       0   
 2      T       0   
 3      T       1   
 4      B       1 
 5      K       2 
 6      J       2 
 7      T       3 
 8      J       4 
 9      B       4 

For each unique user, can I extract the difference between the values in column "time," but with some conditions?

So, for example, there are two instances of user J, and the "time" difference between these two instances is 2. Can I extract the difference, 2, between these two rows? Then if that user appears again, extract the difference between that row and the previous appearance of that user in the dataframe?


Answer 1:


I believe you need DataFrameGroupBy.diff:

df['new'] = df.groupby('user')['time'].diff()
print(df)
  user  time  new
0    F     0  NaN
1    T     0  NaN
2    T     0  0.0
3    T     1  1.0
4    B     1  NaN
5    K     2  NaN
6    J     2  NaN
7    T     3  2.0
8    J     4  2.0
9    B     4  3.0
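
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question; the column name "new" follows the answer above, and the variable name `gaps` is only an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user": list("FTTTBKJTJB"),
    "time": [0, 0, 0, 1, 1, 2, 2, 3, 4, 4],
})

# Within each user's rows (in their original order), subtract the
# previous "time" value from the current one. The first appearance
# of a user has no predecessor, so it gets NaN.
df["new"] = df.groupby("user")["time"].diff()

# Optional: keep only rows where a previous appearance exists.
gaps = df.dropna(subset=["new"])
print(gaps)
```

The first appearance of each user stays NaN because there is nothing earlier to subtract from, which is why dropping those rows leaves only the repeat appearances.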



Answer 2:


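I think np.where combined with the pandas shift method can do this. It subtracts the times of two consecutive rows, but only where both rows have the same user. Note that it only compares adjacent rows, so it will not find an earlier appearance of a user when other users' rows sit in between.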

import numpy as np

# Compare each row with the next one; use a real NaN rather than the string 'NaN'.
df1 = np.where(df['user'] == df['user'].shift(-1), df['time'] - df['time'].shift(-1), np.nan)
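
To make the difference between the two answers concrete, the sketch below runs an adjacent-row comparison next to the groupby version on the question's sample data. It is a variation on this answer, not the answer's exact code: it uses `shift()` (previous row) instead of `shift(-1)` so the sign matches the groupby result, and the names `adjacent` and `grouped` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user": list("FTTTBKJTJB"),
    "time": [0, 0, 0, 1, 1, 2, 2, 3, 4, 4],
})

# Adjacent-row approach: a difference is produced only when the
# row directly above belongs to the same user.
same_as_prev = df["user"] == df["user"].shift()
adjacent = np.where(same_as_prev, df["time"] - df["time"].shift(), np.nan)

# Group-wise approach: the difference to the user's previous
# appearance, no matter how many other rows lie in between.
grouped = df.groupby("user")["time"].diff()

print(pd.DataFrame({"user": df["user"], "adjacent": adjacent, "grouped": grouped}))
```

On this data the adjacent-row version only fills in the back-to-back T rows, while the groupby version also catches the later T, J and B rows, which is what the question asks for.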


Source: https://stackoverflow.com/questions/50637942/in-a-pandas-dataframe-how-can-i-extract-the-difference-between-the-values-on-se
