Pandas find duration between dates where a condition is met?

孤者浪人 submitted on 2019-12-02 02:43:40

Question


I have a pandas DataFrame that looks like this:

╔═══╦════════════╦═════════════╗
║   ║ VENDOR ID  ║ DATE        ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33         ║ 01/12/2018  ║
║ 2 ║ 33         ║ 03/12/2018  ║
║ 3 ║ 12         ║ 01/08/2018  ║
║ 4 ║ 12         ║ 01/15/2018  ║
║ 5 ║ 12         ║ 01/23/2018  ║
║ 6 ║ 33         ║ 05/12/2018  ║
║ 7 ║ 89         ║ 01/12/2018  ║
╚═══╩════════════╩═════════════╝

And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:

╔═══╦════════════╦═════════════╗
║   ║ VENDOR ID  ║     GAP     ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33         ║ ----------  ║
║ 2 ║ 33         ║     60      ║
║ 3 ║ 12         ║ ----------  ║
║ 4 ║ 12         ║      7      ║
║ 5 ║ 12         ║      8      ║
║ 6 ║ 33         ║     60      ║
║ 7 ║ 89         ║ ----------  ║
╚═══╩════════════╩═════════════╝

I've been trying to find a way to achieve this using groupby and other tricks, but can't seem to get anything to work.

I did come up with something that I think might work using two nested for loops or iterrows in pandas, but my dataset is too large for nested loops to be practical.

Does anyone have any ideas?


Answer 1:


I get slightly different output (59 and 61 days for VENDOR ID 33 rather than 60 and 60):

import pandas as pd

# df is the DataFrame from the question
df['DATE'] = pd.to_datetime(df['DATE'])
df['GAP'] = df.groupby('VENDOR ID')['DATE'].diff().dt.days
print(df)
   VENDOR ID       DATE   GAP
1         33 2018-01-12   NaN
2         33 2018-03-12  59.0
3         12 2018-01-08   NaN
4         12 2018-01-15   7.0
5         12 2018-01-23   8.0
6         33 2018-05-12  61.0
7         89 2018-01-12   NaN
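
GAP comes out as float because the missing first occurrences are stored as NaN. If whole-number days are preferred, one option is pandas' nullable integer dtype. This is a minimal sketch, assuming a pandas version that supports "Int64"; the Series below just re-creates the GAP column shown above, and the name gap_int is made up for illustration.

```python
import pandas as pd

# Re-create the GAP column from the output above (None becomes NaN).
gap = pd.Series(
    [None, 59.0, None, 7.0, 8.0, 61.0, None],
    index=range(1, 8),
    name="GAP",
)

# "Int64" (capital I) is the nullable integer dtype: NaN turns into <NA>
# and the remaining values are stored as integers.
gap_int = gap.astype("Int64")
print(gap_int)
```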

Explanation:

  1. Convert the DATE column with to_datetime
  2. Then groupby VENDOR ID and take the diff of DATE within each group
  3. Finally, convert the timedeltas to days with .dt.days


Source: https://stackoverflow.com/questions/51327498/pandas-find-duration-between-dates-where-a-condition-is-met
