Is it possible to use Pandas Overlap in a Dataframe?

Submitted by 谁说胖子不能爱 on 2021-01-28 07:38:07

Question


Python 3.7, Pandas 0.25

I have a Pandas DataFrame with startdate and enddate columns. I want to find the rows whose date ranges overlap a given range. Rather than writing a verbose chain of greater-than/less-than comparisons joined with ands/ors to filter the rows I need, I would like to use some kind of interval "overlap" test. Pandas appears to have this functionality:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Interval.overlaps.html
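For comparison, the hand-written filter being avoided here is shorter than it may seem: for closed intervals, [s, e] and [a, b] overlap exactly when s <= b and e >= a, so only two comparisons are needed. A minimal sketch, assuming both columns are already datetime64 and using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical sample frame using the question's column names.
df = pd.DataFrame({
    "startdate": pd.to_datetime(["2017-07-01", "2018-08-01", "2016-07-01"]),
    "enddate": pd.to_datetime(["2018-06-30", "2021-07-31", "2016-12-30"]),
})

range_start = pd.Timestamp("2017-07-01")
range_end = pd.Timestamp("2019-07-01")

# Closed intervals [s, e] and [a, b] overlap exactly when s <= b and e >= a.
mask = (df["startdate"] <= range_end) & (df["enddate"] >= range_start)
print(df[mask])
```

This is vectorized and needs no interval objects; the trade-off is that the endpoint rules (inclusive vs exclusive) live in the choice of `<=` vs `<` rather than in a `closed=` argument.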

The following test works:

import pandas as pd

range1 = pd.Interval(pd.Timestamp('2017-01-01 00:00:00'),pd.Timestamp('2018-01-01 00:00:00'),closed='both')
range2 = pd.Interval(pd.Timestamp('2016-01-01 00:00:00'),pd.Timestamp('2017-01-01 00:00:00'),closed='both')
range1.overlaps(range2)  # True: both intervals are closed and share 2017-01-01
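The result of overlaps depends on the closed argument: two intervals that only touch at an endpoint overlap only if both of them contain that endpoint. A small sketch reusing the dates from the test above (the names a, b, c, d are just for illustration):

```python
import pandas as pd

start = pd.Timestamp("2017-01-01")

# Closed on both sides: both intervals contain 2017-01-01, so they overlap.
a = pd.Interval(start, pd.Timestamp("2018-01-01"), closed="both")
b = pd.Interval(pd.Timestamp("2016-01-01"), start, closed="both")
print(a.overlaps(b))  # True

# Closed only on the right: c excludes 2017-01-01 while d includes it,
# so they share no point and do not overlap.
c = pd.Interval(start, pd.Timestamp("2018-01-01"), closed="right")
d = pd.Interval(pd.Timestamp("2016-01-01"), start, closed="right")
print(c.overlaps(d))  # False
```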

However, applying it to the DataFrame columns does not work. I am not sure whether my syntax is wrong or whether this simply cannot be applied to a DataFrame. Here are some of the things I have tried (they produced a whole range of errors):

start_range = '2017-07-01 00:00:00'
end_current = '2019-07-01 00:00:00'
reporttest_range = pd.Interval(pd.Timestamp(start_range),pd.Timestamp(end_current),closed='both')
# ['startdate']['enddate'] indexes the startdate Series by the label 'enddate'
reporttest_filter = my_dataframe[my_dataframe['startdate']['enddate'].overlaps(reporttest_range)]
# a tuple key looks up one column named ('startdate', 'enddate')
reporttest_filter = my_dataframe[my_dataframe['startdate','enddate'].overlaps(reporttest_range)]
reporttest_filter = my_dataframe[(my_dataframe['startdate','enddate']).overlaps(reporttest_range)]
# filter() returns a DataFrame, which has no overlaps method
reporttest_filter = my_dataframe.filter(['startdate','enddate']).overlaps(reporttest_range)
# filter is a method; square brackets try to subscript it instead of calling it
reporttest_filter = my_dataframe.filter['startdate','enddate'].overlaps(reporttest_range)
print(reporttest_filter)
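None of these attempts can work as written, because pd.Interval.overlaps compares one interval with another, and neither a DataFrame nor a pair of columns provides that method. The most literal working version of the idea builds one pd.Interval per row. A sketch, assuming datetime64 columns; the two-row frame and the helper row_overlaps are hypothetical:

```python
import pandas as pd

# Hypothetical two-row frame with the question's column names.
my_dataframe = pd.DataFrame({
    "startdate": pd.to_datetime(["2017-07-01", "2016-07-01"]),
    "enddate": pd.to_datetime(["2018-06-30", "2016-12-30"]),
})
reporttest_range = pd.Interval(pd.Timestamp("2017-07-01"),
                               pd.Timestamp("2019-07-01"), closed="both")


def row_overlaps(row):
    # Hypothetical helper: build this row's interval and test it.
    interval = pd.Interval(row["startdate"], row["enddate"], closed="both")
    return interval.overlaps(reporttest_range)


# apply(axis=1) calls the helper once per row and returns a boolean Series.
mask = my_dataframe.apply(row_overlaps, axis=1)
print(my_dataframe[mask])
```

This runs Python code for every row, so it is likely to be slow on large frames; the IntervalIndex approach in the answer below does the same test in a vectorized way.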

Can someone please point me to an efficient way to accomplish this?

As requested, here is what the DataFrame looks like:

      record    startdate    enddate
0         99    2017-07-01 2018-06-30
1        280    2018-08-01 2021-07-31
2        100    2017-07-01 2018-06-30
3        281    2017-07-01 2018-06-30
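Interval objects are meant for numeric or datetime-like endpoints, so the date columns should be real datetime64 columns. The question does not say how the frame was built; if the dates were loaded as strings (for example from a CSV, which is an assumption here), converting them first is the safe route. A sketch rebuilding the sample above:

```python
import pandas as pd

# The question's sample, as it might arrive with string dates.
my_dataframe = pd.DataFrame({
    "record": [99, 280, 100, 281],
    "startdate": ["2017-07-01", "2018-08-01", "2017-07-01", "2017-07-01"],
    "enddate": ["2018-06-30", "2021-07-31", "2018-06-30", "2018-06-30"],
})

# Convert both columns to datetime64 so they can serve as interval endpoints.
for col in ["startdate", "enddate"]:
    my_dataframe[col] = pd.to_datetime(my_dataframe[col])

print(my_dataframe.dtypes)
```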

Answer 1:


You need to build an IntervalIndex from df.startdate and df.enddate and call its overlaps method with reporttest_range. Every row in your sample overlaps, so I added a row that does not, to show the False case.

Sample df:   

   record  startdate    enddate
0    9931 2017-07-01 2018-06-30
1   28075 2018-08-01 2021-07-31
2   10042 2017-07-01 2018-06-30
3   28108 2017-07-01 2018-06-30
4   28109 2016-07-01 2016-12-30
5   28111 2017-07-02 2018-09-30

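# one closed interval per row, built from the start and end columns
iix = pd.IntervalIndex.from_arrays(df.startdate, df.enddate, closed='both')
# boolean array: True where the row's interval overlaps reporttest_range
iix.overlaps(reporttest_range)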

Out[400]: array([ True,  True,  True,  True, False,  True])

Use the array as a boolean mask to select only the overlapping rows:

df[iix.overlaps(reporttest_range)]

Out[401]:
   record  startdate    enddate
0    9931 2017-07-01 2018-06-30
1   28075 2018-08-01 2021-07-31
2   10042 2017-07-01 2018-06-30
3   28108 2017-07-01 2018-06-30
5   28111 2017-07-02 2018-09-30
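Put together, the answer's steps form a self-contained script. A minimal sketch reproducing the sample above; IntervalIndex.overlaps was added in pandas 0.24, so it should be available in the asker's 0.25:

```python
import pandas as pd

# The answer's sample frame, including the non-overlapping row 4.
df = pd.DataFrame({
    "record": [9931, 28075, 10042, 28108, 28109, 28111],
    "startdate": pd.to_datetime(["2017-07-01", "2018-08-01", "2017-07-01",
                                 "2017-07-01", "2016-07-01", "2017-07-02"]),
    "enddate": pd.to_datetime(["2018-06-30", "2021-07-31", "2018-06-30",
                               "2018-06-30", "2016-12-30", "2018-09-30"]),
})

reporttest_range = pd.Interval(pd.Timestamp("2017-07-01 00:00:00"),
                               pd.Timestamp("2019-07-01 00:00:00"),
                               closed="both")

# One closed interval per row, built from both columns at once.
iix = pd.IntervalIndex.from_arrays(df["startdate"], df["enddate"], closed="both")

# Vectorized test: a NumPy boolean array with one entry per row.
mask = iix.overlaps(reporttest_range)

reporttest_filter = df[mask]
print(reporttest_filter)
```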


Source: https://stackoverflow.com/questions/58192068/is-it-possible-to-use-pandas-overlap-in-a-dataframe
