Convert Interval Outer Join SQL in Python Pandas Dataframe

Submitted by 旧巷老猫 on 2019-12-11 04:58:50

Question


I'm converting an Oracle SQL interval outer join to an equivalent operation on pandas DataFrames. Below is the Oracle SQL:

WITH df_interval AS
          (SELECT '1' id,
                     'AAA' interval,
                     1000 begin,
                     2000 end
              FROM DUAL
            UNION ALL
            SELECT '1' id,
                     'BBB' intrvl,
                     2100 begin,
                     3000 end
              FROM DUAL
            UNION ALL
            SELECT '2' id,
                     'CCC' intrvl,
                     3100 begin,
                     4000 end
              FROM DUAL
            UNION ALL
            SELECT '2' id,
                     'DDD' intrvl,
                     4100 begin,
                     5000 end
              FROM DUAL),
      df_point AS
          (SELECT '1' id, 'X1' point, 1100 mid FROM DUAL
            UNION ALL
            SELECT '1' id, 'X2' point, 2050 mid FROM DUAL
            UNION ALL
            SELECT '1' id, 'X3' point, 3200 mid FROM DUAL
            UNION ALL
            SELECT '2' id, 'X4' point, 4200 mid FROM DUAL
            UNION ALL
            SELECT '2' id, 'X5' point, 5500 mid FROM DUAL)
SELECT pt.id,
         point,
         mid,
         interval
  FROM df_interval it RIGHT OUTER JOIN df_point pt ON pt.id = it.id AND pt.mid BETWEEN it.begin AND it.end

I created the DataFrames, but I can't work out how to join them the way the RIGHT OUTER JOIN with the BETWEEN condition does in the Oracle SQL above:

import pandas as pd
df_interval = pd.DataFrame({
                   'ID':['1','1','2','2'],
                   'interval': ['AAA', 'BBB', 'CCC', 'DDD'],
                   'begin': [1000,2100,3100,4100],
                   'end': [2000, 3000,4000,5000]})

df_point = pd.DataFrame({
                   'ID':['1','1','1','2','2'],
                   'point': ['X1', 'X2', 'X3', 'X4','X5'],
                   'mid': [1100,2050,3200,4200,5500]})

I expect the output would be something like this:

df_out = pd.DataFrame({
                   'ID':['1','1','1','2','2'],
                   'mid': [1100,2050,3200,4200,5500],
                   'intrvl':['AAA','','','DDD','']})

I'd appreciate any help with this.


Answer 1:


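I think merge_asof works well for your case. The only difference is that we need to run it twice: once forward against end, to find the nearest interval with the same ID that ends at or after mid, and once backward against begin, to find the nearest interval that begins at or before mid. When both merges return the same interval, mid lies inside it and that interval is the match; otherwise there is no match. Note that merge_asof requires both frames to be sorted by the join key, which the sample data already is.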

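# Nearest interval (same ID) whose end is at or after mid
s1 = pd.merge_asof(df_point, df_interval, by='ID', left_on='mid', right_on='end', direction='forward')
# Nearest interval (same ID) whose begin is at or before mid
s2 = pd.merge_asof(df_point, df_interval, by='ID', left_on='mid', right_on='begin', direction='backward')
# Keep the interval only where both lookups agree; otherwise NaN
s1['interval'] = s1['interval'].where(s1['interval'] == s2['interval'])
s1 = s1.drop(columns=['end', 'begin'])
s1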
  ID point   mid interval
0  1    X1  1100      AAA
1  1    X2  2050      NaN
2  1    X3  3200      NaN
3  2    X4  4200      DDD
4  2    X5  5500      NaN
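
As a cross-check on the result above, the SQL can also be translated more literally: join on ID, filter with the BETWEEN condition, then left-join the matches back onto the points so unmatched points are kept. This is only a sketch and not part of the original answer. The helper name interval_left_join is hypothetical, and it assumes that (ID, point) uniquely identifies a row of df_point.

```python
import pandas as pd

df_interval = pd.DataFrame({
    'ID': ['1', '1', '2', '2'],
    'interval': ['AAA', 'BBB', 'CCC', 'DDD'],
    'begin': [1000, 2100, 3100, 4100],
    'end': [2000, 3000, 4000, 5000]})

df_point = pd.DataFrame({
    'ID': ['1', '1', '1', '2', '2'],
    'point': ['X1', 'X2', 'X3', 'X4', 'X5'],
    'mid': [1100, 2050, 3200, 4200, 5500]})


def interval_left_join(points, intervals):
    """Hypothetical helper: keep every point and attach the interval containing it.

    Assumes (ID, point) uniquely identifies a row of `points`.
    """
    # Equi-join on ID: pair each point with every interval that has the same ID.
    pairs = points.merge(intervals, on='ID', how='inner')
    # Keep pairs where begin <= mid <= end (inclusive, like SQL BETWEEN).
    hits = pairs[pairs['mid'].between(pairs['begin'], pairs['end'])]
    # Left-join the hits back so points without a match keep a NaN interval.
    return points.merge(hits[['ID', 'point', 'interval']],
                        on=['ID', 'point'], how='left')


out = interval_left_join(df_point, df_interval)
print(out)
```

Unlike the merge_asof approach, this does not need sorted input, and if intervals overlap it returns one row per matching interval, as the SQL join would. The trade-off is that the intermediate join holds every point and interval pair sharing an ID, which can be large.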


Source: https://stackoverflow.com/questions/56614958/convert-interval-outer-join-sql-in-python-pandas-dataframe
