Merging two pandas dataframes by interval

[亡魂溺海] 提交于 2020-12-05 20:20:26

问题


I have two pandas dataframes with following format:

df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns = ['x', 'y', 'ts', 'id'])


df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns = ['id', 'ts', 'state'])

I am trying to get it to the format, such as:

df_out = pd.DataFrame([
        [10, 20, 1,  'id1', None    ],
        [11, 22, 5,  'id1', 'ok'    ],
        [20, 54, 5,  'id2', 'not ok'],
        [22, 53, 7,  'id2', 'not ok'],
        [15, 24, 8,  'id1', 'ok'    ],
        [16, 25, 10, 'id1', 'not ok']
    ], columns = ['x', 'y', 'ts', 'id', 'state'])

I understand how to accomplish it iteratively by grouping by id and then iterating through each row and changing status when it appears. Is there a pandas build-in more scalable way of doing this?


回答1:


Unfortunately pandas merge support only equality joins. See more details at the following thread: merge pandas dataframes where one value is between two others if you want to merge by interval you'll need to overcome the issue, for example by adding another filter after the merge:

joined = a.merge(b,on='id')
joined = joined[joined.ts.between(joined.ts1,joined.ts2)]



回答2:


You can merge pandas data frames on two columns:

pd.merge(df_ts,df_statechange, how='left',on=['id','ts'])

in df_statechange that you shared here there is no common values on ts in both dataframes. Apparently you just copied not complete data frame here. So i got this output:

    x   y  ts   id state
0  10  20   1  id1   NaN
1  11  22   5  id1   NaN
2  20  54   5  id2   NaN
3  22  53   7  id2   NaN
4  15  24   8  id1   NaN
5  16  25  10  id1   NaN

But indeed if you have common ts in the data frames it will have your desired output. For example:

df_statechange = pd.DataFrame([
        ['id1', 5, 'ok'],
        ['id1', 8, 'ok'],
        ['id2', 5, 'not ok'],
        ['id2',7, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns = ['id', 'ts', 'state'])

the output:

  x   y  ts   id   state
0  10  20   1  id1     NaN
1  11  22   5  id1      ok
2  20  54   5  id2  not ok
3  22  53   7  id2  not ok
4  15  24   8  id1      ok
5  16  25  10  id1     NaN


来源:https://stackoverflow.com/questions/44106304/merging-two-pandas-dataframes-by-interval

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!