Pandas: compare two DataFrames and determine the matched values

拥有回忆 posted on 2019-12-06 14:59:28

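You can do something similar to this: sort the space-separated tokens of each ID into a helper column x, then outer-merge the two DataFrames on x and Value with indicator=True. The _merge column then shows which rows matched (both) and which exist in only one DataFrame (left_only, right_only):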

In [40]: pd.merge(a.assign(x=a.ID.str.split().apply(sorted).str.join(' ')),
    ...:          b.assign(x=b.ID.str.split().apply(sorted).str.join(' ')),
    ...:          on=['x','Value'],
    ...:          how='outer',
    ...:          indicator=True)
    ...:
Out[40]:
                           ID_x  Value                             x  \
0      AA12 101 BB101 CC01 DE06      1      101 AA12 BB101 CC01 DE06
1  AA11 102 BB101 CC01 234 EE07      2  102 234 AA11 BB101 CC01 EE07
2  AA10 202 BB101 CC01 345 EE09      3  202 345 AA10 BB101 CC01 EE09
3       AA13 103 BB101 CC02 123      4       103 123 AA13 BB101 CC02
4       AA14 203 BB101 CC02 456      5       203 456 AA14 BB101 CC02
5       AA15 204 BB102 CC03 567      6       204 567 AA15 BB102 CC03
6                           NaN      5       203 456 AA18 BB103 CC01
7                           NaN      7       204 678 AA15 BB201 CC11

                           ID_y      _merge
0      AA12 101 BB101 CC01 DE06        both
1  AA11 102 BB101 CC01 EE07 234        both
2  AA10 202 BB101 CC01 EE09 345        both
3       AA13 103 BB101 CC02 123        both
4                           NaN   left_only
5                           NaN   left_only
6       AA18 203 BB103 CC01 456  right_only
7       AA15 204 BB201 CC11 678  right_only
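
For readers who want to run this outside IPython, here is a minimal, self-contained sketch. The contents of a and b are reconstructed from the output shown above, so they are an assumption about the original data, and the helper name sorted_tokens is hypothetical (it is not part of pandas). The last step keeps only the rows flagged as both, which are the matched values.

```python
import pandas as pd

# Sample frames reconstructed from the output above (assumed, not the asker's real data).
a = pd.DataFrame({
    "ID": [
        "AA12 101 BB101 CC01 DE06",
        "AA11 102 BB101 CC01 234 EE07",
        "AA10 202 BB101 CC01 345 EE09",
        "AA13 103 BB101 CC02 123",
        "AA14 203 BB101 CC02 456",
        "AA15 204 BB102 CC03 567",
    ],
    "Value": [1, 2, 3, 4, 5, 6],
})
b = pd.DataFrame({
    "ID": [
        "AA12 101 BB101 CC01 DE06",
        "AA11 102 BB101 CC01 EE07 234",
        "AA10 202 BB101 CC01 EE09 345",
        "AA13 103 BB101 CC02 123",
        "AA18 203 BB103 CC01 456",
        "AA15 204 BB201 CC11 678",
    ],
    "Value": [1, 2, 3, 4, 5, 7],
})


def sorted_tokens(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: split on whitespace, sort the tokens, re-join."""
    return ids.str.split().apply(sorted).str.join(" ")


# Outer merge on the order-independent key plus Value; indicator=True adds _merge.
merged = pd.merge(
    a.assign(x=sorted_tokens(a["ID"])),
    b.assign(x=sorted_tokens(b["ID"])),
    on=["x", "Value"],
    how="outer",
    indicator=True,
)

# Rows present in both frames are the matched values.
matched = merged.loc[merged["_merge"] == "both", ["ID_x", "ID_y", "Value"]]
print(matched)
```

Row order in the merged result can differ between pandas versions, so it is safer to filter on _merge than to rely on positions.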

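Explanation: the helper column x is built in three steps. Split each ID into tokens, sort the tokens, then join them back into a single string: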

In [43]: a.ID.str.split()
Out[43]:
0         [AA12, 101, BB101, CC01, DE06]
1    [AA11, 102, BB101, CC01, 234, EE07]
2    [AA10, 202, BB101, CC01, 345, EE09]
3          [AA13, 103, BB101, CC02, 123]
4          [AA14, 203, BB101, CC02, 456]
5          [AA15, 204, BB102, CC03, 567]
Name: ID, dtype: object

In [44]: a.ID.str.split().apply(sorted)
Out[44]:
0         [101, AA12, BB101, CC01, DE06]
1    [102, 234, AA11, BB101, CC01, EE07]
2    [202, 345, AA10, BB101, CC01, EE09]
3          [103, 123, AA13, BB101, CC02]
4          [203, 456, AA14, BB101, CC02]
5          [204, 567, AA15, BB102, CC03]
Name: ID, dtype: object

In [45]: a.assign(x=a.ID.str.split().apply(sorted).str.join(' '))
Out[45]:
                             ID  Value                             x
0      AA12 101 BB101 CC01 DE06      1      101 AA12 BB101 CC01 DE06
1  AA11 102 BB101 CC01 234 EE07      2  102 234 AA11 BB101 CC01 EE07
2  AA10 202 BB101 CC01 345 EE09      3  202 345 AA10 BB101 CC01 EE09
3       AA13 103 BB101 CC02 123      4       103 123 AA13 BB101 CC02
4       AA14 203 BB101 CC02 456      5       203 456 AA14 BB101 CC02
5       AA15 204 BB102 CC03 567      6       204 567 AA15 BB102 CC03
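
To see why the sorted key makes the comparison independent of token order, the same normalization can be sketched in plain Python on a single pair of IDs taken from row 1 above. The function name normalize is hypothetical and used only for illustration.

```python
def normalize(id_string: str) -> str:
    """Split on whitespace, sort the tokens, and re-join with single spaces."""
    return " ".join(sorted(id_string.split()))


left = "AA11 102 BB101 CC01 234 EE07"   # ID as it appears in a
right = "AA11 102 BB101 CC01 EE07 234"  # same tokens, different order, as in b

print(left == right)                        # the raw strings differ
print(normalize(left) == normalize(right))  # the normalized keys are equal
print(normalize(left))
```

Tokens are compared as strings, so purely numeric tokens sort before the ones starting with uppercase letters. Duplicate tokens are kept, which means two IDs only match when they contain the same tokens the same number of times.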