Suppose I have two DataFrames like below:
import pandas as pd
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"c": [6,7,8,9]
}
df1 = pd.DataFrame(dat
Use GroupBy.cumcount to add a counter column to both DataFrames, then merge on the original key columns plus this added column:
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'], how='inner')
print(df)
a b c g d
0 0 3 6 0 10
1 0 3 7 1 10
2 1 4 8 0 12
3 2 5 9 0 13
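To see why the counter column matters, here is a minimal, self-contained sketch comparing it with a plain merge on ['a', 'b'] only (assumed here to be the kind of alternative this answer contrasts with). The df2 values are taken from the modified data used later in this answer.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})

# Merging on ['a', 'b'] only: each of the 2 rows with (0, 3) in df1 matches
# both rows with (0, 3) in df2, so that pair alone produces 2 * 2 = 4 rows
plain = pd.merge(df1, df2, on=['a', 'b'], how='inner')
print(len(plain))  # 6 rows in total

# With the per-group counter, the n-th (0, 3) row in df1 is paired only with
# the n-th (0, 3) row in df2, so no rows are multiplied
df1['g'] = df1.groupby(['a', 'b']).cumcount()
df2['g'] = df2.groupby(['a', 'b']).cumcount()
paired = pd.merge(df1, df2, on=['a', 'b', 'g'], how='inner')
print(len(paired))  # 4 rows, one per row of df1
```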
The difference from other solutions is easiest to see if the second value of d in the second DataFrame is changed from 10 to 11: the first duplicate (a, b) pair in df1 is correctly matched with the first (a, b) pair in df2, the second with the second, and likewise for all duplicates, while unique pairs are matched as usual:
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,11,12,13]
}
df2 = pd.DataFrame(data_dic)
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'], how='inner')
print(df)
a b c g d
0 0 3 6 0 10
1 0 3 7 1 11
2 1 4 8 0 12
3 2 5 9 0 13
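If you need this often, the same technique can be wrapped in a small function. This is only a sketch: merge_by_occurrence and the helper column name _occurrence are hypothetical names, not part of pandas, and the helper column is assumed not to clash with existing columns. Unlike the code above, it leaves the input DataFrames unchanged (assign returns a copy) and drops the counter after merging.

```python
import pandas as pd

def merge_by_occurrence(left, right, on, how='inner'):
    """Hypothetical helper: pair the n-th occurrence of each key combination
    in `left` with the n-th occurrence of the same keys in `right`.
    `on` is assumed to be a list of column names."""
    # assign returns new DataFrames, so the caller's frames are not modified
    left = left.assign(_occurrence=left.groupby(on).cumcount())
    right = right.assign(_occurrence=right.groupby(on).cumcount())
    merged = pd.merge(left, right, on=on + ['_occurrence'], how=how)
    # The counter was only needed for matching, so remove it from the result
    return merged.drop(columns='_occurrence')

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})

out = merge_by_occurrence(df1, df2, on=['a', 'b'])
print(out)
```

With how='left', occurrences in the left frame that have no counterpart in the right frame (for example a third (0, 3) row in df1) would be kept with NaN in d instead of being dropped.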