Suppose I have two DataFrames like the ones below:
import pandas as pd
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"c": [6,7,8,9]
}
df1 = pd.DataFrame(dat
You can remove the duplicated rows from both DataFrames before merging:
df = pd.merge(
df1.drop_duplicates(),
df2.drop_duplicates(),
on=['a', 'b'], how='inner'
)
print(df)
# a b c d
# 0 0 3 6 10
# 1 0 3 7 10
# 2 1 4 8 12
# 3 2 5 9 13
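To see why the duplicates appear in the first place, here is a minimal sketch comparing a plain merge with the de-duplicated one. It assumes df2 holds the a, b, d values used elsewhere in this thread; when a key pair occurs twice on both sides, the merge returns every combination of those rows.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
# Assumed contents of df2, taken from the other answers in this thread
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 10, 12, 13]})

# The key pair (0, 3) occurs twice in each frame, so it yields 2 * 2 = 4 rows
plain = pd.merge(df1, df2, on=["a", "b"], how="inner")

# Dropping exact duplicate rows first leaves only one (0, 3, 10) row in df2
deduped = pd.merge(
    df1.drop_duplicates(),
    df2.drop_duplicates(),
    on=["a", "b"], how="inner"
)

print(len(plain))    # 6
print(len(deduped))  # 4
```

Note that drop_duplicates only removes rows that are identical in every column, so this approach only helps when the repeated keys carry the same values.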
Use GroupBy.cumcount to add a counter column to both DataFrames, then merge
on the key columns plus that counter:
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print(df)
a b c g d
0 0 3 6 0 10
1 0 3 7 1 10
2 1 4 8 0 12
3 2 5 9 0 13
The difference from the other solutions is easiest to see if the second 10 in the d column of df2 is changed to 11. The merge then pairs the first duplicated a, b pair from df1 with the first a, b pair from df2, the second with the second, and so on for all duplicates. Unique pairs are matched as usual:
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,11,12,13]
}
df2 = pd.DataFrame(data_dic)
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print(df)
a b c g d
0 0 3 6 0 10
1 0 3 7 1 11
2 1 4 8 0 12
3 2 5 9 0 13
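If you would rather not leave the g column in your DataFrames, the same idea can be wrapped in a small helper. This is only a sketch: merge_by_occurrence and the temporary column name _occurrence are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

def merge_by_occurrence(left, right, keys, how="inner"):
    """Pair the n-th occurrence of each key in `left` with the n-th in `right`."""
    counter = "_occurrence"  # hypothetical temp column; must not clash with real columns
    # assign() returns copies, so the caller's DataFrames are left unchanged
    left = left.assign(**{counter: left.groupby(keys).cumcount()})
    right = right.assign(**{counter: right.groupby(keys).cumcount()})
    merged = pd.merge(left, right, on=[*keys, counter], how=how)
    # Drop the helper column once it has done its job
    return merged.drop(columns=counter)

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})

result = merge_by_occurrence(df1, df2, ["a", "b"])
print(result)
```

With an inner merge, occurrences that have no counterpart on the other side (for example a third duplicate in df1 when df2 has only two) are dropped; pass how='left' or how='outer' if you need to keep them.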
You could also drop the duplicates after the merge:
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"c": [6,7,8,9]
}
df1 = pd.DataFrame(data_dic)
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,10,12,13]
}
df2 = pd.DataFrame(data_dic)
df3 = pd.merge(df1, df2, how='inner', on=['a', 'b']).drop_duplicates()
df3:
a b c d
0 0 3 6 10
2 0 3 7 10
4 1 4 8 12
5 2 5 9 13
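Two things are worth checking with this approach, sketched below under the same sample data. First, the result keeps the index labels of the surviving rows (0, 2, 4, 5), so you may want to reset the index. Second, if the duplicated keys in df2 carry different d values (the df2_changed frame here is an illustrative variation), no merged rows are identical and nothing gets dropped.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 10, 12, 13]})

df3 = pd.merge(df1, df2, how="inner", on=["a", "b"]).drop_duplicates()
print(df3.index.tolist())  # [0, 2, 4, 5]

# Renumber the rows from 0 after dropping duplicates
df3 = df3.reset_index(drop=True)
print(df3.index.tolist())  # [0, 1, 2, 3]

# Illustrative variation: the duplicated key (0, 3) now has two different d values
df2_changed = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})
df4 = pd.merge(df1, df2_changed, how="inner", on=["a", "b"]).drop_duplicates()
print(len(df4))  # 6, because every combination of c and d is a distinct row
```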