Question
I have three data frames: energy, GDP and ScimEn. Each of them has a 'Country' column, and I merged all three using an inner join:
a = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='inner')
b = pd.merge(a,ScimEn,left_on='Country',right_on='Country',how='inner')
Now I want to find the number of countries that were left out of this merge.
I tried the following code, but it raises "ValueError: Cannot use name of an existing column for indicator column":
z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer', indicator=True)
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='inner',indicator=True)
g = f.query('_merge != "both"').shape[0]
Can someone propose a solution?
Answer 1:
The ValueError comes from using indicator=True in both merges. When indicator is set to True, a column named _merge is added to the resulting dataframe by default.
>>> z.columns[z.columns.str.contains('_merge')]
Index(['_merge'], dtype='object')
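Since _merge is already present in the z dataframe, the second merge cannot add another column with the same name, hence the ValueError when creating f. Passing a string to indicator gives each later indicator column its own name and avoids the clash.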
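The name clash can be reproduced on a small scale. The sketch below uses hypothetical toy frames (the real energy, GDP and ScimEn data are not shown in the question, so the column names other than 'Country' and all values are made up) and assumes each country appears only once per frame.

```python
import pandas as pd

# Hypothetical toy data standing in for the frames in the question.
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# The first merge adds the default indicator column, '_merge'.
z = pd.merge(energy, GDP, on="Country", how="outer", indicator=True)

# A second indicator=True would need another '_merge' column, so it fails.
try:
    pd.merge(z, ScimEn, on="Country", how="outer", indicator=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Naming the second indicator column explicitly avoids the clash.
f = pd.merge(z, ScimEn, on="Country", how="outer", indicator="merge1")

print(error_message)
print(f.columns.tolist())
```

Here on='Country' is used as shorthand for left_on='Country', right_on='Country', since the key has the same name in every frame.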
# Outer joins keep every country; each merge gets its own indicator column.
z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer', indicator=True)
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='outer',indicator='merge1')
j = pd.merge(f,energy,left_on='Country',right_on='Country',how='outer',indicator='merge2')
# Count the rows that are missing from at least one of the frames.
j[(j['_merge'] != 'both') | (j['merge1']!='both') | (j['merge2']!='both') ].shape[0]
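or take the difference between the outer-join and inner-join row counts (b is the inner-join result from the question):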
j.shape[0] - b.shape[0]
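For reference, here is a self-contained sketch of both counting approaches on hypothetical toy data. It is not the answer's exact code: it stops after the second outer merge, on the assumption that the two indicator columns already show whether a country is missing from any of the three frames. It also assumes each country appears only once per frame, otherwise the row counts would not equal country counts.

```python
import pandas as pd

# Hypothetical toy data; only the 'Country' column comes from the question.
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# Inner join of all three frames, as in the question.
b = energy.merge(GDP, on="Country", how="inner").merge(ScimEn, on="Country", how="inner")

# Outer joins with two distinct indicator columns.
z = pd.merge(energy, GDP, on="Country", how="outer", indicator=True)
f = pd.merge(z, ScimEn, on="Country", how="outer", indicator="merge1")

# A country survives the inner join only if both indicators say 'both'.
# Rows that come only from ScimEn have a missing '_merge', which also
# compares as not equal to 'both'.
left_out = f[(f["_merge"] != "both") | (f["merge1"] != "both")]
n_left_out = left_out.shape[0]

# Alternative: outer-join rows minus inner-join rows.
n_by_difference = f.shape[0] - b.shape[0]

print(sorted(left_out["Country"]))
print(n_left_out, n_by_difference)
```

With this toy data both counts are 4: countries A, B, D and E are each missing from at least one frame, and only C appears in all three.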
Source: https://stackoverflow.com/questions/56128825/how-to-identify-whats-not-in-the-inner-join-while-merging-3-data-frames