How to identify what's NOT in the inner join while merging 3 data frames

隐身守侯 submitted on 2021-02-11 15:40:25

Question


I have three data frames: energy, GDP and ScimEn. Each of them has a 'Country' column, and I merged all three using an inner join:

a = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='inner')
b = pd.merge(a,ScimEn,left_on='Country',right_on='Country',how='inner')

Now I want to find the number of countries that were left out of this merge.

I tried the following code, but it raises "ValueError: Cannot use name of an existing column for indicator column":

z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer', indicator=True)
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='inner',indicator=True)
g = f.query('_merge != "both"').shape[0]

Can someone propose a solution?


Answer 1:


The ValueError comes from passing indicator=True to both merges. With indicator=True, pandas adds a column named _merge to the merged result, so z already has one:

>>> z.columns[z.columns.str.contains('_merge')]
Index(['_merge'], dtype='object')

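Because _merge already exists in z, the second merge cannot add another indicator column under the same name, so building f raises the ValueError. Passing a string to indicator (for example indicator='merge1') names the new column explicitly and avoids the clash.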
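A minimal reproduction of the error, assuming small hypothetical frames (the Energy, GDP and Rank columns and all values are made up for illustration): the first merge adds _merge, a second indicator=True fails with the same ValueError, and a string indicator name avoids the clash.

```python
import pandas as pd

# Hypothetical toy frames standing in for energy, GDP and ScimEn
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# First merge: indicator=True adds a column named "_merge"
z = pd.merge(energy, GDP, on="Country", how="outer", indicator=True)
print(list(z.columns))

# Second merge: asking for "_merge" again clashes with the column z already has
try:
    pd.merge(z, ScimEn, on="Country", how="outer", indicator=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing a string names the new indicator column, so there is no clash
f = pd.merge(z, ScimEn, on="Country", how="outer", indicator="merge1")
print(list(f.columns))
```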

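# Outer-join energy and GDP; indicator=True adds a "_merge" column
z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer',indicator=True)
# Use a different indicator name so it does not clash with "_merge"
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='outer',indicator='merge1')
# merge2 is 'left_only' for rows whose Country is not in energy
j = pd.merge(f,energy,left_on='Country',right_on='Country',how='outer',indicator='merge2')

# Rows where at least one merge did not match on both sides are the left-out countries
j[(j['_merge'] != 'both') | (j['merge1'] != 'both') | (j['merge2'] != 'both')].shape[0]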

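or, assuming each country appears at most once in each frame, subtract the size of the inner join b from the size of the full outer join j:

j.shape[0] - b.shape[0]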
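To list which countries are left out rather than only count them, here is a minimal sketch with hypothetical toy frames (column names and values are made up), assuming Country is unique within each frame. It keeps two outer merges, each with its own indicator name (energy_gdp and with_scimen are arbitrary, illustrative names); a country belongs to the three-way inner join only when both indicators read "both".

```python
import pandas as pd

# Hypothetical toy frames standing in for energy, GDP and ScimEn
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# Outer merges keep every country; each indicator column gets its own name
z = pd.merge(energy, GDP, on="Country", how="outer", indicator="energy_gdp")
f = pd.merge(z, ScimEn, on="Country", how="outer", indicator="with_scimen")

# A country is in the three-way inner join only if both indicators are "both".
# Rows that came only from ScimEn have NaN in energy_gdp, which is not equal to "both".
in_all = (f["energy_gdp"] == "both") & (f["with_scimen"] == "both")

left_out = f.loc[~in_all, "Country"].tolist()
print(len(left_out), left_out)
```

In this sketch a third merge with energy appears unnecessary, because z is already an outer join and therefore contains every country in energy.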


Source: https://stackoverflow.com/questions/56128825/how-to-identify-whats-not-in-the-inner-join-while-merging-3-data-frames
