How to count overlapping rows among multiple DataFrames?

庸人自扰 2021-01-19 15:44

I have multiple DataFrames like the ones below.

df1 = pd.DataFrame({'Col1':["aaa","ffffd","ggg"],'Col2':["bbb","eee","hhh"],'Col3':["ccc","fff","


        
3 Answers
  •  暗喜 (original poster)
    2021-01-19 16:31

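    Use pandas.concat and groupby: tag each DataFrame's rows with the DataFrame's name, concatenate everything, then count how often each (Col1, Col2, Col3) combination occurs per source: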

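    import pandas as pd

    dfs = [df1, df2, df3]  # the DataFrames from the question
    # Tag each row with the name of its source DataFrame
    dfs = [d.assign(df='df%s' % n) for n, d in enumerate(dfs, start=1)]
    # Count rows per (Col1, Col2, Col3) combination, one column per source
    new_df = pd.concat(dfs).groupby(['Col1', 'Col2', 'Col3', 'df']).size().unstack(fill_value=0)
    print(new_df)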
    

    Output:

    df               df1  df2  df3
    Col1  Col2 Col3
    aaa   bbb  ccc     1    1    0
    ffffd eee  fff     1    0    0
    ggg   hhh  iii     1    0    0
    ppp   ttt  qqq     0    0    1
    qqq   eee  www     0    1    1
    rrr   ttt  yyy     0    0    1
    zzz   xxx  yyy     0    1    1
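
To try this end to end, here is a minimal self-contained sketch. The question's DataFrame definitions are truncated, so the sample data is an assumption: it is hypothetical data chosen to be consistent with the output table above, not the asker's actual input. The names `frames`, `counts`, `n_frames` and `overlap` are also illustrative. Beyond the count table, the sketch keeps only the rows that occur in at least two DataFrames, which is one possible reading of "overlap".

```python
import pandas as pd

# Hypothetical sample data, chosen to match the output table above
# (the question's own definitions are truncated).
df1 = pd.DataFrame({"Col1": ["aaa", "ffffd", "ggg"],
                    "Col2": ["bbb", "eee", "hhh"],
                    "Col3": ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({"Col1": ["aaa", "qqq", "zzz"],
                    "Col2": ["bbb", "eee", "xxx"],
                    "Col3": ["ccc", "www", "yyy"]})
df3 = pd.DataFrame({"Col1": ["ppp", "qqq", "rrr", "zzz"],
                    "Col2": ["ttt", "eee", "ttt", "xxx"],
                    "Col3": ["qqq", "www", "yyy", "yyy"]})

cols = ["Col1", "Col2", "Col3"]
frames = {"df1": df1, "df2": df2, "df3": df3}

# Tag each row with its source name, stack everything, count per source.
tagged = pd.concat([d.assign(df=name) for name, d in frames.items()])
counts = tagged.groupby(cols + ["df"]).size().unstack(fill_value=0)

# Number of distinct DataFrames each row appears in.
n_frames = (counts > 0).sum(axis=1)

# Rows shared by at least two DataFrames.
overlap = counts[n_frames >= 2]
print(overlap)
print("overlapping rows:", len(overlap))
```

With this sample data, three rows appear in more than one DataFrame. Changing the threshold to `n_frames == len(frames)` would instead keep only rows present in every DataFrame.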
    
