Concatenate distinct columns in two dataframes using pandas (and append similar columns)

Posted by China☆狼群 on 2019-12-29 06:30:17

Question


My question is closely related to, but not identical to, Pandas Merge - How to avoid duplicating columns.

I want to concatenate the columns that are different in three dataframes. The dataframes share an id column and some other columns that are identical. For example:

df1

id place name qty unit A 
1 NY    Tom   2  10   a
2 TK    Ron   3  15   a
3 Lon   Don   5  90   a
4 Hk    Sam   4  49   a

df2

id place name qty unit B 
1 NY    Tom   2  10   b
2 TK    Ron   3  15   b
3 Lon   Don   5  90   b
4 Hk    Sam   4  49   b

df3

id place name qty unit C D
1 NY    Tom   2  10   c d
2 TK    Ron   3  15   c d
3 Lon   Don   5  90   c d
4 Hk    Sam   4  49   c d

Result:

id place name qty unit A B C D
1 NY    Tom   2  10   a b c d
2 TK    Ron   3  15   a b c d
3 Lon   Don   5  90   a b c d
4 Hk    Sam   4  49   a b c d

The columns place, name, qty, and unit will always be part of the three dataframes; the names of the columns that differ can vary (A, B, C, D in my example). The three dataframes have the same number of rows.
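For anyone who wants to run the answers below, the sample frames can be rebuilt roughly as follows. This is only a sketch: the dtypes (integers for id, qty and unit, strings elsewhere) and the helper name `base` are assumptions, since the question shows the tables only as text.

```python
import pandas as pd

# Columns that, per the question, appear in every frame
# (`base` is a name made up for this sketch).
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}

# Each frame adds its own distinct column(s) to the shared ones.
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})
```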

I have tried:

cols_to_use = df1.columns - df2.columns
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

The problem is that I get more rows than expected, and the columns are renamed in the resulting dataframe (when using concat).


Answer 1:


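Using reduce from functools:

import pandas as pd
from functools import reduce

# With no `on` argument, pd.merge joins on every column name the two frames share
reduce(lambda left, right: pd.merge(left, right), [df1, df2, df3])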
Out[725]: 
   id place name  qty  unit  A  B  C  D
0   1    NY  Tom    2    10  a  b  c  d
1   2    TK  Ron    3    15  a  b  c  d
2   3   Lon  Don    5    90  a  b  c  d
3   4    Hk  Sam    4    49  a  b  c  d
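If relying on the implicit join keys feels fragile, the same reduce pattern can take the key columns explicitly. The sketch below assumes the sample data from the question; `merge_all` is a hypothetical helper name, not part of pandas.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, keys):
    """Hypothetical helper: merge the frames pairwise, left to right, on `keys`."""
    return reduce(lambda left, right: pd.merge(left, right, on=keys), frames)


keys = ["id", "place", "name", "qty", "unit"]
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

# Works for any number of frames that all contain the key columns.
result = merge_all([df1, df2, df3], keys)
```

Naming the keys means that a column which happens to share a name across frames, but is not meant to be a key, is not silently added to the join condition.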



Answer 2:


You can chain merge calls:

merge_on = ['id', 'place', 'name', 'qty', 'unit']
df1.merge(df2, on=merge_on).merge(df3, on=merge_on)



    id  place   name    qty unit    A   B   C   D
0   1   NY      Tom     2   10      a   b   c   d
1   2   TK      Ron     3   15      a   b   c   d
2   3   Lon     Don     5   90      a   b   c   d
3   4   Hk      Sam     4   49      a   b   c   d
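Since the question mentions getting more rows than expected, one possible safeguard is the `validate` argument of `DataFrame.merge`, which raises a `MergeError` when the key combination is not unique on both sides. The sketch below reuses the sample data and assumes each row really is identified by the five shared columns.

```python
import pandas as pd

merge_on = ["id", "place", "name", "qty", "unit"]
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

# validate="one_to_one" checks that the merge keys are unique in both frames,
# so duplicated keys raise an error instead of silently multiplying rows.
result = (
    df1.merge(df2, on=merge_on, validate="one_to_one")
       .merge(df3, on=merge_on, validate="one_to_one")
)
```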



Answer 3:


Using concat with groupby and first (axis is passed by keyword, which pandas 2.0 and later require for concat):

pd.concat([df1, df2, df3], axis=1).groupby(level=0, axis=1).first()

   A  B  C  D  id name place  qty  unit
0  a  b  c  d   1  Tom    NY    2    10
1  a  b  c  d   2  Ron    TK    3    15
2  a  b  c  d   3  Don   Lon    5    90
3  a  b  c  d   4  Sam    Hk    4    49
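A variation on the concat approach avoids grouping along the columns, which is deprecated in recent pandas releases, and keeps the original column order. It is a sketch under two assumptions: the rows of all frames line up on the index, and columns sharing a name hold identical values, so keeping the first occurrence loses nothing.

```python
import pandas as pd

base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

# Side-by-side concat keeps every column, including the repeated shared ones.
combined = pd.concat([df1, df2, df3], axis=1)

# Index.duplicated() flags the second and later occurrences of a column name;
# negating the mask keeps only the first occurrence of each.
result = combined.loc[:, ~combined.columns.duplicated()]
```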



Answer 4:


You can extract only those columns from df2 (and df3 similarly) which are not already present in df1. Then just use pd.concat to concatenate the data frames:

cols = [c for c in df2.columns if c not in df1.columns]
df = pd.concat([df1, df2[cols]], axis=1)
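The snippet above handles df2 only. One way to extend the same idea to df3, or to any number of frames, is to track which column names have already been taken. This is a sketch; `concat_new_columns` is a hypothetical helper name, and like the original it assumes the frames are aligned row by row on their index.

```python
import pandas as pd


def concat_new_columns(frames):
    """Hypothetical helper: concat side by side, keeping each column name once."""
    seen = set()
    parts = []
    for frame in frames:
        # Keep only the columns that no earlier frame has contributed.
        cols = [c for c in frame.columns if c not in seen]
        seen.update(cols)
        parts.append(frame[cols])
    return pd.concat(parts, axis=1)


base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

df = concat_new_columns([df1, df2, df3])
```

Unlike the merge-based answers, this never compares the shared columns, so it will not notice if the frames are sorted differently.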


Source: https://stackoverflow.com/questions/52614977/concatenate-distinct-columns-in-two-dataframes-using-pandas-and-append-similar
