Concatenate distinct columns in two dataframes using pandas (and append similar columns)

Question

My question is closely related to, but not identical to, "Pandas Merge - How to avoid duplicating columns".

I want to concatenate the columns that differ across three dataframes. Each dataframe has an id column plus some columns that are identical in all three. For example:

df1

id place name qty unit A 
1 NY    Tom   2  10   a
2 TK    Ron   3  15   a
3 Lon   Don   5  90   a
4 Hk    Sam   4  49   a

df2

id place name qty unit B 
1 NY    Tom   2  10   b
2 TK    Ron   3  15   b
3 Lon   Don   5  90   b
4 Hk    Sam   4  49   b

df3

id place name qty unit C D
1 NY    Tom   2  10   c d
2 TK    Ron   3  15   c d
3 Lon   Don   5  90   c d
4 Hk    Sam   4  49   c d

Result:

id place name qty unit A B C D
1 NY    Tom   2  10   a b c d
2 TK    Ron   3  15   a b c d
3 Lon   Don   5  90   a b c d
4 Hk    Sam   4  49   a b c d

The columns place, name, qty, and unit will always be present in all three dataframes; the names of the columns that differ could vary (A, B, C, D in my example). The three dataframes have the same number of rows.

I have tried:

cols_to_use = df1.columns - df2.columns
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

The problem is that I get more rows than expected, and (when using concat) the columns end up renamed in the resulting dataframe.

Answer 1:

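Use reduce from functools to merge the dataframes pairwise. With no on argument, pd.merge joins on every column the two frames share:

import pandas as pd
from functools import reduce

reduce(lambda left, right: pd.merge(left, right), [df1, df2, df3])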
Out[725]: 
   id place name  qty  unit  A  B  C  D
0   1    NY  Tom    2    10  a  b  c  d
1   2    TK  Ron    3    15  a  b  c  d
2   3   Lon  Don    5    90  a  b  c  d
3   4    Hk  Sam    4    49  a  b  c  d
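A self-contained sketch of the reduce approach, rebuilding the question's sample frames by hand so it can be run as is. The helper name make_frame is hypothetical and exists only for this example; the sample values are copied from the question.

```python
from functools import reduce

import pandas as pd


def make_frame(**extra):
    """Hypothetical helper: the shared columns plus the given extra columns."""
    base = {
        "id": [1, 2, 3, 4],
        "place": ["NY", "TK", "Lon", "Hk"],
        "name": ["Tom", "Ron", "Don", "Sam"],
        "qty": [2, 3, 5, 4],
        "unit": [10, 15, 90, 49],
    }
    # Each extra column is filled with one repeated value, as in the question.
    base.update({col: [val] * 4 for col, val in extra.items()})
    return pd.DataFrame(base)


df1 = make_frame(A="a")
df2 = make_frame(B="b")
df3 = make_frame(C="c", D="d")

# reduce folds the list pairwise: merge(merge(df1, df2), df3).
# pd.merge defaults to an inner join on all shared column names.
merged = reduce(lambda left, right: pd.merge(left, right), [df1, df2, df3])
print(merged)
```

Because the join keys are inferred from the shared column names, this only behaves as intended when the shared columns really hold identical values in every frame.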

Answer 2:

You can chain merges on the shared columns:

merge_on = ['id', 'place', 'name', 'qty', 'unit']
df1.merge(df2, on=merge_on).merge(df3, on=merge_on)

    id  place   name    qty unit    A   B   C   D
0   1   NY      Tom     2   10      a   b   c   d
1   2   TK      Ron     3   15      a   b   c   d
2   3   Lon     Don     5   90      a   b   c   d
3   4   Hk      Sam     4   49      a   b   c   d
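Since the question mentions getting more rows than expected, one optional safeguard is the validate argument of merge, which raises an error if the key combination is not unique on both sides. This is a minimal sketch on the question's sample data, not part of the original answer; whether one_to_one is the right check for real data is an assumption.

```python
import pandas as pd

# Columns shared by all three frames (values taken from the question).
shared = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**shared, "A": ["a"] * 4})
df2 = pd.DataFrame({**shared, "B": ["b"] * 4})
df3 = pd.DataFrame({**shared, "C": ["c"] * 4, "D": ["d"] * 4})

merge_on = ["id", "place", "name", "qty", "unit"]

# validate="one_to_one" makes pandas raise MergeError when a key
# combination repeats in either frame, instead of silently adding rows.
result = (
    df1.merge(df2, on=merge_on, validate="one_to_one")
       .merge(df3, on=merge_on, validate="one_to_one")
)
print(result)
```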

Answer 3:

Using concat with groupby and first (note that the resulting columns come out sorted alphabetically):

pd.concat([df1, df2, df3], axis=1).groupby(level=0, axis=1).first()

   A  B  C  D  id name place  qty  unit
0  a  b  c  d   1  Tom    NY    2    10
1  a  b  c  d   2  Ron    TK    3    15
2  a  b  c  d   3  Don   Lon    5    90
3  a  b  c  d   4  Sam    Hk    4    49
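On recent pandas versions, groupby with axis=1 is deprecated. One alternative that also keeps the original column order is to concatenate side by side and then drop the repeated column labels. This is a sketch rather than the answerer's method, and it assumes the frames are row-aligned on the same index, as in the question.

```python
import pandas as pd

# Columns shared by all three frames (values taken from the question).
shared = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**shared, "A": ["a"] * 4})
df2 = pd.DataFrame({**shared, "B": ["b"] * 4})
df3 = pd.DataFrame({**shared, "C": ["c"] * 4, "D": ["d"] * 4})

# Side-by-side concat repeats the shared column labels three times.
combined = pd.concat([df1, df2, df3], axis=1)

# columns.duplicated() marks every repeat after the first occurrence;
# inverting the mask keeps one copy of each label, in original order.
result = combined.loc[:, ~combined.columns.duplicated()]
print(result)
```

Unlike the merge-based answers, this never compares the values in the shared columns; it simply keeps the first copy of each.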

Answer 4:

You can extract only those columns from df2 (and df3 similarly) which are not already present in df1. Then just use pd.concat to concatenate the data frames:

cols = [c for c in df2.columns if c not in df1.columns]
df = pd.concat([df1, df2[cols]], axis=1)
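The same idea can be extended to all three frames with a loop. A minimal sketch on the question's sample data, assuming the frames share the same index so that concat lines the rows up correctly:

```python
import pandas as pd

# Columns shared by all three frames (values taken from the question).
shared = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**shared, "A": ["a"] * 4})
df2 = pd.DataFrame({**shared, "B": ["b"] * 4})
df3 = pd.DataFrame({**shared, "C": ["c"] * 4, "D": ["d"] * 4})

frames = [df1, df2, df3]
result = frames[0]
for frame in frames[1:]:
    # Keep only the columns the accumulated result does not have yet.
    new_cols = [c for c in frame.columns if c not in result.columns]
    result = pd.concat([result, frame[new_cols]], axis=1)

print(result)
```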

Source: https://stackoverflow.com/questions/52614977/concatenate-distinct-columns-in-two-dataframes-using-pandas-and-append-similar

Tags: python, pandas, dataframe, merge, concat