Question
My question is closely related to Pandas Merge - How to avoid duplicating columns but not identical.
I want to concatenate the columns that differ across three dataframes. The dataframes share an id column and several other identical columns. Example:
df1
id place name qty unit A
1 NY Tom 2 10 a
2 TK Ron 3 15 a
3 Lon Don 5 90 a
4 Hk Sam 4 49 a
df2
id place name qty unit B
1 NY Tom 2 10 b
2 TK Ron 3 15 b
3 Lon Don 5 90 b
4 Hk Sam 4 49 b
df3
id place name qty unit C D
1 NY Tom 2 10 c d
2 TK Ron 3 15 c d
3 Lon Don 5 90 c d
4 Hk Sam 4 49 c d
Result:
id place name qty unit A B C D
1 NY Tom 2 10 a b c d
2 TK Ron 3 15 a b c d
3 Lon Don 5 90 a b c d
4 Hk Sam 4 49 a b c d
The columns place, name, qty, and unit will always be part of the three dataframes; the names of the columns that differ could vary (A, B, C, D in my example). The three dataframes have the same number of rows.
I have tried:
cols_to_use = df1.columns - df2.columns
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')
The problem is that I get more rows than expected and columns renamed in the resulting dataframe (when using concat).
Answer 1:
Using reduce from functools:
from functools import reduce
reduce(lambda left,right: pd.merge(left,right), [df1,df2,df3])
Out[725]:
id place name qty unit A B C D
0 1 NY Tom 2 10 a b c d
1 2 TK Ron 3 15 a b c d
2 3 Lon Don 5 90 a b c d
3 4 Hk Sam 4 49 a b c d
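For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same reduce call. The `base` dict and the `result` name are only there for illustration; they are not part of the original answer.

```python
from functools import reduce

import pandas as pd

# Sample frames mirroring the question's data (illustrative setup).
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

# With no `on` argument, pd.merge joins on every column name the two
# frames have in common, so the shared columns appear only once.
result = reduce(lambda left, right: pd.merge(left, right), [df1, df2, df3])
print(result)
```

Because the default join is an inner join on all shared columns, rows only survive if the shared values match in every frame, which holds for the data in the question.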
Answer 2:
You can chain merge calls:
merge_on = ['id','place','name','qty','unit']
df1.merge(df2, on = merge_on).merge(df3, on = merge_on)
id place name qty unit A B C D
0 1 NY Tom 2 10 a b c d
1 2 TK Ron 3 15 a b c d
2 3 Lon Don 5 90 a b c d
3 4 Hk Sam 4 49 a b c d
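Since the question says the differing column names can vary, one possible variation is to derive the key list instead of hard-coding it. The sketch below assumes the shared columns hold identical values row for row, as in the question; the `base` dict and `merged` name are illustrative. It also passes `validate="one_to_one"`, which should make pandas raise an error rather than silently produce extra rows if the keys turn out to be duplicated.

```python
import pandas as pd

# Sample frames mirroring the question's data (illustrative setup).
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

# Keys are every column present in all three frames, in df1's order.
merge_on = [c for c in df1.columns if c in df2.columns and c in df3.columns]

# validate="one_to_one" checks that the key combination is unique on
# both sides of each merge, guarding against multiplied rows.
merged = (
    df1.merge(df2, on=merge_on, validate="one_to_one")
       .merge(df3, on=merge_on, validate="one_to_one")
)
print(merged)
```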
Answer 3:
Using concat with groupby and first:
pd.concat([df1, df2, df3], axis=1).groupby(level=0, axis=1).first()
A B C D id name place qty unit
0 a b c d 1 Tom NY 2 10
1 a b c d 2 Ron TK 3 15
2 a b c d 3 Don Lon 5 90
3 a b c d 4 Sam Hk 4 49
Answer 4:
You can extract only those columns from df2 (and df3 similarly) which are not already present in df1. Then just use pd.concat to concatenate the data frames:
cols = [c for c in df2.columns if c not in df1.columns]
df = pd.concat([df1, df2[cols]], axis=1)
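The same idea can be extended to df3, or to any number of frames, with a small loop. The following is a rough sketch, assuming all frames share the same index and row order; the `frames`, `seen`, and `pieces` names are made up for the example.

```python
import pandas as pd

# Sample frames mirroring the question's data (illustrative setup).
base = {
    "id": [1, 2, 3, 4],
    "place": ["NY", "TK", "Lon", "Hk"],
    "name": ["Tom", "Ron", "Don", "Sam"],
    "qty": [2, 3, 5, 4],
    "unit": [10, 15, 90, 49],
}
df1 = pd.DataFrame({**base, "A": ["a"] * 4})
df2 = pd.DataFrame({**base, "B": ["b"] * 4})
df3 = pd.DataFrame({**base, "C": ["c"] * 4, "D": ["d"] * 4})

frames = [df1, df2, df3]
seen = set(frames[0].columns)   # column names already collected
pieces = [frames[0]]            # start from the first frame in full

for frame in frames[1:]:
    # Keep only the columns that no earlier frame has contributed.
    new_cols = [c for c in frame.columns if c not in seen]
    pieces.append(frame[new_cols])
    seen.update(new_cols)

# Concatenation along axis=1 aligns rows by index label.
df = pd.concat(pieces, axis=1)
print(df)
```

Unlike merge, this never compares the shared values, so it is only safe when the rows of every frame already line up.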
Source: https://stackoverflow.com/questions/52614977/concatenate-distinct-columns-in-two-dataframes-using-pandas-and-append-similar