Question
I often find myself with several pandas dataframes in the following form:
import pandas as pd
df1 = pd.read_table('filename1.dat')
df2 = pd.read_table('filename2.dat')
df3 = pd.read_table('filename3.dat')
print(df1)
columnA    first_values
name1      342
name2      822
name3      121
name4      3434
print(df2)
columnA    second_values
name1      8
name2      1
name3      1
name4      2
print(df3)
columnA    third_values
name1      910
name2      301
name3      132
name4      299
I would like to merge all of these dataframes on 'columnA', giving
columnA    first_values    second_values    third_values
name1      342             8                910
name2      822             1                301
name3      121             1                132
name4      3434            2                299
I normally resort to this hack:
merged1 = df1.merge(df2, on='columnA')
then
merged2 = df3.merge(merged1, on='columnA')
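But this doesn't scale to many dataframes. What is the correct way to do this?
Answer 1:
You can set columnA as the index of each dataframe and concatenate along the columns, resetting the index at the end. This works when the values in columnA are unique within each dataframe: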
dfs = [df1, df2, df3]
pd.concat([df.set_index('columnA') for df in dfs], axis=1).reset_index()
Out:
   columnA  first_values  second_values  third_values
0    name1           342              8           910
1    name2           822              1           301
2    name3           121              1           132
3    name4          3434              2           299
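The pairwise merge from the question can also be folded over a list of any length. The following is only a sketch, not part of the answer above: it uses `functools.reduce` from the standard library, builds the frames in memory instead of reading the `.dat` files, and assumes an inner join is what you want.

```python
from functools import reduce

import pandas as pd

# In-memory stand-ins for the frames read from the .dat files in the question.
names = ["name1", "name2", "name3", "name4"]
df1 = pd.DataFrame({"columnA": names, "first_values": [342, 822, 121, 3434]})
df2 = pd.DataFrame({"columnA": names, "second_values": [8, 1, 1, 2]})
df3 = pd.DataFrame({"columnA": names, "third_values": [910, 301, 132, 299]})

dfs = [df1, df2, df3]

# Fold the list left to right: ((df1 merge df2) merge df3) ...
# how="inner" is an assumption; switch to "outer" to keep unmatched keys.
merged = reduce(
    lambda left, right: left.merge(right, on="columnA", how="inner"),
    dfs,
)
print(merged)
```

Unlike the concat approach, each step is an ordinary merge, so the join type can be chosen explicitly, at the cost of one merge per dataframe.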
Answer 2:
Assuming the three dataframes share the same index and their rows are in the same order, you can skip merging and simply add the extra columns to a copy of the first dataframe, like so:
import pandas as pd
# Create the example dataframes
colA = ['name1', 'name2', 'name3', 'name4']
first = [342, 822, 121, 3434]
second = [8, 1, 1, 2]
third = [910, 301, 132, 299]
df1 = pd.DataFrame({'colA': colA, 'first': first})
df2 = pd.DataFrame({'colA': colA, 'second': second})
df3 = pd.DataFrame({'colA': colA, 'third': third})
# Copy df1, then add the other columns; assignment aligns rows on the index
df_merged = df1.copy()
df_merged['second'] = df2.second
df_merged['third'] = df3.third
print(df_merged.head())
    colA  first  second  third
0  name1    342       8    910
1  name2    822       1    301
2  name3    121       1    132
3  name4   3434       2    299
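Column assignment pairs rows by index label, not by the value in colA, so it gives wrong pairings if the frames list the names in different orders. As a rough sketch (the reordered data below is hypothetical, and the fallback assumes the names in df2 are unique), you could check the key columns first and look values up by name when they differ:

```python
import pandas as pd

df1 = pd.DataFrame({"colA": ["name1", "name2", "name3"], "first": [342, 822, 121]})
# Hypothetical case: same names as df1, but rows in a different order.
df2 = pd.DataFrame({"colA": ["name3", "name1", "name2"], "second": [1, 8, 1]})

# True only if the key columns match row for row.
same_order = df1["colA"].equals(df2["colA"])

df_merged = df1.copy()
if same_order:
    # Safe to assign directly: row i of df2 belongs to row i of df1.
    df_merged["second"] = df2["second"]
else:
    # Look each name up in df2 instead of relying on row position.
    lookup = df2.set_index("colA")["second"]
    df_merged["second"] = df_merged["colA"].map(lookup)

print(df_merged)
```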
Source: https://stackoverflow.com/questions/38775588/how-can-i-merge-together-several-pandas-dataframes-on-a-certain-column-without