I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj date
I'm new to pandas, but I wanted to achieve the same thing: automatically avoiding column names with the _x or _y suffix and removing the duplicate data. I finally did it by combining this answer and this one from Stack Overflow.
sales.csv
city;state;units
Mendocino;CA;1
Denver;CO;4
Austin;TX;2
revenue.csv
branch_id;city;revenue;state_id
10;Austin;100;TX
20;Austin;83;TX
30;Austin;4;TX
47;Austin;200;TX
20;Denver;83;CO
30;Springfield;4;I
merge.py
import pandas

def drop_y(df):
    # list comprehension of the cols that end with '_y'
    to_drop = [x for x in df if x.endswith('_y')]
    # drop them in place, so the caller's data frame is modified
    df.drop(to_drop, axis=1, inplace=True)

sales = pandas.read_csv('data/sales.csv', delimiter=';')
revenue = pandas.read_csv('data/revenue.csv', delimiter=';')
result = pandas.merge(sales, revenue, how='inner', left_on=['state'], right_on=['state_id'], suffixes=('', '_y'))
drop_y(result)
result.to_csv('results/output.csv', index=True, index_label='id', sep=';')
When executing the merge command, I replace the _x suffix with an empty string, so the overlapping columns from the left data frame keep their original names. Then I can remove the columns ending with _y.
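For anyone who wants to try this without creating the CSV files, here is a minimal sketch of the same idea using in-memory copies of the data above. The helper name drop_suffixed and the use of io.StringIO are my own choices for illustration, not part of the original script, and this variant returns a new data frame instead of modifying it in place.

```python
import io

import pandas as pd

# In-memory copies of sales.csv and revenue.csv from above.
SALES_CSV = """city;state;units
Mendocino;CA;1
Denver;CO;4
Austin;TX;2
"""

REVENUE_CSV = """branch_id;city;revenue;state_id
10;Austin;100;TX
20;Austin;83;TX
30;Austin;4;TX
47;Austin;200;TX
20;Denver;83;CO
30;Springfield;4;I
"""


def drop_suffixed(df, suffix="_y"):
    """Return a copy of df without the columns whose names end with suffix.

    Hypothetical helper: a non-mutating variant of drop_y.
    """
    to_drop = [col for col in df.columns if col.endswith(suffix)]
    return df.drop(columns=to_drop)


sales = pd.read_csv(io.StringIO(SALES_CSV), sep=";")
revenue = pd.read_csv(io.StringIO(REVENUE_CSV), sep=";")

# An empty left suffix keeps the left-hand names; clashes on the right get '_y'.
merged = pd.merge(
    sales,
    revenue,
    how="inner",
    left_on="state",
    right_on="state_id",
    suffixes=("", "_y"),
)
result = drop_suffixed(merged)
print(list(result.columns))
```

One caveat: any column whose real name happens to end with _y would be dropped as well, so a less common suffix may be safer on real data.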
output.csv
id;city;state;units;branch_id;revenue;state_id
0;Denver;CO;4;20;83;CO
1;Austin;TX;2;10;100;TX
2;Austin;TX;2;20;83;TX
3;Austin;TX;2;30;4;TX
4;Austin;TX;2;47;200;TX
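A different way to get the same columns, not the one used in merge.py, is to drop the overlapping columns from the right data frame before merging, so no suffix is ever produced. The sketch below assumes the left copy of a duplicated column is the one to keep; the function name merge_without_duplicates is hypothetical.

```python
import pandas as pd


def merge_without_duplicates(left, right, left_on, right_on, how="inner"):
    """Merge after removing right-hand columns that already exist on the left.

    Hypothetical helper; the right-hand key columns are always kept.
    """
    right_keys = [right_on] if isinstance(right_on, str) else list(right_on)
    overlap = [
        col for col in right.columns
        if col in left.columns and col not in right_keys
    ]
    return left.merge(
        right.drop(columns=overlap),
        how=how,
        left_on=left_on,
        right_on=right_on,
    )


# Same data as sales.csv and revenue.csv above.
sales = pd.DataFrame({
    "city": ["Mendocino", "Denver", "Austin"],
    "state": ["CA", "CO", "TX"],
    "units": [1, 4, 2],
})
revenue = pd.DataFrame({
    "branch_id": [10, 20, 30, 47, 20, 30],
    "city": ["Austin", "Austin", "Austin", "Austin", "Denver", "Springfield"],
    "revenue": [100, 83, 4, 200, 83, 4],
    "state_id": ["TX", "TX", "TX", "TX", "CO", "I"],
})

result = merge_without_duplicates(sales, revenue, "state", "state_id")
print(list(result.columns))
```

This avoids carrying the duplicate columns through the merge, at the cost of deciding up front which side's values to keep.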