Pandas Merge - How to avoid duplicating columns

轮回少年 2020-11-28 02:23

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj date

5 Answers
  •  孤街浪徒
    2020-11-28 02:55

    I'm new to pandas, but I wanted to achieve the same thing: automatically avoiding column names with _x or _y and removing the duplicated data. I finally did it by using this answer and this one from Stack Overflow.

    sales.csv

        city;state;units
        Mendocino;CA;1
        Denver;CO;4
        Austin;TX;2
    

    revenue.csv

        branch_id;city;revenue;state_id
        10;Austin;100;TX
        20;Austin;83;TX
        30;Austin;4;TX
        47;Austin;200;TX
        20;Denver;83;CO
        30;Springfield;4;I
    

    merge.py

        import pandas


        def drop_y(df):
            # Collect the columns whose names end with '_y' and drop them in place
            to_drop = [x for x in df if x.endswith('_y')]
            df.drop(to_drop, axis=1, inplace=True)


        sales = pandas.read_csv('data/sales.csv', delimiter=';')
        revenue = pandas.read_csv('data/revenue.csv', delimiter=';')

        # Left columns keep their names (empty suffix); clashing right columns get '_y'
        result = pandas.merge(sales, revenue, how='inner', left_on=['state'], right_on=['state_id'], suffixes=('', '_y'))
        drop_y(result)
        result.to_csv('results/output.csv', index=True, index_label='id', sep=';')
    

    When executing the merge command, I replace the _x suffix with an empty string, and then I can remove the columns ending with _y.

    output.csv

        id;city;state;units;branch_id;revenue;state_id
        0;Denver;CO;4;20;83;CO
        1;Austin;TX;2;10;100;TX
        2;Austin;TX;2;20;83;TX
        3;Austin;TX;2;30;4;TX
        4;Austin;TX;2;47;200;TX
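
A possible alternative to dropping the _y columns after the merge is to leave the clashing columns out of the right frame before merging. The sketch below is not the method from the answer above. It holds the same sample data in memory instead of reading data/sales.csv and data/revenue.csv, and the names SALES_CSV, REVENUE_CSV and right_only are made up for illustration.

```python
import io

import pandas as pd

# Same sample data as sales.csv and revenue.csv, held in memory (assumption:
# this replaces the files on disk only to keep the example self-contained).
SALES_CSV = """city;state;units
Mendocino;CA;1
Denver;CO;4
Austin;TX;2
"""

REVENUE_CSV = """branch_id;city;revenue;state_id
10;Austin;100;TX
20;Austin;83;TX
30;Austin;4;TX
47;Austin;200;TX
20;Denver;83;CO
30;Springfield;4;I
"""

sales = pd.read_csv(io.StringIO(SALES_CSV), sep=";")
revenue = pd.read_csv(io.StringIO(REVENUE_CSV), sep=";")

# Columns of the right frame that the left frame does not already have,
# kept in their original order. 'city' is excluded, 'state_id' is kept.
right_only = [col for col in revenue.columns if col not in sales.columns]

# Merge only those columns: no names clash, so no suffixes are added.
result = sales.merge(
    revenue[right_only],
    how="inner",
    left_on="state",
    right_on="state_id",
)

print(list(result.columns))
print(result)
```

This only works as written because the right-hand join key (state_id) has a different name from every column in sales. If the key had the same name in both frames, it would have to be added back to right_only explicitly, otherwise the merge would have nothing to join on.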
    
