Pandas DataFrame, adding duplicate columns together

Question

I have a very large DataFrame with duplicate column names, but the values under those columns differ. I want to merge the duplicate columns into one by adding their values together.

The DataFrame is built by appending Series together, which is where the duplication comes from.

      Py  Java  Ruby  C  Ruby
2010   1     5     8  1     5
2011   5     5     1  9     8
2012   1     5     8  2     8
2013   6     3     8  1     9
2014   4     8     9  9     9
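
For anyone who wants to reproduce the frame above, here is a minimal sketch. It assumes the Series are combined side by side with pd.concat(axis=1); the question does not say exactly how the appending is done, and the names years, pieces and df are illustrative.

```python
import pandas as pd

years = [2010, 2011, 2012, 2013, 2014]

# One Series per language; two of them share the name "Ruby".
pieces = [
    pd.Series([1, 5, 1, 6, 4], index=years, name="Py"),
    pd.Series([5, 5, 5, 3, 8], index=years, name="Java"),
    pd.Series([8, 1, 8, 8, 9], index=years, name="Ruby"),
    pd.Series([1, 9, 2, 1, 9], index=years, name="C"),
    pd.Series([5, 8, 8, 9, 9], index=years, name="Ruby"),
]

# Assumption: the Series are joined as columns. concat uses each Series
# name as the column label and does not complain about repeats.
df = pd.concat(pieces, axis=1)

# Boolean mask that flags every repeated label after its first occurrence.
print(df.columns.duplicated())
```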

So I want to add both Ruby columns together to get this result:

      Py  Java  Ruby  C  Ruby
2010   1     5    13  1     5
2011   5     5     9  9     8
2012   1     5    16  2     8
2013   6     3    17  1     9
2014   4     8    18  9     9

I am running Python 2.7.

Answer 1:

I would suggest using groupby:

df = df.groupby(axis=1, level=0).sum()
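
As a sketch of the same idea on the question's data: the snippet below groups the transposed frame by its labels rather than passing axis=1, because axis=1 in groupby is deprecated in recent pandas releases. The transposed form is a substitution for the call above, not the exact same call, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 5, 8, 1, 5],
     [5, 5, 1, 9, 8],
     [1, 5, 8, 2, 8],
     [6, 3, 8, 1, 9],
     [4, 8, 9, 9, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Py", "Java", "Ruby", "C", "Ruby"],
)

# Transpose so the column labels become the index, sum the rows that share
# a label, then transpose back. sort=False keeps first-appearance order.
summed = df.T.groupby(level=0, sort=False).sum().T

print(summed)
```

Without sort=False the resulting columns come back in sorted label order rather than in the order they first appeared.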

In order to make it work also for MultiIndex, one can do:

if df.columns.duplicated().any():
    # Group on every column level; for flat columns this is just [0]
    all_levels = list(range(df.columns.nlevels))
    df = df.groupby(axis=1, level=all_levels).sum()
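
A small worked example of the MultiIndex case may help. This is only a sketch: sum_duplicate_columns is a hypothetical helper name, the sample labels are made up, and it uses the transpose-then-groupby form instead of axis=1 so that it also runs on recent pandas releases.

```python
import pandas as pd


def sum_duplicate_columns(frame):
    """Sum columns whose full label (across all column levels) repeats."""
    if not frame.columns.duplicated().any():
        return frame
    # Group on every level of the column index; a flat index has only level 0.
    levels = list(range(frame.columns.nlevels))
    return frame.T.groupby(level=levels, sort=False).sum().T


# Made-up two-level columns in which ("web", "Ruby") appears twice.
cols = pd.MultiIndex.from_tuples(
    [("web", "Ruby"), ("web", "Py"), ("web", "Ruby")]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[2010, 2011], columns=cols)

result = sum_duplicate_columns(df)
print(result)
```

Columns are merged only when every level of the label matches, so ("web", "Ruby") and a hypothetical ("cli", "Ruby") would stay separate.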

EDIT

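Instead of using groupby, in pandas versions that accept the level argument of sum (it was deprecated in 1.3 and removed in 2.0), one can simply do: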

df = df.sum(axis=1, level=0)

Be aware of NaNs: the procedures above skip them when summing, so a group whose values are all NaN comes out as 0. To avoid that, one could use either skipna=False or min_count=1 (depending on the use case):

df = df.sum(axis=1, level=0, skipna=False)
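
To make the difference between the options concrete, here is a minimal sketch on made-up data with two Ruby columns. It uses groupby on the transposed frame for min_count and a plain row-wise sum for skipna=False, so it does not rely on the level argument of sum; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[8.0, 5.0], [np.nan, 8.0], [np.nan, np.nan]],
    index=[2010, 2011, 2012],
    columns=["Ruby", "Ruby"],
)

grouped = df.T.groupby(level=0)

# Default: NaNs are skipped, so the all-NaN row sums to 0.
default = grouped.sum().T["Ruby"]

# min_count=1: at least one real value is required, else the result is NaN.
at_least_one = grouped.sum(min_count=1).T["Ruby"]

# skipna=False: any NaN in the row makes the whole sum NaN.
strict = df["Ruby"].sum(axis=1, skipna=False)

print(pd.DataFrame({"default": default,
                    "min_count=1": at_least_one,
                    "skipna=False": strict}))
```

min_count=1 keeps partial sums (2011 stays 8), while skipna=False treats any missing value as making the total unknown.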

Answer 2:

I'm not sure why you would want to keep the old column of values once you have summed them, so here is a way that leaves a single Ruby column:

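import pandas as pd
a = [2010, 2011, 2012, 2013, 2014]
x, y, z = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]
df = pd.DataFrame({'col1': x, 'col2': y, 'col3': z}, index=a)
df.columns = ['Ruby', 'Python', 'Ruby']
# Row-wise sum of both 'Ruby' columns, written back to each of them
df['Ruby'] = df['Ruby'].sum(axis=1)
# The two 'Ruby' columns are now identical, so drop the repeat
df = df.T.drop_duplicates().T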

With a starting data frame that looks like:

        Ruby  Python  Ruby
2010     1       2     1
2011     2       4     3
2012     3       6     5
2013     4       8     7
2014     5      10     9

and then becomes:

        Ruby  Python
2010     2       2
2011     5       4
2012     8       6
2013    11       8
2014    14      10
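
One caveat with the transpose-and-drop_duplicates step: it compares values, not labels, so an unrelated column that happens to hold the same values as another would be dropped as well. A minimal sketch of a label-based variant follows; it swaps in columns.duplicated() for drop_duplicates, and the names totals and deduped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [2, 4, 3], [3, 6, 5], [4, 8, 7], [5, 10, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Ruby", "Python", "Ruby"],
)

# Row-wise total of every column labelled "Ruby".
totals = df["Ruby"].sum(axis=1)

# Keep only the first column for each label, judged by name rather than
# by content, then overwrite the surviving "Ruby" with the totals.
deduped = df.loc[:, ~df.columns.duplicated()].copy()
deduped["Ruby"] = totals

print(deduped)
```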

Source: https://stackoverflow.com/questions/28246014/pandas-dataframe-adding-duplicate-columns-together

Tags: python, sum, duplicates