Question
I have a very large DataFrame that has duplicate column names, although the values under those columns differ. I want to merge the duplicate columns by adding their values together.
The DataFrame is built by appending Series together, and that is where the duplication occurs.
      Py  Java  Ruby  C  Ruby
2010   1     5     8  1     5
2011   5     5     1  9     8
2012   1     5     8  2     8
2013   6     3     8  1     9
2014   4     8     9  9     9
So I want to add both Ruby columns together to get this result:
      Py  Java  Ruby  C  Ruby
2010   1     5    13  1     5
2011   5     5     9  9     8
2012   1     5    16  2     8
2013   6     3    17  1     9
2014   4     8    18  9     9
I am running Python 2.7.
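For reference, here is a minimal sketch of how such a frame can arise. It assumes the Series are combined column-wise with pd.concat (the question does not say which call was actually used), and the variable names are illustrative only.

```python
import pandas as pd

years = [2010, 2011, 2012, 2013, 2014]

# One Series per column, named after its language; "Ruby" occurs twice.
columns = [
    pd.Series([1, 5, 1, 6, 4], index=years, name="Py"),
    pd.Series([5, 5, 5, 3, 8], index=years, name="Java"),
    pd.Series([8, 1, 8, 8, 9], index=years, name="Ruby"),
    pd.Series([1, 9, 2, 1, 9], index=years, name="C"),
    pd.Series([5, 8, 8, 9, 9], index=years, name="Ruby"),
]

# Concatenating along axis=1 uses the Series names as column labels
# and does not merge labels that repeat.
df = pd.concat(columns, axis=1)

print(df.columns.tolist())
print(df.columns.duplicated().any())
```

df.columns.duplicated() flags the second and later occurrences of a label, so it is a quick way to check whether a frame has this problem.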
Answer 1:
I would suggest using groupby:
df = df.groupby(axis=1, level=0).sum()
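Recent pandas releases deprecate groupby(axis=1), so the same idea may need to be written with a transpose. The following is a sketch under that assumption, using the data from the question; the name merged is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 5, 8, 1, 5],
     [5, 5, 1, 9, 8],
     [1, 5, 8, 2, 8],
     [6, 3, 8, 1, 9],
     [4, 8, 9, 9, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Py", "Java", "Ruby", "C", "Ruby"],
)

# Transpose so the column labels become the row index, group rows that
# share a label, sum each group, then transpose back.
# sort=False asks pandas to keep groups in order of first appearance
# instead of sorting the labels alphabetically.
merged = df.T.groupby(level=0, sort=False).sum().T

print(merged)
```

Note that this yields a single Ruby column holding the sum, rather than keeping the second Ruby column as in the question's expected output.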
To make this work for MultiIndex columns as well, one can do:
if df.columns.duplicated().any():
    n_levels = df.columns.nlevels
    # Group on every column level; a flat Index only has level 0.
    levels = list(range(n_levels)) if n_levels > 1 else 0
    df = df.groupby(axis=1, level=levels).sum()
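As an illustration of the MultiIndex case, here is a small sketch with hypothetical two-level column labels (the "lang" level and the values are made up for the example). It uses the transpose form of groupby so that it does not depend on groupby(axis=1).

```python
import pandas as pd

# Hypothetical two-level columns; ("lang", "Ruby") occurs twice.
cols = pd.MultiIndex.from_tuples(
    [("lang", "Py"), ("lang", "Ruby"), ("lang", "Ruby")]
)
df = pd.DataFrame([[1, 8, 5], [5, 1, 8]], index=[2010, 2011], columns=cols)

if df.columns.duplicated().any():
    # Group on all column levels so that only fully identical
    # label tuples are merged.
    levels = list(range(df.columns.nlevels))
    df = df.T.groupby(level=levels).sum().T

print(df)
```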
EDIT
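Instead of using groupby, one could at the time of this edit simply do the following (the level argument of sum was deprecated in pandas 1.3 and removed in 2.0, so this only works on older versions):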
df = df.sum(axis=1, level=0)
Be aware of NaNs, which the procedures above convert to 0. To avoid that, one could use either skipna=False or min_count=1 (depending on the use case):
df = df.sum(axis=1, level=0, skipna=False)
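The difference can be seen with a small sketch. It assumes the groupby form with a transpose, where sum accepts min_count; the sample values and the names default and keep_nan are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.0, np.nan, np.nan],
     [2.0, 3.0, np.nan]],
    index=[2010, 2011],
    columns=["Py", "Ruby", "Ruby"],
)

grouped = df.T.groupby(level=0, sort=False)

# Default behaviour: NaNs are skipped, so an all-NaN group sums to 0.
default = grouped.sum().T

# min_count=1 requires at least one non-NaN value per group;
# otherwise the result for that cell stays NaN.
keep_nan = grouped.sum(min_count=1).T

print(default)
print(keep_nan)
```

With min_count=1, a row where only one of the duplicate columns is NaN still gets the sum of the remaining values, whereas skipna=False would turn that cell into NaN as well. Which one is right depends on the use case.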
Answer 2:
I'm not sure why you would want to keep the old column of values if you are summing them, so here's a way to do it that drops the duplicate:
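import pandas as pd
a = [2010, 2011, 2012, 2013, 2014]  # index (years)
x, y, z = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]  # values from the table below
df = pd.DataFrame({'col1': x, 'col2': y, 'col3': z}, index=a)
df.columns = ['Ruby', 'Python', 'Ruby']
df['Ruby'] = df['Ruby'].sum(axis=1)  # both Ruby columns now hold the row-wise sum
df = df.T.drop_duplicates()  # drop the second, now identical, Ruby row
df = df.T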
With a starting data frame that looks like:
      Ruby  Python  Ruby
2010     1       2     1
2011     2       4     3
2012     3       6     5
2013     4       8     7
2014     5      10     9
and then becomes:
      Ruby  Python
2010     2       2
2011     5       4
2012     8       6
2013    11       8
2014    14      10
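One caveat with T.drop_duplicates() is that it compares values, not labels, so it would also drop an unrelated column that happens to hold the same numbers. A possible variant, sketched here with the same sample data, selects columns by label using columns.duplicated() instead; the name ruby_total is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": [2, 4, 6, 8, 10], "col3": [1, 3, 5, 7, 9]},
    index=[2010, 2011, 2012, 2013, 2014],
)
df.columns = ["Ruby", "Python", "Ruby"]

# df["Ruby"] returns both Ruby columns as a DataFrame; sum them row-wise.
ruby_total = df["Ruby"].sum(axis=1)

# Keep only the first column for each label, then store the sum in Ruby.
df = df.loc[:, ~df.columns.duplicated()].copy()
df["Ruby"] = ruby_total

print(df)
```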
Source: https://stackoverflow.com/questions/28246014/pandas-dataframe-adding-duplicate-columns-together