Pandas DataFrame, adding duplicate columns together

Submitted by 百般思念 on 2020-01-15 07:11:31

Question


I have a very large DataFrame with duplicate column names, but the values under those columns are different. I want to merge the duplicate columns together by adding their values.

This really large DataFrame is made by appending Series together, and that is where the duplication occurs.

       Py Java Ruby C  Ruby
2010    1   5   8   1   5
2011    5   5   1   9   8
2012    1   5   8   2   8
2013    6   3   8   1   9
2014    4   8   9   9   9
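The question does not say exactly how the Series are appended. As an assumption, here is a minimal sketch that rebuilds the frame above with pd.concat(..., axis=1), which keeps every Series as its own column even when two Series share a name:

```python
import pandas as pd

years = [2010, 2011, 2012, 2013, 2014]
# one Series per language; each Series name becomes its column label
parts = [
    pd.Series([1, 5, 1, 6, 4], index=years, name="Py"),
    pd.Series([5, 5, 5, 3, 8], index=years, name="Java"),
    pd.Series([8, 1, 8, 8, 9], index=years, name="Ruby"),
    pd.Series([1, 9, 2, 1, 9], index=years, name="C"),
    pd.Series([5, 8, 8, 9, 9], index=years, name="Ruby"),
]
# concatenating along axis=1 does not merge columns with equal names
df = pd.concat(parts, axis=1)
print(df.columns.tolist())                             # ['Py', 'Java', 'Ruby', 'C', 'Ruby']
print(df.columns[df.columns.duplicated()].tolist())    # ['Ruby']
```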

So I want to add both Ruby columns together to get this result:

       Py Java Ruby C  Ruby
2010    1   5   13  1   5
2011    5   5   9   9   8
2012    1   5   16  2   8
2013    6   3   17  1   9
2014    4   8   18  9   9

I am running Python 2.7.


Answer 1:


I would suggest using groupby:

df = df.groupby(axis=1, level=0).sum()
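In pandas 2.1 and later, groupby(axis=1) is deprecated in favour of transposing. A minimal sketch of the same grouping done on the transposed frame, applied to the question's data; sort=False is an optional choice here that keeps the labels in order of first appearance (the default would sort them alphabetically):

```python
import pandas as pd

# the question's frame, with "Ruby" appearing twice
df = pd.DataFrame(
    [[1, 5, 8, 1, 5], [5, 5, 1, 9, 8], [1, 5, 8, 2, 8], [6, 3, 8, 1, 9], [4, 8, 9, 9, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Py", "Java", "Ruby", "C", "Ruby"],
)
# column labels become the row index, equal labels are grouped and summed,
# then the result is transposed back so the years are rows again
merged = df.T.groupby(level=0, sort=False).sum().T
print(merged)
```

The single Ruby column in merged holds 13, 9, 16, 17 and 18, and the other columns are unchanged.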

To make it also work for MultiIndex columns, one can do:

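if df.columns.duplicated().any():
    all_levels = df.columns.nlevels
    if all_levels > 1:
        all_levels = list(range(all_levels))  # group on every level of a MultiIndex
    else:
        all_levels = 0  # a flat Index only has level 0; level=1 would raise an error
    df = df.groupby(axis=1, level=all_levels).sum()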
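The same idea can be wrapped in a small helper that avoids the deprecated axis=1 argument by transposing. This is a sketch, not a library function; the name sum_duplicate_columns and the sample MultiIndex are hypothetical:

```python
import pandas as pd


def sum_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum columns whose labels are equal on every level."""
    if not df.columns.duplicated().any():
        return df  # nothing to merge
    nlevels = df.columns.nlevels
    # a flat Index is grouped on level 0, a MultiIndex on all of its levels
    levels = list(range(nlevels)) if nlevels > 1 else 0
    # transpose, group equal labels, sum, and transpose back
    return df.T.groupby(level=levels, sort=False).sum().T


# usage with MultiIndex columns where ("lang", "Ruby") appears twice
cols = pd.MultiIndex.from_tuples([("lang", "Ruby"), ("lang", "Py"), ("lang", "Ruby")])
mdf = pd.DataFrame([[8, 1, 5], [1, 5, 8]], index=[2010, 2011], columns=cols)
result = sum_duplicate_columns(mdf)
print(result)
```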

EDIT

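Instead of using groupby, one can also simply do the following (this only works in pandas versions before 2.0: the level argument of DataFrame.sum was deprecated in 1.3 and removed in 2.0):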

df = df.sum(axis=1, level=0)

Be aware of NaNs: the above procedures skip them, so a group whose values are all NaN sums to 0. To avoid that, use either skipna=False (any NaN makes the sum NaN) or min_count=1 (the sum is NaN only if every value is NaN), depending on the use case:

df = df.sum(axis=1, level=0, skipna=False)
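With the transpose-and-groupby form, GroupBy.sum accepts min_count. A minimal sketch on a made-up two-row frame (the data is illustrative) showing the difference between the default and min_count=1:

```python
import numpy as np
import pandas as pd

# two columns named "Ruby": row 0 has one NaN, row 1 is entirely NaN
df = pd.DataFrame([[1.0, np.nan], [np.nan, np.nan]], columns=["Ruby", "Ruby"])

grouped = df.T.groupby(level=0)
default = grouped.sum().T              # NaNs skipped; the all-NaN row becomes 0.0
kept = grouped.sum(min_count=1).T      # the all-NaN row stays NaN
print(default)
print(kept)
```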



Answer 2:


I'm not sure why you would want to keep the old column of values if you are summing them, so here is a way to do it that drops the extra column:

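import pandas as pd
# sample data matching the starting frame shown below (x, y, z and a were not defined)
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10], 'col3': [1, 3, 5, 7, 9]},
                  index=[2010, 2011, 2012, 2013, 2014])
df.columns = ['Ruby', 'Python', 'Ruby']
# df['Ruby'] selects both Ruby columns; the row sums are written back into each of them
df['Ruby'] = df['Ruby'].sum(axis=1)
# the two Ruby columns are now identical, so dropping duplicate rows of the transpose removes one
df = df.T.drop_duplicates().T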

With a starting data frame that looks like:

        Ruby  Python  Ruby
2010     1       2     1
2011     2       4     3
2012     3       6     5
2013     4       8     7
2014     5      10     9

and then becomes:

        Ruby  Python
2010     2       2
2011     5       4
2012     8       6
2013    11       8
2014    14      10


Source: https://stackoverflow.com/questions/28246014/pandas-dataframe-adding-duplicate-columns-together
