Combine duplicated columns within a DataFrame

礼貌的吻别 2020-12-08 10:19

If I have a DataFrame with several columns that share the same name, is there a way to combine those same-named columns using some function (e.g. sum)?

3 answers
  •  清歌不尽
    2020-12-08 10:56

    Here is a possibly simpler solution for common aggregation functions such as sum, mean, median, max, min and std: just pass axis=1 to work with columns, together with the level parameter:

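    import numpy as np
    import pandas as pd
    
    # sample data from coldspeed's answer
    np.random.seed(0)
    df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))
    print(df)
        A   A   B   B   B
    0  44  47   0   3   3
    1  39   9  19  21  36
    2  23   6  24  24  12
    3   1  38  39  23  46
    4  24  17  37  25  13
    
    print(df.sum(axis=1, level=0))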
        A    B
    0  91    6
    1  48   76
    2  29   60
    3  39  108
    4  41   75
    
    df.columns = pd.MultiIndex.from_arrays([['one']*3 + ['two']*2, df.columns])
    
    print(df.sum(axis=1, level=1))
        A    B
    0  91    6
    1  48   76
    2  29   60
    3  39  108
    4  41   75
    
    print(df.sum(axis=1, level=[0,1]))
      one     two
        A   B   B
    0  91   0   6
    1  48  19  57
    2  29  24  36
    3  39  39  69
    4  41  37  38
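Note that the `level` argument of `sum`, `mean`, `min`, `max` and similar DataFrame reductions was deprecated in pandas 1.3 and removed in pandas 2.0, so the calls above only work on older versions. A version-independent sketch is to transpose, group the rows on the duplicated label with `groupby(level=...)`, aggregate, and transpose back. The names `combined`, `by_letter` and `by_both` are just illustrative:

```python
import numpy as np
import pandas as pd

# Same sample data as above: two "A" columns and three "B" columns
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))

# Transpose so the duplicated column labels become the index,
# group the rows by that label, aggregate, then transpose back
combined = df.T.groupby(level=0).sum().T
print(combined)

# With MultiIndex columns, choose the level(s) to group by, as with level= above
df.columns = pd.MultiIndex.from_arrays([['one'] * 3 + ['two'] * 2, df.columns])
by_letter = df.T.groupby(level=1).sum().T     # like df.sum(axis=1, level=1)
by_both = df.T.groupby(level=[0, 1]).sum().T  # like df.sum(axis=1, level=[0, 1])
print(by_letter)
print(by_both)
```

`df.groupby(level=0, axis=1).sum()` also works on older versions, but `axis=1` in `groupby` is itself deprecated since pandas 2.1, which is why the sketch transposes instead.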
    

    It works similarly for the index; just use axis=0 instead of axis=1:

    np.random.seed(0)
    df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('ABCDE'), index=list('aabbc'))
    print(df)
        A   B   C   D   E
    a  44  47   0   3   3
    a  39   9  19  21  36
    b  23   6  24  24  12
    b   1  38  39  23  46
    c  24  17  37  25  13
    
    print(df.min(axis=0, level=0))
        A   B   C   D   E
    a  39   9   0   3   3
    b   1   6  24  23  12
    c  24  17  37  25  13
    
    df.index = pd.MultiIndex.from_arrays([['bar']*3 + ['foo']*2, df.index])
    
    print(df.mean(axis=0, level=1))
          A     B     C     D     E
    a  41.5  28.0   9.5  12.0  19.5
    b  12.0  22.0  31.5  23.5  29.0
    c  24.0  17.0  37.0  25.0  13.0
    
    print(df.max(axis=0, level=[0,1]))
            A   B   C   D   E
    bar a  44  47  19  21  36
        b  23   6  24  24  12
    foo b   1  38  39  23  46
        c  24  17  37  25  13
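On pandas 2.0 and later, where these reductions no longer accept `level=`, the row-wise results can be obtained by grouping on the index directly; no transpose is needed for duplicated index labels. A sketch on the same sample data (`mins`, `means` and `maxes` are illustrative names):

```python
import numpy as np
import pandas as pd

# Same sample data as above, with duplicated index labels
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('ABCDE'), index=list('aabbc'))

# Group rows that share an index label
mins = df.groupby(level=0).min()            # like df.min(axis=0, level=0)

df.index = pd.MultiIndex.from_arrays([['bar'] * 3 + ['foo'] * 2, df.index])
means = df.groupby(level=1).mean()          # like df.mean(axis=0, level=1)
maxes = df.groupby(level=[0, 1]).max()      # like df.max(axis=0, level=[0, 1])
print(mins, means, maxes, sep='\n\n')
```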
    

    If you need other functions such as first, last, size or count, you have to use coldspeed's answer.
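As a rough sketch (not necessarily what coldspeed's answer does), the transpose-and-group approach also covers `first`, `last`, `size`, `count`, and arbitrary functions through `agg`. The `NaN` placed below is a hypothetical missing value, added only so that `count` (non-null values) differs from `size` (number of columns per label) and so that `last` visibly skips it:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Float data so a NaN can be placed without changing the column dtype
df = pd.DataFrame(np.random.choice(50, (5, 5)).astype(float), columns=list('AABBB'))
df.iloc[0, 4] = np.nan  # hypothetical missing value in the last "B" column

# Duplicated column labels become the index, so group on it
g = df.T.groupby(level=0)

first = g.first().T    # first non-null value per row within each label
last = g.last().T      # last non-null value per row within each label
counts = g.count().T   # number of non-null values per row within each label
sizes = g.size()       # number of columns per label (a Series)

# Any custom function works through agg, e.g. the range within each label
spread = g.agg(lambda s: s.max() - s.min()).T
print(first, last, counts, sizes, spread, sep='\n\n')
```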
