summing two columns in a pandas dataframe

南旧 2020-12-08 13:23

When I use this syntax, it creates a Series rather than adding a column to my new dataframe (sum). Please help.

My code:

    sum = data['variance'] = d


        
5 Answers
  • 2020-12-08 13:51

    The same thing can be done using a lambda function. Here I am reading the data from an xlsx file.

    import pandas as pd
    df = pd.read_excel("data.xlsx", sheet_name=4)
    print(df)
    

    Output:

      cluster Unnamed: 1      date  budget  actual
    0       a 2014-01-01  00:00:00   11000   10000
    1       a 2014-02-01  00:00:00    1200    1000
    2       a 2014-03-01  00:00:00     200     100
    3       b 2014-04-01  00:00:00     200     300
    4       b 2014-05-01  00:00:00     400     450
    5       c 2014-06-01  00:00:00     700    1000
    6       c 2014-07-01  00:00:00    1200    1000
    7       c 2014-08-01  00:00:00     200     100
    8       c 2014-09-01  00:00:00     200     300
    

    Sum the two columns into a third, new column:

    df['variance'] = df.apply(lambda x: x['budget'] + x['actual'], axis=1)
    print(df)
    

    Output:

      cluster Unnamed: 1      date  budget  actual  variance
    0       a 2014-01-01  00:00:00   11000   10000     21000
    1       a 2014-02-01  00:00:00    1200    1000      2200
    2       a 2014-03-01  00:00:00     200     100       300
    3       b 2014-04-01  00:00:00     200     300       500
    4       b 2014-05-01  00:00:00     400     450       850
    5       c 2014-06-01  00:00:00     700    1000      1700
    6       c 2014-07-01  00:00:00    1200    1000      2200
    7       c 2014-08-01  00:00:00     200     100       300
    8       c 2014-09-01  00:00:00     200     300       500
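Row-wise `apply` with `axis=1` calls the lambda once per row, so on large frames it is usually much slower than plain column arithmetic. A minimal sketch comparing the two, assuming a small hand-built DataFrame in place of `data.xlsx` (the values are copied from the first rows of the output above):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet (not read from data.xlsx)
df = pd.DataFrame({
    "cluster": ["a", "a", "b"],
    "budget": [11000, 1200, 200],
    "actual": [10000, 1000, 300],
})

# Row-wise: the lambda is called once for every row
via_apply = df.apply(lambda x: x["budget"] + x["actual"], axis=1)

# Vectorized: a single operation over the whole columns
via_vector = df["budget"] + df["actual"]

print(via_apply.tolist() == via_vector.tolist())  # True
df["variance"] = via_vector
print(df)
```

Both give the same values; `apply` is mainly worth it when the per-row logic cannot be written as column operations.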
    
  • 2020-12-08 13:54
    df['variance'] = df.loc[:,['budget','actual']].sum(axis=1)
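One difference worth knowing: `DataFrame.sum` skips NaN by default (`skipna=True`), whereas `+` propagates NaN. A small sketch with hypothetical data containing missing values; the `min_count` argument restores the strict behaviour:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing budget and one missing actual
df = pd.DataFrame({"budget": [11000.0, np.nan, 200.0],
                   "actual": [10000.0, 1000.0, np.nan]})

# sum(axis=1) ignores NaN, so each row adds up whatever values are present
df["variance_sum"] = df.loc[:, ["budget", "actual"]].sum(axis=1)

# Plain + propagates NaN: a missing operand makes the result NaN
df["variance_plus"] = df["budget"] + df["actual"]

# min_count=2 requires both values to be present, matching the + behaviour
df["variance_strict"] = df.loc[:, ["budget", "actual"]].sum(axis=1, min_count=2)
print(df)
```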
    
  • 2020-12-08 13:56

    I think you've misunderstood some Python syntax; the following does two assignments:

    In [11]: a = b = 1
    
    In [12]: a
    Out[12]: 1
    
    In [13]: b
    Out[13]: 1
    

    So in your code it was as if you were doing:

    sum = df['budget'] + df['actual']  # a Series
    # and
    df['variance'] = df['budget'] + df['actual']  # assigned to a column
    

    The latter creates a new column for df:

    In [21]: df
    Out[21]:
      cluster                 date  budget  actual
    0       a  2014-01-01 00:00:00   11000   10000
    1       a  2014-02-01 00:00:00    1200    1000
    2       a  2014-03-01 00:00:00     200     100
    3       b  2014-04-01 00:00:00     200     300
    4       b  2014-05-01 00:00:00     400     450
    5       c  2014-06-01 00:00:00     700    1000
    6       c  2014-07-01 00:00:00    1200    1000
    7       c  2014-08-01 00:00:00     200     100
    8       c  2014-09-01 00:00:00     200     300
    
    In [22]: df['variance'] = df['budget'] + df['actual']
    
    In [23]: df
    Out[23]:
      cluster                 date  budget  actual  variance
    0       a  2014-01-01 00:00:00   11000   10000     21000
    1       a  2014-02-01 00:00:00    1200    1000      2200
    2       a  2014-03-01 00:00:00     200     100       300
    3       b  2014-04-01 00:00:00     200     300       500
    4       b  2014-05-01 00:00:00     400     450       850
    5       c  2014-06-01 00:00:00     700    1000      1700
    6       c  2014-07-01 00:00:00    1200    1000      2200
    7       c  2014-08-01 00:00:00     200     100       300
    8       c  2014-09-01 00:00:00     200     300       500
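The chained assignment can be checked directly. This sketch uses a hypothetical name `total` instead of `sum` and a two-row frame; the right-hand side is evaluated once and then bound to each target from left to right:

```python
import pandas as pd

df = pd.DataFrame({"budget": [11000, 1200], "actual": [10000, 1000]})

# One expression, two targets: `total` gets the Series,
# and df["variance"] becomes a new column with the same values
total = df["variance"] = df["budget"] + df["actual"]

print(type(total).__name__)      # Series
print("variance" in df.columns)  # True
print(total.tolist(), df["variance"].tolist())
```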
    

    As an aside, you shouldn't use sum as a variable name, as that shadows the built-in sum function.
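A quick illustration of the problem, assuming the code runs as a module-level script: once `sum` is bound to a Series, calling `sum(...)` fails, and the built-in is only reachable through the `builtins` module:

```python
import builtins

import pandas as pd

df = pd.DataFrame({"budget": [11000, 1200], "actual": [10000, 1000]})

# Binding the name `sum`, as in the question, hides the built-in function
sum = df["budget"] + df["actual"]

try:
    sum([1, 2, 3])
except TypeError as exc:
    # Calling a Series fails: 'Series' object is not callable
    print("TypeError:", exc)

# The built-in is still reachable through the builtins module
print(builtins.sum([1, 2, 3]))  # 6
```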

  • 2020-12-08 13:57

    You could also use the .add() method:

    df.loc[:,'variance'] = df.loc[:,'budget'].add(df.loc[:,'actual'])
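One reason to prefer `.add()` over `+` is its `fill_value` parameter, which substitutes a value for NaN on one side before adding. A sketch with hypothetical data; note that a row where both values are missing still comes out as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data covering every combination of missing values
df = pd.DataFrame({"budget": [11000.0, np.nan, 200.0, np.nan],
                   "actual": [10000.0, 1000.0, np.nan, np.nan]})

# Same as df["budget"] + df["actual"]: NaN in either operand gives NaN
df.loc[:, "variance"] = df.loc[:, "budget"].add(df.loc[:, "actual"])

# fill_value=0 treats a value missing on ONE side as 0;
# if both sides are missing, the result stays NaN
df.loc[:, "variance_filled"] = df.loc[:, "budget"].add(df.loc[:, "actual"], fill_value=0)
print(df)
```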
    
  • 2020-12-08 14:14

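    If "budget" has any NaN values but you don't want the sum to be NaN, then try:

    import math
    import numpy as np

    def fun(b, a):
        # If budget is missing, fall back to actual alone
        if math.isnan(b):
            return a
        else:
            return b + a

    # otypes=[float] fixes the output dtype (otherwise it is inferred from the first call)
    f = np.vectorize(fun, otypes=[float])

    df['variance'] = f(df['budget'], df['actual'])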
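`np.vectorize` is essentially a Python loop, so the same rule can also be written with native pandas operations. This sketch swaps in `fillna(0)` on the budget column (a different technique from the answer's) and checks it against the `np.vectorize` version on hypothetical data:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: row 1 has no budget, row 2 has no actual
df = pd.DataFrame({"budget": [11000.0, np.nan, 200.0],
                   "actual": [10000.0, 1000.0, np.nan]})

# The np.vectorize approach from the answer
def fun(b, a):
    if math.isnan(b):
        return a
    else:
        return b + a

f = np.vectorize(fun, otypes=[float])
via_vectorize = f(df["budget"], df["actual"])

# Native alternative: a missing budget counts as 0,
# while a missing actual still propagates NaN, just as fun() does
via_fillna = df["budget"].fillna(0) + df["actual"]

print(np.allclose(via_vectorize, via_fillna.to_numpy(), equal_nan=True))  # True
```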
    