Reference values in the previous row with map or apply

别那么骄傲 2021-01-23 09:12

Given a dataframe df, I would like to generate a new variable/column for each row based on the values in the previous row. df is sorted so that the ord

2 answers
    谎友^ (original poster)
    2021-01-23 09:49

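    You can use the dataframe 'apply' function with a decorated function that stores the previous row between calls. The wrapper accepts '**kwargs' but never uses them; the previous row lives in a dictionary held by the closure.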

    import pandas as pd
    
    df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
    
    new_col = 'c'
    
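    def apply_func_decorator(func):
        # Shared state: holds the previous row (plus its computed value) between calls.
        prev_row = {}
        def wrapper(curr_row, **kwargs):
            # Compute the new value from the current row and the stored previous row.
            val = func(curr_row, prev_row)
            # Remember this row, including the new column, for the next call.
            prev_row.update(curr_row)
            prev_row[new_col] = val
            return val
        return wrapper
    
    @apply_func_decorator
    def running_total(curr_row, prev_row):
        # prev_row is empty on the first row, so fall back to 0.
        return curr_row['a'] + curr_row['b'] + prev_row.get(new_col, 0)
    
    # axis=1 calls the function once per row, in row order.
    df[new_col] = df.apply(running_total, axis=1)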
    
    print(df)
    # Output will be:
    #    a   b   c
    # 0  0   0   0
    # 1  1  10  11
    # 2  2  20  33
    

    This example uses a decorator to store the previous row in a dictionary and then pass it to the function when Pandas calls it on the next row.
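
    To make the mechanism easier to follow, here is a rough sketch of what the decorator does, written as an explicit loop. It assumes the same 'a', 'b' and 'c' columns as above; the names 'values' and 'val' are just for illustration, and 'iterrows' is my substitution for 'apply', not part of the original approach.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

prev_row = {}  # empty before the first row, just like in the decorator
values = []    # illustrative name: collects the new column's values

for _, curr_row in df.iterrows():
    # Same formula as running_total, with a default of 0 for the first row.
    val = curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)
    # Store the current row and its computed value for the next iteration.
    prev_row = curr_row.to_dict()
    prev_row['c'] = val
    values.append(val)

df['c'] = values
print(df)
```

    The decorator version hides this bookkeeping inside the closure so that 'apply' can drive the iteration.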

    Disclaimer 1: The 'prev_row' variable starts off empty for the first row so when using it in the apply function I had to supply a default value to avoid a 'KeyError'.
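
    The same first-row question comes up if the new column only needs existing columns from the previous row. In that case 'shift' may be enough; a minimal sketch, assuming a default of 0 for the first row ('prev_b' and 'd' are made-up column names for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# Move column 'b' down by one row; the first row has no predecessor,
# so fill_value supplies the default instead of NaN.
df['prev_b'] = df['b'].shift(1, fill_value=0)

# Example use: combine the current 'a' with the previous row's 'b'.
df['d'] = df['a'] + df['prev_b']
print(df)
```

    This does not help when the new value depends on the previous row's newly computed value, which is the case the decorator handles.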

    Disclaimer 2: I am fairly certain this will be slower than a plain apply operation, but I did not run any tests to measure by how much.
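
    If speed matters and the logic really is just a running total, as in the example above, a vectorized 'cumsum' should give the same column without calling a Python function per row. This is a sketch for that specific case only, not a general replacement for the decorator:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# Each row's 'c' is a + b plus the previous row's 'c',
# which is the cumulative sum of (a + b).
df['c'] = (df['a'] + df['b']).cumsum()
print(df)
```

    For logic that cannot be expressed as a cumulative operation, the stateful apply approach still applies.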
