dplyr group by, carry forward value from previous group to next

小蘑菇 2020-12-31 16:50

OK, this is the overall view of what I'm trying to achieve with dplyr:

Using dplyr I am making calculations to form new columns.

initial.         


        
6 Answers
  •  情话喂你
    2020-12-31 17:15

    This kind of use of first and last is very untidy, so we'll keep it for the last step.

    First we build intermediate data, following your code, but adding some columns to join on later at the right places. I'm not sure whether you need to keep all columns; if you don't, you won't need the first join (the one on RunID_l).

    library(dplyr)
    library(tidyr)
    
    df1 <- df0 %>%
      dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
      group_by(RunID) %>%
      mutate(RunID_f = ifelse(row_number()==1,RunID,NA)) %>%  #  for later merge
      mutate(RunID_l = ifelse(row_number()==n(),RunID,NA))    #  possibly unneeded
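
    To make the run numbering concrete, here is a minimal sketch on a made-up signal vector. It uses base R's rle() as a stand-in for data.table::rleid() (an assumption for illustration; unlike rleid(), rle() treats each NA as its own run, so the toy vector has no NA) and flags the first and last row of each run the same way RunID_f and RunID_l do.

    ```r
    # Toy signal: 0 = flat, 1 = long (values are made up for illustration)
    x.long <- c(0, 0, 1, 1, 1, 0, 1, 1)

    # Base-R stand-in for data.table::rleid(): number each run of equal values
    runs  <- rle(x.long)
    RunID <- rep(seq_along(runs$lengths), times = runs$lengths)

    # Keep the run id only on the first / last row of each run, NA elsewhere
    is_first <- !duplicated(RunID)
    is_last  <- !duplicated(RunID, fromLast = TRUE)
    RunID_f  <- ifelse(is_first, RunID, NA)
    RunID_l  <- ifelse(is_last, RunID, NA)

    print(data.frame(x.long, RunID, RunID_f, RunID_l))
    ```

    Because each run id appears exactly once in RunID_f and once in RunID_l, a later join on either column attaches per-run values to a single row of the run instead of repeating them on every row.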
    

    Then we build the summarized data. I refactored your code a bit, as you can see, because these operations "should" be rowwise.

    summarized_data <- df1 %>%
      filter(x.long !=0) %>%
      summarize_at(vars(close.x,inital.capital),c("first","last")) %>%
      mutate(x.long.share        = inital.capital_first / close.x_first,
             x.end.value         = x.long.share         * close.x_last,
             x.net.profit        = inital.capital_last - x.end.value,
             new.initial.capital = x.net.profit         + inital.capital_last,
             lagged.new.initial.capital = lag(new.initial.capital,1))
    
    # A tibble: 2 x 10
    #   RunID close.x_first inital.capital_first close.x_last inital.capital_last x.long.share x.end.value x.net.profit new.initial.capital lagged.new.initial.capital
    # 1     3         38.85                10000        38.13               10000     257.4003    9814.672     185.3282           10185.328                         NA
    # 2     5         33.03                10000        34.34               10000     302.7551   10396.609    -396.6091            9603.391                   10185.33
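
    As a quick check of the arithmetic, the sketch below recomputes the first summary row (RunID 3) by hand in base R. The scalar names are hypothetical; the input values are copied from the table above, and the sign convention for x.net.profit simply follows the code in this answer.

    ```r
    # Inputs for the first long run (RunID 3), copied from the summary table
    close_first   <- 38.85
    close_last    <- 38.13
    capital_first <- 10000
    capital_last  <- 10000

    x.long.share        <- capital_first / close_first   # shares held during the run
    x.end.value         <- x.long.share * close_last     # value of those shares at exit
    x.net.profit        <- capital_last - x.end.value    # same formula as in the answer
    new.initial.capital <- x.net.profit + capital_last

    # What lag() does to the per-run column: shift it down by one run,
    # so each run sees the capital produced by the previous one
    new_caps <- c(new.initial.capital, 9603.391)  # second value from the table
    lagged   <- c(NA, head(new_caps, -1))

    print(round(c(x.long.share, x.end.value, x.net.profit, new.initial.capital), 3))
    print(lagged)
    ```

    The first run has no predecessor, which is why its lagged value is NA and why the final step falls back to the original inital.capital there.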
    

    Then we join our summarized table back to the original, taking advantage of the trick from the first step. The first join may be skipped if you don't need all columns.

    df2 <- df1 %>% ungroup %>%
      left_join(summarized_data %>% select(-lagged.new.initial.capital) ,by=c("RunID_l"="RunID")) %>%      # if you want the other variables, if not, skip the line
      left_join(summarized_data %>% select(RunID,lagged.new.initial.capital) ,by=c("RunID_f"="RunID")) %>%
      mutate(inital.capital = ifelse(is.na(lagged.new.initial.capital),inital.capital,lagged.new.initial.capital)) %>%
      select(close.x:inital.capital) # for readability here
    
    # A tibble: 20 x 6
    #    close.x x.long y.short x.short y.long inital.capital
    #  1 37.9600     NA      NA      NA     NA       10000.00
    #  2 36.5200      0       0       0      0       10000.00
    #  3 38.3200      0       0       0      0       10000.00
    #  4 38.5504      0       0       0      0       10000.00
    #  5 38.1700      0       0       0      0       10000.00
    #  6 38.8500      1       1       0      0       10000.00
    #  7 38.5300      1       1       0      0       10000.00
    #  8 39.1300      1       1       0      0       10000.00
    #  9 38.1300      1       1       0      0       10000.00
    # 10 37.0100      0       0       1      1       10000.00
    # 11 36.1400      0       0       1      1       10000.00
    # 12 35.2700      0       0       1      1       10000.00
    # 13 35.1300      0       0       1      1       10000.00
    # 14 32.2000      0       0       1      1       10000.00
    # 15 33.0300      1       1       0      0       10185.33
    # 16 34.9400      1       1       0      0       10000.00
    # 17 34.5700      1       1       0      0       10000.00
    # 18 33.6000      1       1       0      0       10000.00
    # 19 34.3400      1       1       0      0       10000.00
    # 20 35.8600      0       0       1      1       10000.00
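
    The join-then-overwrite pattern can be seen in isolation on a toy table. This is only a sketch: the two small tibbles and the column name lagged are made up for illustration, and coalesce() is used as a shorter equivalent of the ifelse(is.na(...)) call above.

    ```r
    library(dplyr)

    # Toy rows: RunID_f is set only on the first row of each run (hypothetical data)
    rows <- tibble(
      close.x        = c(38.85, 38.13, 37.01, 33.03, 34.34),
      RunID_f        = c(3L, NA, 4L, 5L, NA),
      inital.capital = 10000
    )

    # One row per long run, holding the capital handed over by the previous run
    per_run <- tibble(
      RunID  = c(3L, 5L),
      lagged = c(NA, 10185.33)
    )

    out <- rows %>%
      left_join(per_run, by = c("RunID_f" = "RunID")) %>%
      # take the joined value where there is one, otherwise keep the original
      mutate(inital.capital = coalesce(lagged, inital.capital)) %>%
      select(-lagged)

    print(out)
    ```

    Only the first row of run 5 picks up the carried-forward capital; rows with no match, or with an NA carried value, keep their original 10000, which mirrors row 15 in the output above.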
    

    data

    df<- read.table(text="close.x x.long  y.short x.short y.long  inital.capital  x.long.shares   x.end.value x.net.profit    new.initial.capital
    37.96   NA  NA  NA  NA  10000   NA  NA  NA  NA
    36.52   0   0   0   0   10000   0   0   0   0
    38.32   0   0   0   0   10000   0   0   0   0
    38.5504 0   0   0   0   10000   0   0   0   0
    38.17   0   0   0   0   10000   0   0   0   0
    38.85   1   1   0   0   10000   0   0   0   0
    38.53   1   1   0   0   10000   0   0   0   0
    39.13   1   1   0   0   10000   0   0   0   0
    38.13   1   1   0   0   10000   257.4002574 9814.671815 185.3281853 10185.32819
    37.01   0   0   1   1   10000   0   0   0   0
    36.14   0   0   1   1   10000   0   0   0   0
    35.27   0   0   1   1   10000   0   0   0   0
    35.13   0   0   1   1   10000   0   0   0   0
    32.2    0   0   1   1   10000   0   0   0   0
    33.03   1   1   0   0   10000   0   0   0   0
    34.94   1   1   0   0   10000   0   0   0   0
    34.57   1   1   0   0   10000   0   0   0   0
    33.6    1   1   0   0   10000   0   0   0   0
    34.34   1   1   0   0   10000   302.7550711 10396.60914 -396.6091432    9603.390857
    35.86   0   0   1   1   10000   0   0   0   0",stringsAsFactors=FALSE,header=TRUE)
    
    df0 <- df %>% select(close.x:inital.capital)
    
