Efficient functional programming (using mapply) in R for a “naturally” procedural problem

独厮守ぢ 2021-01-07 11:43

A common use case in R (at least for me) is identifying observations in a data frame that have some characteristic that depends on the values in some subset of other observa

5 answers
  •  予麋鹿 (original poster)
    2021-01-07 12:15

    The "most natural way", IMO, is the split-lapply-rbind method. You start by split()-ing the data frame into a list of groups, then lapply() the processing rule to each group (in this case, removing the last row), and finally rbind() the pieces back together. It can all be written as one nested set of function calls. The inner two steps are illustrated here, and the final one-liner is shown at the bottom:

    > lapply( split(raw, raw$WorkerId), function(x) x[-NROW(x),] )
    $`1`
      WorkerId Iteration
    1        1         1
    2        1         2
    3        1         3
    
    $`2`
      WorkerId Iteration
    5        2         1
    6        2         2
    7        2         3
    
    $`3`
       WorkerId Iteration
    9         3         1
    10        3         2
    11        3         3
    
    do.call(rbind, lapply(split(raw, raw$WorkerId), function(x) x[-NROW(x), ]))
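
    For anyone who wants to run the steps end to end, here is a minimal sketch. The question's actual data is not shown here, so the `raw` data frame below is a hypothetical stand-in (3 workers, 4 iterations each) chosen to match the printed output; the intermediate names `groups`, `trimmed_list` and `trimmed` are also just illustrative.

    ```r
    # Hypothetical stand-in for the question's `raw` data frame:
    # 3 workers with 4 iterations each (rows 1 to 12).
    raw <- data.frame(
      WorkerId  = rep(1:3, each = 4),
      Iteration = rep(1:4, times = 3)
    )

    # Step 1: split into a named list with one data frame per worker.
    groups <- split(raw, raw$WorkerId)

    # Step 2: apply the processing rule to each group
    # (here: drop the last row of the group).
    trimmed_list <- lapply(groups, function(x) x[-NROW(x), ])

    # Step 3: stack the per-worker pieces back into one data frame.
    trimmed <- do.call(rbind, trimmed_list)

    # rbind() derives row names from the list names; reset them
    # if plain sequential row names are preferred.
    rownames(trimmed) <- NULL

    print(trimmed)
    ```

    Keeping the three steps as separate assignments makes it easy to inspect `groups` or `trimmed_list` while debugging; once the rule works, they collapse back into the nested one-liner.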
    

    Hadley Wickham's plyr package provides a set of tools that extend this split-apply-combine strategy to a wider variety of tasks.
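
    As a rough illustration of the plyr route, the same "drop the last row per worker" rule could be written with `ddply()`. This is only a sketch: it assumes plyr is installed, and the `raw` data frame is again a hypothetical stand-in (3 workers, 4 iterations each), not the question's real data.

    ```r
    # Hypothetical stand-in for the question's `raw` data frame.
    raw <- data.frame(
      WorkerId  = rep(1:3, each = 4),
      Iteration = rep(1:4, times = 3)
    )

    # Only run if plyr is available (assumption: it may not be installed).
    if (requireNamespace("plyr", quietly = TRUE)) {
      # ddply() splits `raw` by WorkerId, applies the function to each
      # piece, and combines the results into a single data frame.
      trimmed_plyr <- plyr::ddply(raw, "WorkerId", function(x) x[-nrow(x), ])
      print(trimmed_plyr)
    }
    ```

    Here the split, apply and combine steps are all handled inside one call, so there is no explicit `do.call(rbind, ...)`.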
