I thought that, generally speaking, using %>% wouldn't have a noticeable effect on speed. But in this case it runs 4x slower.
library(dplyr)
magrittr's pipe is built around the concept of a functional chain.
You can create one by starting with a dot: . %>% head() %>% dim() is a compact way of writing a function.
Even with a standard pipe call such as iris %>% head() %>% dim(), the functional chain . %>% head() %>% dim() is computed first, which causes overhead.
The functional chain is a bit of a strange animal:
(. %>% head()) %>% dim
#> NULL
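To make the example above concrete: . %>% head() builds a function (a "fseq" object), and dim() of a function is NULL, which is why the snippet above prints NULL. A quick sketch:

    library(magrittr)
    f <- . %>% head()   # a functional sequence, i.e. a function
    class(f)
    #> [1] "fseq"     "function"
    f(iris)             # calling it pipes iris through head()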
When you look at the call . %>% head() %>% dim(), it actually parses as `%>%`(`%>%`(., head()), dim()). Sorting this out requires some manipulation that takes a bit of time.
Another thing that takes a bit of time is handling the different forms the rhs can take, such as in iris %>% head, iris %>% head(.), iris %>% {head(.)}, etc., so that a dot is inserted at the right place when relevant.
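For instance, all of these forms are normalized to the same call (the dot is inserted as the first argument for bare names and regular calls, but not inside braces):

    library(magrittr)
    identical(iris %>% head, head(iris))      # bare function name
    #> [1] TRUE
    identical(iris %>% head(.), head(iris))   # explicit dot
    #> [1] TRUE
    identical(iris %>% {head(.)}, head(iris)) # braced expression: no insertion
    #> [1] TRUE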
You can build a very fast pipe the following way:

`%.%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)
  eval(rhs_call, envir = list(. = lhs), enclos = parent.frame())
}
It is much faster than magrittr's pipe and actually behaves better with edge cases, but it requires explicit dots and obviously won't support functional chains.
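For example, with the definition above and explicit dots at every step:

    `%.%` <- function(lhs, rhs) {
      rhs_call <- substitute(rhs)
      eval(rhs_call, envir = list(. = lhs), enclos = parent.frame())
    }
    iris %.% head(.) %.% dim(.)
    #> [1] 6 5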
library(magrittr)
`%.%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)
  eval(rhs_call, envir = list(. = lhs), enclos = parent.frame())
}
bench::mark(
  relative = TRUE,
  "%>%" = 1 %>% identity %>% identity() %>% (identity) %>% {identity(.)},
  "%.%" = 1 %.% identity(.) %.% identity(.) %.% identity(.) %.% identity(.)
)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#>
#> 1 %>% 15.9 13.3 1 4.75 1
#> 2 %.% 1 1 17.0 1 1.60
Created on 2019-10-05 by the reprex package (v0.3.0)
Here it was clocked at 13.3 times faster (by median).
I included it in my experimental fastpipe package, under the name %>>%.
Now, we can also leverage the power of functional chains directly, with a simple change to your call:
dummy_data %>% group_by(id) %>% summarise_at('label', . %>% unique %>% list)
It will be much faster because the functional chain is parsed only once; internally it just applies the functions one after another in a loop, very close to your base solution. My fast pipe, on the other hand, still adds a small overhead due to the eval/substitute done for every loop instance and every pipe.
Here's a benchmark including those two new solutions:
microbenchmark::microbenchmark(
  nopipe = dummy_data %>% group_by(id) %>% summarise(label = list(unique(label))),
  magrittr = dummy_data %>% group_by(id) %>% summarise(label = label %>% unique %>% list),
  functional_chain = dummy_data %>% group_by(id) %>% summarise_at('label', . %>% unique %>% list),
  fastpipe = dummy_data %.% group_by(., id) %.% summarise(., label = label %.% unique(.) %.% list(.)),
  times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> nopipe 42.2388 42.9189 58.0272 56.34325 66.1304 80.5491 10 a
#> magrittr 512.5352 571.9309 625.5392 616.60310 670.3800 811.1078 10 b
#> functional_chain 64.3320 78.1957 101.0012 99.73850 126.6302 148.7871 10 a
#> fastpipe 66.0634 87.0410 101.9038 98.16985 112.7027 172.1843 10 a