Is there a way to output the result of a pipeline at each step without doing it manually? (eg. without selecting and running only the selected chunks)
I often find myself running a pipeline line-by-line to remember what it was doing or when I am developing some analysis.
For example:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
sample_frac(0.1) %>%
summarise(res = mean(mpg))
# Source: local data frame [3 x 2]
#
# cyl res
# 1 4 33.9
# 2 6 18.1
# 3 8 18.7
I'd to select and run:
mtcars %>% group_by(cyl)
and then...
mtcars %>% group_by(cyl) %>% sample_frac(0.1)
and so on...
But selecting and CMD/CTRL+ENTER in RStudio leaves a more efficient method to be desired.
Can this be done in code?
Is there a function which takes a pipeline and runs/digests it line by line showing output at each step in the console and you continue by pressing enter like in demos(...) or examples(...) of package guides
It is easy with magrittr function chain. For example define a function my_chain with:
foo <- function(x) x + 1
bar <- function(x) x + 1
baz <- function(x) x + 1
my_chain <- . %>% foo %>% bar %>% baz
and get the final result of a chain as:
> my_chain(0)
[1] 3
You can get a function list with functions(my_chain)
and define a "stepper" function like this:
stepper <- function(fun_chain, x, FUN = print) {
f_list <- functions(fun_chain)
for(i in seq_along(f_list)) {
x <- f_list[[i]](x)
FUN(x)
}
invisible(x)
}
And run the chain with interposed print function:
stepper(my_chain, 0, print)
# [1] 1
# [1] 2
# [1] 3
Or with waiting for user input:
stepper(my_chain, 0, function(x) {print(x); readline()})
You can select which results to print by using the tee-operator (%T>%) and print(). The tee-operator is used exclusively for side-effects like printing.
# i.e.
mtcars %>%
group_by(cyl) %T>% print() %>%
sample_frac(0.1) %T>% print() %>%
summarise(res = mean(mpg))
Add print:
mtcars %>%
group_by(cyl) %>%
print %>%
sample_frac(0.1) %>%
print %>%
summarise(res = mean(mpg))
IMHO magrittr is mostly useful interactively, that is when I am exploring data or building a new formula/model.
In this cases, storing intermediate results in distinct variables is very time consuming and distracting, while pipes let me focus on data, rather than typing:
x %>% foo
## reason on results and
x %>% foo %>% bar
## reason on results and
x %>% foo %>% bar %>% baz
## etc.
The problem here is that I don't know in advance what the final pipe will be, like in @bergant.
Typing, as in @zx8754,
x %>% print %>% foo %>% print %>% bar %>% print %>% baz
adds to much overhead and, to me, defeats the whole purpose of magrittr.
Essentially magrittr lacks a simple operator that both prints and pipes results.
The good news is that it seems quite easy to craft one:
`%P>%`=function(lhs, rhs){ print(lhs); lhs %>% rhs }
Now you can print an pipe:
1:4 %P>% sqrt %P>% sum
## [1] 1 2 3 4
## [1] 1.000000 1.414214 1.732051 2.000000
## [1] 6.146264
I found that if one defines/uses a key bindings for %P>% and %>%, the prototyping workflow is very streamlined (see Emacs ESS or RStudio).
I wrote the package pipes that can do several things that might help :
- use
%P>%toprintthe output. - use
%ae>%to useall.equalon input and output. - use
%V>%to useViewon the output, it will open a viewer for each relevant step.
If you want to see some aggregated info you can try %summary>%, %glimpse>% or %skim>% which will use summary, tibble::glimpse or skimr::skim, or you can define your own pipe to show specific changes, using new_pipe
# devtools::install_github("moodymudskipper/pipes")
library(dplyr)
library(pipes)
res <- mtcars %P>%
group_by(cyl) %P>%
sample_frac(0.1) %P>%
summarise(res = mean(mpg))
#> group_by(., cyl)
#> # A tibble: 32 x 11
#> # Groups: cyl [3]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
#> sample_frac(., 0.1)
#> # A tibble: 3 x 11
#> # Groups: cyl [3]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 2 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
#> 3 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> summarise(., res = mean(mpg))
#> # A tibble: 3 x 2
#> cyl res
#> <dbl> <dbl>
#> 1 4 26
#> 2 6 17.8
#> 3 8 18.7
res <- mtcars %ae>%
group_by(cyl) %ae>%
sample_frac(0.1) %ae>%
summarise(res = mean(mpg))
#> group_by(., cyl)
#> [1] "Attributes: < Names: 1 string mismatch >"
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"
#> [3] "Attributes: < Component \"class\": Lengths (1, 4) differ (string compare on first 1) >"
#> [4] "Attributes: < Component \"class\": 1 string mismatch >"
#> [5] "Attributes: < Component 2: Modes: character, list >"
#> [6] "Attributes: < Component 2: Lengths: 32, 2 >"
#> [7] "Attributes: < Component 2: names for current but not for target >"
#> [8] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"
#> [9] "Attributes: < Component 2: target is character, current is tbl_df >"
#> sample_frac(., 0.1)
#> [1] "Different number of rows"
#> summarise(., res = mean(mpg))
#> [1] "Cols in y but not x: `res`. "
#> [2] "Cols in x but not y: `qsec`, `wt`, `drat`, `hp`, `disp`, `mpg`, `carb`, `gear`, `am`, `vs`. "
res <- mtcars %V>%
group_by(cyl) %V>%
sample_frac(0.1) %V>%
summarise(res = mean(mpg))
# you'll have to test this one by yourself
来源:https://stackoverflow.com/questions/30119628/stepping-through-a-pipeline-with-intermediate-results