Assign intermediate output to temp variable as part of dplyr pipeline

前端未结


Q: In an R dplyr pipeline, how can I assign some intermediate output to a temp variable for use further down the pipeline?

My approach below works. But it assigns in

5 Answers

萌比男神i

2020-11-30 12:43
pipeR is a package that extends the capabilities of the pipe without adding different pipes (as magrittr does). To assign, you pass a variable name, prefixed with `~` and wrapped in parentheses, as a step in your pipe:
```
library(dplyr)
library(pipeR)

# Sample data, consistent with the output shown below
df <- data.frame(a = LETTERS[1:3], b = 1:3)

df %>>%
  filter(b < 3) %>>%
  (~tmp) %>>% 
  mutate(b = b*2) %>>%
  bind_rows(tmp)
##   a b
## 1 A 2
## 2 B 4
## 3 A 1
## 4 B 2

tmp
##   a b
## 1 A 1
## 2 B 2
```
While the syntax is not terribly descriptive, pipeR is very well documented.
滥情空心

2020-11-30 12:51
I was interested in the question for the sake of debugging: I wanted to save intermediate results so that I can inspect and manipulate them from the console without having to split the pipeline into two pieces, which is cumbersome. So, for my purposes, the only problem with the OP's original solution was that it was slightly verbose.

This can be fixed by defining a helper function:
```
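# Save the piped value under the unquoted name given in `...`, then pass it on.
# quos() and quo_name() come from rlang and are re-exported by dplyr.
to_var <- function(., ..., env = .GlobalEnv) {
  var_name <- quo_name(quos(...)[[1]])
  assign(var_name, ., envir = env)
  .
}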
```
Which can then be used as follows:
```
df <- data.frame(a = LETTERS[1:3], b=1:3)
df %>%
  filter(b < 3) %>%
  to_var(tmp) %>%
  mutate(b = b*2) %>%
  bind_rows(tmp)
# tmp still exists here
```
That still uses the global environment, but you can also explicitly pass a more local environment as in the following example:
```
f <- function() {
    df <- data.frame(a = LETTERS[1:3], b=1:3)
    env = environment()
    df %>%
      filter(b < 3) %>%
      to_var(tmp, env=env) %>%
      mutate(b = b*2) %>%
      bind_rows(tmp)
}
f()
# tmp does not exist here
```
The problem with the accepted solution is that it didn't seem to work out of the box with tidyverse pipes. ~~G. Grothendieck's solution doesn't work for the debugging use case at all.~~ (update: see G. Grothendieck's comment below and his updated answer!)

Finally, the reason `assign("tmp", .) %>%` doesn't work is that the default `envir` argument for `assign()` is the "current environment" (see the documentation for `assign`), which is different at each stage of the pipeline. To see this, try inserting `{ print(environment()); . } %>%` into the pipeline at various points and see that a different address is printed each time. (It is probably possible to tweak the definition of `to_var` so that the default is the grandparent environment instead.)
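The point about the default `envir` can be seen without any pipe at all. The following is a minimal base R sketch, not part of the original answer; `scratch`, `assign_default`, and `assign_explicit` are hypothetical names chosen for illustration. It shows that `assign()` without `envir` writes into the frame it is called from, while an explicit `envir` controls where the binding ends up.

```r
# Illustrative names only; nothing here depends on dplyr or magrittr.
scratch <- new.env()

assign_default <- function(value) {
  # No `envir`: the binding is created in this function's own frame
  # and is gone once the function returns.
  assign("tmp_default", value)
  exists("tmp_default", inherits = FALSE)
}

assign_explicit <- function(value, env) {
  # Explicit `envir`: the binding is created in the environment passed in.
  assign("tmp_explicit", value, envir = env)
  invisible(value)
}

created_locally <- assign_default(1)   # TRUE: the variable existed inside the call
assign_explicit(2, scratch)

# Only the explicitly targeted binding is found in `scratch`.
exists("tmp_default", envir = scratch, inherits = FALSE)
exists("tmp_explicit", envir = scratch, inherits = FALSE)
```

This is the same reason `to_var()` above takes an `env` argument: the caller decides where the temporary variable lives instead of relying on whatever frame happens to be current.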
既然无缘

2020-11-30 12:54
You can generate the desired object at the location in the pipeline where it's needed. For example:
```
df %>% filter(b < 3) %>% mutate(b = b*2) %>%
  bind_rows(df %>% filter(b < 3))
```
This method avoids having to filter twice:
```
df %>%
  filter(b < 3) %>%
  bind_rows(., mutate(., b = b*2))
```
天命终不由人

2020-11-30 13:05
This does not create an object in the global environment:
```
df %>% 
   filter(b < 3) %>% 
   { 
     { . -> tmp } %>% 
     mutate(b = b*2) %>% 
     bind_rows(tmp) 
   }
```
This can also be used for debugging if you use `. ->> tmp` instead of `. -> tmp`, or if you insert this into the pipeline:
```
{ browser(); . } %>% 
```
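The difference between `->` and `->>` that the debugging tip relies on can be sketched in base R, independent of any pipe. The function and variable names below are hypothetical and only serve the illustration; the sketch assumes it is run at top level in a fresh session.

```r
save_local <- function(x) {
  # Ordinary right assignment: the binding lives only in this call's frame.
  x -> snapshot_local
  invisible(x)
}

save_super <- function(x) {
  # Right superassignment: searches the enclosing environments for an
  # existing `snapshot_super` and, finding none, creates it in the
  # global environment.
  x ->> snapshot_super
  invisible(x)
}

save_local(1:3)
save_super(1:3)

exists("snapshot_local")   # the local binding did not survive the call
exists("snapshot_super")   # the superassigned binding is still available
```

That persistence is what makes `->>` convenient for inspecting an intermediate result from the console, and also why it leaves a variable behind after the pipeline finishes.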
后悔当初

2020-11-30 13:06
I often find the need to save an intermediate product in a pipeline. While my use case is typically to avoid duplicating filters for later splitting, manipulation and reassembly, the technique can work well here:
```
df %>%
  filter(b < 3) %>%
  {. ->> intermediateResult} %>%  # save the intermediate result
  mutate(b = b*2) %>%
  bind_rows(intermediateResult)
```