Apply data frame with list-variable of multivariable functions to a data frame with function arguments

Question

This dataframe contains what I'll call the "data":

library(tidyverse)
df_d <- data_frame(key = c("cat", "cat", "dog", "dog"), 
               value_1 = c(1,2,3,4), 
               value_2 = c(2,4,6,8))

Here is a dataframe that I intend to use as a kind of function look-up table. f is a single-variable function and f2 is a multivariable function (written here as a function of one length-2 vector):

df_f <- data_frame(key = c("cat", "dog"),
               f = c(function(x) x^2, function(x) sqrt(x)),
               f2 = c(function(x) (x[1]+x[2])^2, function(x) sqrt(x[1]+x[2])))

I can easily make a dataframe so that any cat row gets the cat functions and any dog row gets the dog functions:

df_both <- left_join(df_d, df_f)

I was able to figure out how to apply each of the f functions to, say, the value_1 column to get:

df_both %>% mutate(result = invoke_map_dbl(f, value_1))        
#> # A tibble: 4 x 6
#>   key   value_1 value_2 f      f2     result
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00

My question is: how can I create a column result2 that takes each function in f2 and calls it with c(value_1, value_2) as its input? If redefining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too.

Desired output:

#> # A tibble: 4 x 7
#>   key   value_1 value_2 f      f2     result result2
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00    9.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00   36.0 
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73    3.00
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00    3.46

(Question motivated by an unfortunately self-deleted question from earlier today.)

Answer 1:

"If re-defining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too."

Yes, I think that is the more natural setup here. Otherwise the arguments for each call are stored row-wise, spread across columns, and the data should possibly be reshaped.

Redefining your functions:

df_f <- data_frame(key = c("cat", "dog"),
                   f = c(function(x) x^2, function(x) sqrt(x)),
                   f2 = c(function(x, y) (x + y)^2, function(x, y) sqrt(x + y)))
df_both <- left_join(df_d, df_f)

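Now you can again use invoke_map_dbl, passing .x as a list of argument lists. Because list(value_1, value_2) is organized by column, you need to turn it inside out with transpose so that there is one list of arguments per row: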

mutate(
  df_both,
  result  = invoke_map_dbl(f, value_1),
  result2 = invoke_map_dbl(f2, transpose(list(value_1, value_2)))
)

# A tibble: 4 x 7
  key   value_1 value_2 f      f2     result result2
  <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
1 cat        1.      2. <fn>   <fn>     1.00    9.00
2 cat        2.      4. <fn>   <fn>     4.00   36.0 
3 dog        3.      6. <fn>   <fn>     1.73    3.00
4 dog        4.      8. <fn>   <fn>     2.00    3.46
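To see why transpose is needed, it can help to look at the argument list on its own. This is a minimal sketch outside of mutate, using the same values as above; the names by_column and by_row are made up for illustration. For each row, invoke_map_dbl essentially does the equivalent of do.call on the function and that row's argument list.

```r
library(purrr)

value_1 <- c(1, 2, 3, 4)
value_2 <- c(2, 4, 6, 8)

# Organized by column: a list of 2 vectors, each of length 4
by_column <- list(value_1, value_2)

# Organized by row: a list of 4 elements, each a list of 2 arguments
by_row <- transpose(by_column)

length(by_column)  # 2
length(by_row)     # 4

# Calling the "cat" function with the first row's arguments, (1 + 2)^2
first_call <- do.call(function(x, y) (x + y)^2, by_row[[1]])
first_call         # 9
```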

A set of three-argument functions would then simply extend to invoke_map_dbl(f3, transpose(list(value_1, value_2, value_3))).

Note that this kind of approach will not work well on large datasets, since you aren't using vectorization.

A more scalable alternative may involve nesting, where you at least apply each function once within each group:

df_both %>% 
  group_by(key) %>% 
  nest() %>% 
  mutate(data = map(
    data, 
    ~mutate(., result = first(f)(value_1), result2 = first(f2)(value_1, value_2))
    )) %>% 
  unnest()

This gives the same result.
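The same "one call per group" idea can also be sketched with a grouped mutate instead of nest/unnest. This is only a sketch, and it rests on two assumptions: every row within a key shares the same function, and the functions are vectorized over their arguments (both hold for the example functions here). The name res is made up for illustration.

```r
library(dplyr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))
df_f <- tibble(key = c("cat", "dog"),
               f  = list(function(x) x^2, function(x) sqrt(x)),
               f2 = list(function(x, y) (x + y)^2, function(x, y) sqrt(x + y)))

res <- df_d %>%
  left_join(df_f, by = "key") %>%
  group_by(key) %>%
  # Within a group all functions are identical, so take the first one
  # and call it once on the whole group's vectors
  mutate(result  = f[[1]](value_1),
         result2 = f2[[1]](value_1, value_2)) %>%
  ungroup()
```

Each function is then called once per key rather than once per row, which is where any speed-up on larger data would come from.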

Answer 2:

We could use pmap_dbl:

df_both %>% 
   mutate(result = invoke_map_dbl(f, value_1), 
          result2 = pmap_dbl(.[c('value_1', 'value_2', 'f2')],  ~(..3)(c(..1, ..2))))
# A tibble: 4 x 7
#   key   value_1 value_2 f      f2     result result2
#   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
#1 cat      1.00    2.00 <fun>  <fun>    1.00    9.00
#2 cat      2.00    4.00 <fun>  <fun>    4.00   36.0 
#3 dog      3.00    6.00 <fun>  <fun>    1.73    3.00
#4 dog      4.00    8.00 <fun>  <fun>    2.00    3.46

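Here the OP's functions are used unchanged: each function in f2 still takes a single vector, so the two values are combined with c(..1, ..2) before being passed to the function (..3).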
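The positional placeholders ..1, ..2 and ..3 can be harder to read than named arguments. As a sketch of the same idea, the call can be written with an ordinary function whose parameter names (v1, v2 and fn, chosen here purely for illustration) match the order of the list passed to pmap_dbl. The functions in f2 are the single-vector versions from the question.

```r
library(dplyr)
library(purrr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))
df_f <- tibble(key = c("cat", "dog"),
               f2 = list(function(x) (x[1] + x[2])^2,
                         function(x) sqrt(x[1] + x[2])))

df_both <- left_join(df_d, df_f, by = "key")

res <- df_both %>%
  # For each row: v1 = value_1, v2 = value_2, fn = the row's function
  mutate(result2 = pmap_dbl(list(value_1, value_2, f2),
                            function(v1, v2, fn) fn(c(v1, v2))))

res$result2  # 9, 36, 3, 3.46 (rounded)
```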

Source: https://stackoverflow.com/questions/49266076/apply-data-frame-with-list-variable-of-multivariable-functions-to-a-data-frame-w

Tags

purrr