Forwarding arguments in a function with purrr::map_df

一曲冷凌霜 提交于 2021-02-08 04:28:06

问题


I am trying to create a function that reads in all sheets in an excel workbook using readxl::read_excel and binds them into a single data frame, and allows me to pass through additional arguments to read_excel. I can do the first part fine, but not the second part.

library(magrittr)

# example excel workbook with multiple sheets
path <- readxl::readxl_example("datasets.xlsx")

# function with simple forwarding
read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x, ...))

}

# errors with and without additional arguments
read_all(path)
read_all(path, skip = 5)

I should get back a single file, instead I get an error:

Error: Can't guess format of this cell reference: iris
In addition: Warning message: Cell reference follows neither the A1 nor R1C1 format. Example: iris NAs generated.

Without argument passing the function works fine:

# Function works without passing extra params
read_all_0 <- function(path) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x))

}

read_all_0(path)

Argument passing works fine in a simple function without purrr::map_df

read_test <- function(path, ...) {

  path %>% readxl::read_excel(...)
}
read_test(path, skip = 10)

回答1:


A possible solution is to create a named function which only takes one argument and pass it to map so that the only argument is the vector/list you are looping over.

Applied to your problem a solution would look like this:

# function with forwarding
read_all <- function(path, ...) {

  # function within function that sets the arguments path and ellipsis as given and only leaves sheet to be determined
  read_xl <- function(sheet) {
    readxl::read_excel(path = path, sheet, ...)
  }

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

}

# this allows you to pass along arguments in the ellipsis correctly
read_all(path)
read_all(path, col_names = FALSE)

It seems this problem is stemming from an improper environment handling of the purrr::as_mapper function. To circumvent this, I suggested using an anonymous function in the comments. Apparently, the approach below works as well.

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(function(x) {
                      readxl::read_excel(path = path, sheet = x, ...)
                   })

}

To verify that it is really the as_mapper function that is causing the problem, we can rewrite the named function-in-function from above using as_mapper. This again yields errors with and without additional arguments in the ellipsis.

# function with forwarding
read_all <- function(path, ...) {

  # named mapper function
  read_xl <- purrr::as_mapper(~ readxl::read_excel(path = path, sheet = .x, ...))

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

} 

Update Knowing that as_mapper is causing the issue allows us to dig deeper into the problem. Now we can inspect in the RStudio debugger what is happening under the hood when running a simple mapper version of read_excel:

read_xl <- purrr::as_mapper(~ readxl::read_excel(path = .x, sheet = .y, ...))
debugonce(read_xl) 
read_xl(path, 1)

It seems that when the ellipsis is included in the mapper function, as_mapper maps the first argument not only to .x but also automatically to the ellipsis .... We can verify this by creating a simple mapper function paster taking two arguments .x and ....

paster <- purrr::as_mapper(~ paste0(.x, ...))
paster(1)
> [1] "11"
paster(2)
> [1] "22"

The question now is: is there another way we are supposed to use ellipsis in mapper functions or is this a bug.




回答2:


I would have thought the following would work:

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    purrr::set_names() %>%
    map_df(~readxl::read_excel(path=path, sheet=.x), ...)

}

because the map family has a ... argument for passing additional arguments to the mapped function. However, the following code ignores the n_max argument and still returns all rows of the various data frames instead of a data frame with 8 rows (2 rows from each of the four sheets):

p <- readxl_example("datasets.xlsx")
read_all(p, n_max=2)

However, this works:

read_all <- function(path, ...) {

  path %>% 
    excel_sheets() %>% 
    set_names() %>%
    map_df(read_excel, path=path, ...)

}

p <- readxl_example("datasets.xlsx")
read_all(path=p, n_max=2)

In the above, path and any additional arguments in ... are passed to read_excel and (apparently) the sheet name (which would be the .x if we used it explicitly) is implicitly passed to the sheet argument, I guess because the path argument, which is first, has already been provided. I don't really get this and it doesn't seem like a particularly transparent approach, but I thought I'd put it out there in case someone else can explain what's going on and provide better code.



来源:https://stackoverflow.com/questions/58243710/forwarding-arguments-in-a-function-with-purrrmap-df

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!