doParallel "foreach" inconsistently inherits objects from parent environment: "Error in { : task 1 failed - "could not find function…""

梦谈多话 2021-01-04 01:54

I have a problem with foreach that I just can't figure out. The following code fails on two Windows computers I've tried, but succeeds on three Linux computers, all running…

2 Answers
  •  难免孤独
    2021-01-04 02:14

    @Tensibai is right. When using doParallel on Windows, you have to explicitly "export" any functions you call that are not defined in the current scope. In my experience, the way to make this work is shown in the following (redacted) example.

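    library(foreach)  # provides foreach() and the %dopar% operator
    
    format_number <- function(data) {
      # do stuff that requires stringr
    }
    
    format_date_time <- function(data) {
      # do stuff that requires stringr
    }
    
    add_direction_data <- function(data) {
      # do stuff that requires dplyr
    }
    
    parse_data <- function(data) {
      voice_start <- NULL           # redacted: vector of values
      voice_end <- NULL             # redacted: vector of values
      target_phone_numbers <- NULL  # redacted: vector of values
      parse_voice_block <- function(block_start, block_end, number) {
        # do stuff (uses the three helper functions above)
      }
    
      # Leave one core free, but always start at least one worker
      number_of_cores <- max(1L, parallel::detectCores() - 1L)
      clusters <- parallel::makeCluster(number_of_cores)
      # stopCluster() comes from parallel, not doParallel; on.exit() also
      # shuts the workers down if a task fails
      on.exit(parallel::stopCluster(clusters), add = TRUE)
      doParallel::registerDoParallel(clusters)
    
      # Without .combine, foreach returns a plain list of results
      # (.combine = list with .multicombine = TRUE nests the list
      # once there are more than 100 tasks)
      data_list <- foreach(i = seq_along(voice_start),
                           .export = c("format_number", "format_date_time", "add_direction_data"),
                           .packages = c("dplyr", "stringr")) %dopar%
        parse_voice_block(voice_start[i], voice_end[i], target_phone_numbers[i])
    
      plyr::rbind.fill(data_list)
    }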
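A stripped-down, runnable version of the same pattern makes the failure easy to reproduce. This is a minimal sketch, assuming the foreach and doParallel packages are installed; `double_it()`, `process_block()` and `run_blocks()` are hypothetical stand-ins for format_number(), parse_voice_block() and parse_data(). A PSOCK cluster (the default for `makeCluster()`) is used so the behaviour is the same on Windows and Linux.

```r
library(foreach)
library(doParallel)

# Hypothetical helper defined in the global environment,
# standing in for format_number() and friends
double_it <- function(x) x * 2

# Hypothetical wrapper standing in for parse_data()
run_blocks <- function(xs, export = character()) {
  # Defined inside the calling function, like parse_voice_block();
  # it only calls double_it() indirectly
  process_block <- function(x) double_it(x)

  cl <- parallel::makeCluster(2)  # PSOCK workers: fresh R sessions on every OS
  on.exit(parallel::stopCluster(cl), add = TRUE)
  registerDoParallel(cl)

  foreach(i = seq_along(xs), .export = export) %dopar% process_block(xs[i])
}

# Without .export the workers never receive double_it(), so the
# loop fails with a "could not find function" error
failed <- tryCatch(run_blocks(1:3), error = function(e) conditionMessage(e))
print(failed)

# Naming the helper in .export ships it to every worker
ok <- run_blocks(1:3, export = "double_it")
print(unlist(ok))
```

Only the second call succeeds; the first reproduces the same kind of "task 1 failed" error as in the question.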
    

    Since the first three functions are defined outside parse_data() rather than in the scope where foreach is called, doParallel doesn't send them to the new instances of R it starts. It does find parse_voice_block, because that function is defined within the current scope. In addition, you need to use .packages to specify which packages each new instance of R should load. As Tensibai stated, this is because the process isn't being forked; instead, several separate instances of R are started and the commands run in them simultaneously.
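The forking distinction can be checked with base R's parallel package alone. This sketch deliberately uses `clusterEvalQ()`/`clusterExport()` instead of foreach, so it illustrates the idea rather than doParallel's internals; `my_helper()` is a hypothetical name. The FORK part runs only on Unix-alikes, and it is only an assumption that the Linux runs in the question used a fork-based backend (the question text is cut off).

```r
library(parallel)

# Hypothetical helper defined in the master session
my_helper <- function(x) x + 1

# PSOCK cluster: two brand-new R processes (FORK is not available on Windows)
cl <- makeCluster(2, type = "PSOCK")
before <- unlist(clusterEvalQ(cl, exists("my_helper")))

# Copy the object into each worker's global environment
clusterExport(cl, "my_helper", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("my_helper")))
stopCluster(cl)

print(before)  # FALSE FALSE: fresh workers never saw my_helper
print(after)   # TRUE TRUE: present once explicitly exported

# FORK cluster (Unix-alikes only): workers are copies of this process,
# so they inherit my_helper without any export step
if (.Platform$OS.type == "unix") {
  cl_fork <- makeCluster(2, type = "FORK")
  forked <- unlist(clusterEvalQ(cl_fork, exists("my_helper")))
  stopCluster(cl_fork)
  print(forked)  # TRUE TRUE
}
```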
