Is there any way to break out of a foreach loop?

Asked by 耶瑟儿~ on 2020-12-04 22:29 · 6 answers · 1169 views

I am using the R package foreach() with %dopar% to do long (~days) calculations in parallel. I would like the ability to stop the entire set of calculations in the event that one of them produces an error.

6 Answers
  •  渐次进展
     2020-12-04 23:22

    It sounds like you want an impatient version of the "stop" error handling. You could implement that by writing a custom combine function, and arranging for foreach to call it as soon as each result is returned. To do that you need to:

    • Use a backend that supports calling combine on-the-fly, like doMPI or doRedis
    • Don't enable .multicombine
    • Set .inorder to FALSE
    • Set .init to something (like NULL)

    Here's an example that does that:

    library(foreach)
    parfun <- function(errval, n) {
      abortable <- function(errfun) {
        # Combine function, called as each result arrives. If a task
        # returned an error object, escape from the foreach loop by
        # calling the continuation supplied by callCC.
        comb <- function(x, y) {
          if (inherits(y, 'error')) {
            warning('This will leave your parallel backend in an inconsistent state')
            errfun(y)
          }
          c(x, y)
        }
        foreach(i=seq_len(n), .errorhandling='pass', .export='errval',
                .combine='comb', .inorder=FALSE, .init=NULL) %dopar% {
          if (i == errval)
            stop('testing abort')
          Sys.sleep(10)
          i
        }
      }
      callCC(abortable)
    }
    

    Note that I also set the error handling to "pass" so foreach will call the combine function with an error object. The callCC function is used to return from the foreach loop regardless of the error handling used within foreach and the backend. In this case, callCC calls the abortable function, passing it a function object that is used to force callCC to return immediately. By calling that function from the combine function, we can escape from the foreach loop as soon as we detect an error object, and have callCC return that object. See ?callCC for more information.
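    To see the escape behavior in isolation, here is a minimal base-R sketch of callCC, independent of foreach and of any parallel backend (the loop and values are illustrative only):

```r
# callCC calls the supplied function with an "escape" function k;
# calling k(value) makes callCC return value immediately, skipping
# the rest of the body -- much like breaking out of a loop.
result <- callCC(function(k) {
  for (i in 1:10) {
    if (i == 3) k(i)   # escape here: callCC returns 3
  }
  -1                   # never reached
})
result   # 3
```

    In parfun above, errfun plays the role of k: the combine function invokes it when a task's result is an error object, so callCC returns that error object instead of the combined results.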

    You can actually use parfun without a parallel backend registered and verify that the foreach loop "breaks" as soon as it executes a task that throws an error, but that could take a while since the tasks are executed sequentially. For example, this takes 20 seconds to execute if no backend is registered:

    print(system.time(parfun(3, 4)))
    

    When executing parfun in parallel, we need to do more than simply break out of the foreach loop: we also need to stop the workers, otherwise they will continue to compute their assigned tasks. With doMPI, the workers can be stopped using mpi.abort:

    library(doMPI)
    cl <- startMPIcluster()
    registerDoMPI(cl)
    r <- parfun(getDoParWorkers(), getDoParWorkers())
    if (inherits(r, 'error')) {
      cat(sprintf('Caught error: %s\n', conditionMessage(r)))
      mpi.abort(cl$comm)
    }
    

    Note that the cluster object can't be used after the loop aborts, because things weren't properly cleaned up, which is why the normal "stop" error handling doesn't work this way.
