mclapply

How to write efficient nested functions for parallelization?

爱⌒轻易说出口 submitted on 2021-02-11 13:32:48
Question: I have a data frame with two grouping variables, class and group. For each class, I have a plotting task per group. Mostly, I have 2 levels per class and 500 levels per group. I'm using the parallel package for parallelization and the mclapply function to iterate through the class and group levels. I'm wondering which is the best way to write my iterations. I think I have two options: run the parallelization over the class variable, or run it over the group variable. My computer has 3 cores working
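Below is a minimal sketch of the second option (parallelizing over the many group levels while looping over the few class levels serially), assuming a data frame df with columns class and group and a placeholder plot_group() function; these names are illustrative and not taken from the original post.

library(parallel)

# placeholder for the real plotting task run on one class/group subset
plot_group <- function(d) {
  invisible(summary(d))
}

plot_all <- function(df, ncores = 3) {
  for (cl in unique(df$class)) {
    d_class <- df[df$class == cl, ]
    # one forked task per group level within the current class
    mclapply(split(d_class, d_class$group), plot_group, mc.cores = ncores)
  }
}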

mclapply: all scheduled cores encountered errors in user code

↘锁芯ラ submitted on 2021-02-09 09:22:17
Question: The following is my code. I am trying to get the list of all the files (~20000) that end with .idat and read each file using the function illuminaio::readIDAT.

library(illuminaio)
library(parallel)
library(data.table)
# number of cores to use
ncores = 8
# this gets all the files with .idat extension, ~20000 files
files <- list.files(path = './', pattern = "*.idat", full.names = TRUE)
# function to read the idat file and create a data.table of filename, and two more columns
# write out as csv
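One common cause of this error with such a pattern is a single unreadable file taking down a whole worker; below is a hedged sketch of wrapping the reader in tryCatch() so a bad file returns NULL instead of erroring. The helper read_one() and its output columns are illustrative, not the poster's exact code.

library(illuminaio)
library(parallel)
library(data.table)

files <- list.files(path = "./", pattern = "\\.idat$", full.names = TRUE)

read_one <- function(f) {
  tryCatch({
    idat <- readIDAT(f)
    # keep only a couple of summary columns per file for this sketch
    data.table(file = basename(f), n_probes = nrow(idat$Quants))
  }, error = function(e) NULL)
}

res <- mclapply(files, read_one, mc.cores = 8)
dt  <- rbindlist(Filter(Negate(is.null), res))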

R mclapply vs foreach

江枫思渺然 submitted on 2021-02-06 09:53:29
Question: I use mclapply for all my "embarrassingly parallel" computations. I find it clean and easy to use, and with the arguments mc.cores = 1 and mc.preschedule = TRUE I can insert browser() in the function inside mclapply and debug line by line just like in regular R. This is a huge help in getting code to production quicker. What does foreach offer that mclapply does not? Is there a reason I should consider writing foreach code going forward? If I understand correctly, both can use the multicore
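As a point of comparison, here is the same toy job written both ways; this is only a sketch, and doParallel is just one of several possible foreach backends.

library(parallel)
library(foreach)
library(doParallel)

sq <- function(x) x^2

# mclapply: fork-based, no explicit cluster setup (Unix-alikes only)
res1 <- mclapply(1:10, sq, mc.cores = 2)

# foreach: register a backend first, then iterate with %dopar%
cl <- makeCluster(2)
registerDoParallel(cl)
res2 <- foreach(i = 1:10) %dopar% sq(i)
stopCluster(cl)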

Why don't parallel jobs print in RStudio?

心已入冬 submitted on 2021-01-28 14:32:17
Question: Why do scripts parallelized with mclapply print on a cluster but not in RStudio? Just asking out of curiosity.

mclapply(1:10, function(x) {
  print("Hello!")
  return(TRUE)
}, mc.cores = 2)
# Hello prints under Slurm but not in RStudio

Answer 1: None of the functions in the 'parallel' package guarantee proper displaying of output sent to the standard output (stdout) or the standard error (stderr) on workers. This is true for all types of parallelization approaches, e.g. forked processing (mclapply()), or
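A common workaround, sketched below under the assumption that writing to files is acceptable, is to have each worker log to its own file and then read the logs back in the main session, where output displays normally.

library(parallel)

log_dir <- tempdir()  # illustrative location for the per-task logs

res <- mclapply(1:10, function(x) {
  cat("Hello from task", x, "\n",
      file = file.path(log_dir, sprintf("task-%02d.log", x)))
  TRUE
}, mc.cores = 2)

# collect and display the worker output in the interactive session
logs <- list.files(log_dir, pattern = "^task-.*\\.log$", full.names = TRUE)
cat(unlist(lapply(logs, readLines)), sep = "\n")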

Should mclapply calls be nested?

时光毁灭记忆、已成空白 submitted on 2021-01-28 06:40:28
Question: Is nesting parallel::mclapply calls a good idea?

require(parallel)
ans <- mclapply(1:3, function(x) mclapply(1:3, function(y) y * x))
unlist(ans)

Outputs:

[1] 1 2 3 2 4 6 3 6 9

So it's "working". But is it recommended for real compute-intensive tasks that outnumber the number of cores? What is going on when this is executed? Are the multiple forks involved potentially more wasteful? What are the considerations for mc.cores and mc.preschedule? Edit: Just to clarify the motivation, often it
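One way to keep the fork count bounded when nesting, shown in the sketch below (mirroring the 3 x 3 toy example above), is to give all the cores to the outer call and run the inner call serially; the core counts here are assumptions, not a general recommendation.

library(parallel)

total_cores <- 3  # assumed machine limit

ans <- mclapply(1:3, function(x) {
  mclapply(1:3, function(y) y * x, mc.cores = 1)  # inner level stays serial
}, mc.cores = total_cores)

unlist(ans)
#> [1] 1 2 3 2 4 6 3 6 9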

mcmapply performance on multiple cores

落花浮王杯 submitted on 2020-01-23 17:52:14
Question: I have a function which I want to run on around 3 million data points. I am trying to parallelise the function using mcmapply on an Ubuntu machine with 8 cores. The function takes in a list of length 3 million, as well as 3 more vectors of length 3 million and 1 constant value cutoffyearmon. The code runs perfectly fine with 100000 rows of data within 2 minutes on a single core and throws no error. However, when I try to run the code in parallel on 6 cores of my machine using mcmapply it keeps
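A frequent culprit in this situation is memory pressure from forking with very large inputs; the sketch below processes the data in chunks so each forked task only touches a slice. All object names (lst, v1, v2, v3, cutoffyearmon) and the dummy function f() are placeholders standing in for the poster's objects.

library(parallel)

# tiny placeholder data in place of the 3-million-element inputs
lst <- as.list(runif(1e4)); v1 <- runif(1e4); v2 <- runif(1e4); v3 <- runif(1e4)
cutoffyearmon <- 201912
f <- function(x, a, b, c, cutoff) x + a + b + c  # dummy work

n      <- length(lst)
chunks <- split(seq_len(n), cut(seq_len(n), breaks = 24, labels = FALSE))

res <- mclapply(chunks, function(idx) {
  mapply(f, lst[idx], v1[idx], v2[idx], v3[idx],
         MoreArgs = list(cutoff = cutoffyearmon), SIMPLIFY = FALSE)
}, mc.cores = 6)

out <- unlist(res, recursive = FALSE)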

Parallel processing of big rasters in R (Windows)

夙愿已清 submitted on 2020-01-23 05:34:05
Question: I'm using the doSNOW package, and more specifically the parLapply function, to perform reclassification (and subsequently other operations) on a list of big raster datasets (OS: Windows x64). The code looks a little like this minimal example:

library(raster)
library(doSNOW)
# create list containing test rasters
x <- raster(ncol = 10980, nrow = 10980)
x <- setValues(x, 1:ncell(x))
list.x <- replicate(9, x)
# setting up cluster
NumberOfCluster <- 8
cl <- makeCluster(NumberOfCluster)
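A hedged sketch of how the setup above typically continues is shown below: load raster on each worker, export the reclassification matrix, and run reclassify() via parLapply(). The rcl matrix and the smaller test raster are illustrative only.

library(raster)
library(doSNOW)

# smaller test rasters so the sketch runs quickly
x <- raster(ncol = 1000, nrow = 1000)
x <- setValues(x, 1:ncell(x))
list.x <- replicate(9, x)

cl <- makeCluster(8)
registerDoSNOW(cl)

# reclassification matrix: from, to, becomes
rcl <- matrix(c(0,   5e5, 1,
                5e5, Inf, 2), ncol = 3, byrow = TRUE)

clusterEvalQ(cl, library(raster))
clusterExport(cl, "rcl")
out <- parLapply(cl, list.x, function(r) reclassify(r, rcl))
stopCluster(cl)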

parallel::mclapply() adds or removes bindings to the global environment. Which ones?

非 Y 不嫁゛ submitted on 2020-01-02 08:56:52
Question: Why this matters: For drake, I want users to be able to execute mclapply() calls within a locked global environment. The environment is locked for the sake of reproducibility. Without locking, data analysis pipelines could invalidate themselves. Evidence that mclapply() adds or removes global bindings:

set.seed(0)
a <- 1
# Works as expected.
rnorm(1)
#> [1] 1.262954
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)
# No new bindings allowed.
lockEnvironment(globalenv())
# With a locked
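Although the excerpt is cut off, the question can be probed directly with the small diagnostic sketch below: snapshot the names in the global environment before and after the call and diff them (in practice the added binding is typically .Random.seed, created for the RNG).

before <- ls(globalenv(), all.names = TRUE)
invisible(parallel::mclapply(1:2, identity, mc.cores = 2))
after  <- ls(globalenv(), all.names = TRUE)

setdiff(after, before)  # bindings added by the call
setdiff(before, after)  # bindings removed by the call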