mclapply encounters errors depending on core id?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-11 05:07:17

Question


I have a set of genes for which I need to calculate some coefficients in parallel. The coefficients are calculated inside GeneTo_GeneCoeffs_filtered, which takes a gene name as input and returns a list of two data frames.

With a gene_array of length 100, I ran this command with different numbers of cores: 5, 6 and 7.

Coeffslist <- mclapply(gene_array, GeneTo_GeneCoeffs_filtered, mc.cores = no_cores)

I encounter errors on different gene names depending on the number of cores assigned to mclapply.

The indices of the genes for which GeneTo_GeneCoeffs_filtered fails to return the list of data frames follow a pattern. With 7 cores assigned to mclapply, they are elements 4, 11, 18, 25, ..., 95 of gene_array (every 7th); with 6 cores, they are 2, 8, 14, ..., 98 (every 6th); and likewise with 5 cores, every 5th.

Most importantly, the failing indices differ between these runs, which means the problem is not in particular genes.

I suspect there might be a "broken" core that cannot run my function properly, and that it alone generates these errors. Is there a way to trace its id and exclude it from the list of cores that R can use?


Answer 1:


A close reading of mclapply's manpage reveals that this behavior is by design. It arises from the interaction between two documented behaviors:

(a)

"the input X is split into as many parts as there are cores (currently the values are spread across the cores sequentially, i.e. first value to core 1, second to core 2, ... (core + 1)-th value to core 1 etc.) and then one process is forked to each core and the results are collected."

(b)

a "try-error" object will be returned for all the values involved in the failure, even if not all of them failed.

In your case, by virtue of (a), your gene_array is spread "round-robin" style across the cores, so the elements handled by any one core have indices that are mc.cores apart. By virtue of (b), if any one of those elements raises an error, you get back an error for every gene_array element sent to that core. That produces exactly the every-7th, every-6th and every-5th pattern you observe, and it points to at least one gene that genuinely fails, not to a faulty core.
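The effect is easy to reproduce with a toy function. The sketch below is only an illustration: toy_fun is a hypothetical stand-in for GeneTo_GeneCoeffs_filtered that fails for a single input (11), and the numbers 100 and 7 are chosen to mirror the question. It assumes a Unix-like system, since mclapply relies on forking and does not support mc.cores > 1 on Windows.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails only for input 11.
toy_fun <- function(i) {
  if (i == 11) stop("bad input: ", i)
  i^2
}

# Default mc.preschedule = TRUE: inputs are dealt round-robin to 7 workers.
# mclapply warns about the failed worker; the warning is suppressed here.
res <- suppressWarnings(mclapply(1:100, toy_fun, mc.cores = 7))

# Indices whose result came back as a "try-error" object.
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
print(failed)
```

Although only input 11 fails, every index that shared a worker with it (4, 11, 18, ..., 95) should come back as a "try-error".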

I refreshed my understanding of this in an exchange yesterday with Simon Urbanek: https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002098.html in which I also provide an error-handling approach yielding errors only for the indices that generate an error.
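The code from that post is not reproduced here. One common pattern with the same effect, shown as a sketch and not necessarily the approach from the linked message, is to catch the error inside the worker function so that a failure becomes an ordinary return value. The names toy_fun and safe_fun are hypothetical; toy_fun stands in for GeneTo_GeneCoeffs_filtered and fails only for input 11. A Unix-like system is assumed.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails only for input 11.
toy_fun <- function(i) {
  if (i == 11) stop("bad input: ", i)
  i^2
}

# Wrap the worker so an error is returned as a condition object
# instead of aborting the rest of that worker's chunk.
safe_fun <- function(i) {
  tryCatch(toy_fun(i), error = function(e) e)
}

res_safe <- mclapply(1:100, safe_fun, mc.cores = 7)

# Only the element that actually failed holds a condition object.
failed_safe <- which(vapply(res_safe, inherits, logical(1), what = "error"))
print(failed_safe)
```

With this wrapper, the other elements handled by the same worker keep their real results, and the stored condition object can be inspected with conditionMessage() to see why the gene failed.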

You can also get errors only for the indices that generate an error by passing mc.preschedule=FALSE.
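A minimal sketch of that option, again using a hypothetical toy_fun that fails only for input 11 in place of GeneTo_GeneCoeffs_filtered, and assuming a Unix-like system:

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails only for input 11.
toy_fun <- function(i) {
  if (i == 11) stop("bad input: ", i)
  i^2
}

# mc.preschedule = FALSE forks one job per element, so a failure
# affects only the element that raised it.
res_np <- suppressWarnings(
  mclapply(1:100, toy_fun, mc.cores = 7, mc.preschedule = FALSE)
)

failed_np <- which(vapply(res_np, inherits, logical(1), what = "try-error"))
print(failed_np)
```

The trade-off is that a separate process is forked for every element of the input. The manpage describes prescheduling as the better choice for short computations or many input values, and mc.preschedule = FALSE as better suited to jobs with highly variable run times and not too many values compared to mc.cores.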



Source: https://stackoverflow.com/questions/52745779/mclapply-encounters-errors-depending-on-core-id
