furrr

R: asynchronous parallel lapply

隐身守侯 submitted on 2021-02-19 07:14:58
Question: The simplest way I've found so far to use a parallel lapply in R is through the following example code: library(parallel); library(pbapply); cl <- makeCluster(10); clusterExport(cl = cl, {...}); clusterEvalQ(cl = cl, {...}); results <- pblapply(1:100, FUN = function(x){rnorm(x)}, cl = cl) This provides a progress bar for the results, and the same code is easy to reuse when no parallel computation is needed, by setting cl = NULL. However, one issue that I've
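The pattern described in the question can be sketched as follows. This is a minimal illustration, not the asker's actual code: the worker count of 2, the `offset` variable, and the `slow_square` function are hypothetical stand-ins for the elided `{...}` parts, and the pbapply package is assumed to be installed.

```r
library(parallel)
library(pbapply)

# Hypothetical global that the worker function depends on.
offset <- 10

# Hypothetical worker function; it reads `offset` from the global environment.
slow_square <- function(x) {
  x^2 + offset
}

# Parallel run: start a small cluster and export the global the workers need.
cl <- makeCluster(2)
clusterExport(cl, varlist = "offset", envir = environment())
parallel_results <- pblapply(1:4, FUN = slow_square, cl = cl)
stopCluster(cl)

# Serial run: the same call with cl = NULL falls back to a plain lapply
# that still shows a progress bar.
serial_results <- pblapply(1:4, FUN = slow_square, cl = NULL)
```

Only the `cl` argument differs between the two calls, so a script can switch between parallel and serial execution with a single variable.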

How can I configure future to download more files?

ⅰ亾dé卋堺 submitted on 2021-01-28 05:40:36
Question: I have a lot of files I need to download. I am using the download.file() function and furrr::map to download in parallel, with plan(strategy = "multicore"). Please advise how I can load more jobs for each future. Running on Ubuntu 18.04 with 8 cores. R version 3.5.3. The files can be txt, zip or any other format. Sizes range from 5 MB to 40 MB each. Answer 1: Using furrr works just fine. I think what you mean is furrr::future_map. Using multicore substantially increases the downloading speed (
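A minimal sketch of parallel downloads with furrr is shown below. It is not the answer's code, which is truncated in this listing. To stay runnable without network access it "downloads" temporary local files through `file://` URLs, so the file names and directories are placeholders, and a Unix-like system such as the asker's Ubuntu machine is assumed. The `chunk_size` option of `furrr_options()` (available in furrr 0.2.0 and later) is shown as one possible way to assign several downloads to each future; whether it helps in practice depends on the workload.

```r
library(future)
library(furrr)

# Forked workers, as in the question; where forking is unsupported,
# multicore falls back to sequential processing.
plan(multicore, workers = 2)

# Placeholder "remote" files: local temp files served through file:// URLs.
src_dir <- tempfile("src_")
dest_dir <- tempfile("dest_")
dir.create(src_dir)
dir.create(dest_dir)
src_files <- file.path(src_dir, paste0("file_", 1:4, ".txt"))
for (f in src_files) writeLines("placeholder content", f)

urls <- paste0("file://", normalizePath(src_files))
dest_files <- file.path(dest_dir, basename(src_files))

# One download per element; chunk_size = 2 groups two downloads per future.
status <- future_map2_int(
  urls, dest_files,
  function(url, dest) download.file(url, dest, mode = "wb", quiet = TRUE),
  .options = furrr_options(chunk_size = 2)
)

# Return to sequential processing once the downloads are done.
plan(sequential)
```

`download.file()` returns 0 on success, so `status` can be inspected afterwards to find downloads that need a retry.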

fuzzy and exact match of two databases

∥☆過路亽.° submitted on 2020-12-13 03:40:13
Question: I have two databases. The first one has about 70k rows with 3 columns. The second one has 790k rows with 2 columns. Both databases have a common variable grantee_name. I want to match each row of the first database to one or more rows of the second database based on this grantee_name. Note that merge will not work because the grantee_name values do not match perfectly. There are different spellings etc. So, I am using the fuzzyjoin package and trying the following: library("haven"); library(
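One way to combine exact and fuzzy matching on `grantee_name` is sketched below. The asker's code is truncated in this listing, so this is an illustration under assumptions: the two small data frames, the `id` and `amount` columns, and the `max_dist = 2` threshold are all made up, and the fuzzyjoin package (with its stringdist dependency) is assumed to be installed.

```r
library(fuzzyjoin)

# Hypothetical stand-ins for the two databases.
db1 <- data.frame(
  id = 1:3,
  grantee_name = c("Acme Foundation", "Blue River Trust", "City Arts Council"),
  stringsAsFactors = FALSE
)
db2 <- data.frame(
  grantee_name = c("Acme Foundation", "Blue River Trst", "Unrelated Org"),
  amount = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Step 1: exact matches are cheap, so take them first with merge().
exact <- merge(db1, db2, by = "grantee_name")

# Step 2: fuzzy-match only the rows that found no exact partner.
unmatched <- db1[!db1$grantee_name %in% db2$grantee_name, ]
fuzzy <- stringdist_inner_join(
  unmatched, db2,
  by = "grantee_name",
  max_dist = 2,          # assumed threshold: at most two edits apart
  method = "osa",
  distance_col = "dist"  # keep the string distance for later inspection
)
```

Taking the exact matches first shrinks the input to the fuzzy step, which matters at the 70k by 790k scale described in the question because the fuzzy join compares every remaining pair of names.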

fuzzy and exact match of two databases

时间秒杀一切 submitted on 2020-12-13 03:38:25
Question: I have two databases. The first one has about 70k rows with 3 columns. The second one has 790k rows with 2 columns. Both databases have a common variable grantee_name. I want to match each row of the first database to one or more rows of the second database based on this grantee_name. Note that merge will not work because the grantee_name values do not match perfectly. There are different spellings etc. So, I am using the fuzzyjoin package and trying the following: library("haven"); library(

What is furrr's “black magic?”

这一生的挚爱 submitted on 2020-03-25 13:55:14
Question: I use the R package furrr for most of my parallelization needs, and basically never have issues with exporting things from my global environment to the cluster. Today I did and I have no idea why. The package documentation seems to describe the process by which global variables are sent to the clusters as "black magic." What is the black magic? The furrr::future_options documentation says: Global variables and packages By default, the future package will perform black magic to look up the
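The lookup behind this can be partly illustrated with the globals package, which the future package uses to find global variables by statically inspecting the code of an expression. The sketch below is not taken from the truncated question or its answers; the variable names `scale_factor` and `values` are hypothetical. It shows one common reason a global is missed: the variable is referred to only by a string, so it never appears as a symbol in the code.

```r
library(globals)

# A variable referenced directly by name appears as a symbol in the
# expression, so static inspection can find it.
detected <- findGlobals(quote({
  scale_factor * values
}))

# A variable referenced only through a string (here via get()) is
# invisible to static inspection: "scale_factor" is character data,
# not a symbol.
hidden <- findGlobals(quote({
  get("scale_factor") * values
}))

print(detected)
print(hidden)
```

When automatic detection misses a variable like this, the `globals` argument of `furrr_options()` (`future_options()` in older furrr releases) can be used to list the needed objects explicitly.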