doparallel

Inconsistent behaviour with tm_map transformation functions when using multiple cores

折月煮酒 submitted on 2019-12-01 02:11:06
Another potential title for this post could be "When parallel processing in R, does the ratio between number of cores, loop chunk size and object size matter?" I have a corpus that I am running some transformations on using the tm package. Since the corpus is large, I'm using parallel processing with the doParallel package. Sometimes the transformations do their job, but sometimes they do not. Take tm::removeNumbers() as an example. The very first document in the corpus has a content value of "n417", so if preprocessing is successful this document will be transformed to just "n". Sample corpus is below for

doParallel "foreach" inconsistently inherits objects from parent environment: "Error in { : task 1 failed - "could not find function…"

天涯浪子 submitted on 2019-11-30 17:28:46
I have a problem with foreach that I just can't figure out. The following code fails on two Windows computers I've tried, but succeeds on three Linux computers, all running the same versions of R and doParallel:

    library("doParallel")
    registerDoParallel(cl=2,cores=2)
    f <- function(){return(10)}
    g <- function(){
      r = foreach(x = 1:4) %dopar% {
        return(x + f())
      }
      return(r)
    }
    g()

On these two Windows computers, the following error is returned:

    Error in { : task 1 failed - "could not find function "f""

However, this works just fine on the Linux computers, and also works just fine with %do% instead of
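The excerpt is cut off before any answer, so the following is only a sketch of one commonly suggested workaround, not the thread's confirmed resolution. It assumes the failure occurs because socket-based workers (the kind used on Windows) start as fresh R sessions that do not see functions defined in the parent's global environment, and it passes `f` to the workers explicitly through foreach's `.export` argument. The explicit `makeCluster(2)` call is an assumption made here so that the same kind of worker is used on every operating system.

```r
library(doParallel)  # also attaches foreach and parallel

# Assumption: use an explicit socket cluster so workers are fresh R sessions
# on every OS, which mirrors the situation described for Windows.
cl <- makeCluster(2)
registerDoParallel(cl)

f <- function() 10

g <- function() {
  # .export names objects from the calling environments to ship to workers
  foreach(x = 1:4, .export = "f") %dopar% {
    x + f()
  }
}

res <- g()
stopCluster(cl)
```

Naming the function in `.export` keeps the loop body unchanged; the alternative of defining `f` inside `g` would also make it visible to the workers, at the cost of restructuring the code.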

doParallel (package) foreach does not work for big iterations in R

半腔热情 submitted on 2019-11-30 15:37:13
I'm running the following code (extracted from doParallel's vignettes) on a Linux PC with 4 physical and 8 logical cores. With iter=1e+6 or less, everything is fine, and CPU usage shows that all cores are used for the computation. With a larger number of iterations (e.g. iter=4e+6), however, parallel computing does not seem to work: when I monitor CPU usage, only one core is involved in the computation (100% usage).

Example1

    require("doParallel")
    require("foreach")
    registerDoParallel(cores=8)
    x <- iris[which(iris[,5] !=

doParallel (package) foreach does not work for big iterations in R

元气小坏坏 submitted on 2019-11-29 22:01:19
Question: I'm running the following code (extracted from doParallel's vignettes) on a Linux PC with 4 physical and 8 logical cores. With iter=1e+6 or less, everything is fine, and CPU usage shows that all cores are used for the computation. With a larger number of iterations (e.g. iter=4e+6), however, parallel computing does not seem to work: when I monitor CPU usage, only one core is involved in the computation (100% usage).

doParallel, cluster vs cores

喜夏-厌秋 submitted on 2019-11-29 09:35:27
What is the difference between cluster and cores in registerDoParallel when using the doParallel package? Is my understanding correct that on a single machine these are interchangeable and I will get the same results for:

    cl <- makeCluster(4)
    registerDoParallel(cl)

and

    registerDoParallel(cores = 4)

The only difference I see is that a cluster created with makeCluster() has to be stopped explicitly using stopCluster().

Yes, that is right from the software point of view: on a single machine these are interchangeable and you will get the same results. To understand 'cluster' and 'cores' clearly, I suggest thinking at the 'hardware' and 'software' levels. In
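As a minimal sketch of the two registration styles being compared, the snippet below runs the same loop under each and cleans up afterwards. The worker count of 2 and the toy loop body are assumptions chosen for illustration, not values taken from the thread.

```r
library(doParallel)  # also attaches foreach and parallel

# Style 1: explicit cluster object. The caller created it, so the caller
# is responsible for stopping it.
cl <- makeCluster(2)
registerDoParallel(cl)
res_cluster <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopCluster(cl)

# Style 2: only a core count. doParallel manages the workers itself.
registerDoParallel(cores = 2)
res_cores <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopImplicitCluster()  # releases any cluster doParallel created implicitly
```

For a pure computation like this one, both styles should return the same values; the practical difference lies in who owns the workers and how they are shut down.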

the difference between doMC and doParallel in R

戏子无情 submitted on 2019-11-28 16:39:16
What's the difference between doParallel and doMC in R with respect to the foreach function? doParallel supports Windows and Unix-like systems, while doMC supports Unix-like systems only. In other words, why can't doParallel replace doMC directly? Thank you. Update: doParallel is built on parallel, which is essentially a merger of multicore and snow and automatically uses the appropriate tool for your system. As a result, doParallel works across multiple systems, so we can use it to replace doMC. ref: http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf BTW, what is the

doParallel, cluster vs cores

六眼飞鱼酱① submitted on 2019-11-27 23:31:39
Question: What is the difference between cluster and cores in registerDoParallel when using the doParallel package? Is my understanding correct that on a single machine these are interchangeable and I will get the same results for:

    cl <- makeCluster(4)
    registerDoParallel(cl)

and

    registerDoParallel(cores = 4)

The only difference I see is that a cluster created with makeCluster() has to be stopped explicitly using stopCluster().

Answer 1: The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system, see print
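The answer is truncated before it shows what to print, so the following is only a hedged sketch of one way to observe the operating-system dependence it mentions. It uses foreach's `getDoParName()` and `getDoParWorkers()` to report which backend a numeric registration selected; the worker count of 2 is an assumption for illustration.

```r
library(doParallel)  # also attaches foreach and parallel

registerDoParallel(cores = 2)

# Name of the backend that was actually registered. On Unix-like systems
# this is expected to be the fork-based "doParallelMC"; on Windows, the
# socket-based "doParallelSNOW".
backend <- getDoParName()
workers <- getDoParWorkers()

print(backend)
print(workers)

stopImplicitCluster()  # releases any cluster doParallel created implicitly
```

Checking the backend name this way can help when the same script behaves differently across machines, since forked workers inherit the parent session's objects while socket workers do not.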

Run RSelenium in parallel

六月ゝ 毕业季﹏ submitted on 2019-11-27 04:35:54
Question: How would I go about running RSelenium in parallel? The following is an example using rvest in parallel:

    library(RSelenium)
    library(rvest)
    library(magrittr)
    library(foreach)
    library(doParallel)
    URLsPar <- c("http://www.example.com/", "http://s5.tinypic.com/n392s6_th.jpg", "http://s5.tinypic.com/jl1jex_th.jpg", "http://s6.tinypic.com/16abj1s_th.jpg", "http://s6.tinypic.com/2ymvpqa_th.jpg")
    (detectCores() - 1) %>% makeCluster %>% registerDoParallel
    ws <- foreach(x = 1:length(URLsPar), .packages