parallel-processing

Parallel processing of big rasters in R (Windows)

Submitted by 夙愿已清 on 2020-01-23 05:34:05
Question: I'm using the doSNOW package, and more specifically its parLapply function, to perform reclassification (and subsequently other operations) on a list of big raster datasets (OS: Windows x64). The code looks a little like this minimalistic example:

    library(raster)
    library(doSNOW)

    # create list containing test rasters
    x <- raster(ncol = 10980, nrow = 10980)
    x <- setValues(x, 1:ncell(x))
    list.x <- replicate(9, x)

    # setting up cluster
    NumberOfCluster <- 8
    cl <- makeCluster(NumberOfCluster)
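The excerpt stops right after the cluster is created. Below is a minimal sketch of how the reclassification step could then be run across the workers with parLapply; the reclass_fun helper and its thresholds are hypothetical, not taken from the question:

    library(raster)
    library(doSNOW)   # attaches snow, which provides makeCluster()/parLapply()

    # hypothetical reclassification: values up to 5e7 become 1, the rest 2
    reclass_fun <- function(r) {
      rcl <- matrix(c(-Inf, 5e7, 1,
                       5e7, Inf, 2), ncol = 3, byrow = TRUE)
      reclassify(r, rcl)
    }

    cl <- makeCluster(8)
    clusterEvalQ(cl, library(raster))          # every worker needs raster loaded
    reclassified <- parLapply(cl, list.x, reclass_fun)
    stopCluster(cl)

Note that in-memory rasters are serialized to each worker, so with genuinely large datasets it is usually better to pass file names and let each worker read and write its own files.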

parLapply from inside a function copies data to nodes unexpectedly

Submitted by 耗尽温柔 on 2020-01-23 03:07:07
Question: I have a large list (~30 GB) and functions as follows:

    cl <- makeCluster(24, outfile = "")

    Foo1 <- function(cl, largeList) {
      return(parLapply(cl, largeList, Bar1))
    }

    Bar1 <- function(listElement) {
      return(nrow(listElement))
    }

    Foo2 <- function(cl, largeList, arg) {
      clusterExport(cl, list("arg"), envir = environment())
      return(parLapply(cl, largeList, function(x) Bar2(x, arg)))
    }

    Bar2 <- function(listElement, arg) {
      return(nrow(listElement))
    }

There are no issues with:

    Foo1(cl, largeList)

Watching
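The excerpt breaks off here, but the behaviour in the title is consistent with the usual closure-capture issue: the anonymous function passed to parLapply inside Foo2 has Foo2's frame, which contains largeList, as its enclosing environment, so serializing that function also ships the big list to every node. A minimal sketch of one common workaround follows; the names Foo2_lean and worker are hypothetical:

    Foo2_lean <- function(cl, largeList, arg) {
      worker <- function(x, arg) nrow(x)
      # detach the worker from this function's frame so that largeList is not
      # captured and serialized along with it
      environment(worker) <- globalenv()
      # pass arg through parLapply's ... instead of exporting it
      parLapply(cl, largeList, worker, arg = arg)
    }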

Parallel Computing in R: how to use the cores

Submitted by 隐身守侯 on 2020-01-23 02:39:18
Question: I am currently trying parallel computing in R. I am trying to train a logistic ridge model, and I have 4 cores on my computer. I would like to split my data set equally into 4 pieces, use each core to train a model on its part of the training data, and save the result of each core into a single vector. The problem is that I have no clue how to do it. Right now I have tried to parallelize with the foreach package, but the problem is that each core sees the same training data. Here is the code with
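The question's own code is cut off above. Below is a minimal sketch of the pattern being asked about, iterating over disjoint row chunks so that each worker fits on its own quarter of the data; the objects train and y, and the choice of glmnet for the ridge-penalised logistic fit, are assumptions rather than details from the question:

    library(doParallel)   # attaches foreach as well
    library(glmnet)

    registerDoParallel(cores = 4)

    # split the row indices into 4 roughly equal, disjoint chunks
    idx_chunks <- split(seq_len(nrow(train)),
                        cut(seq_len(nrow(train)), 4, labels = FALSE))

    results <- foreach(idx = idx_chunks, .combine = c, .packages = "glmnet") %dopar% {
      fit <- glmnet(as.matrix(train[idx, , drop = FALSE]), y[idx],
                    family = "binomial", alpha = 0)   # alpha = 0 gives ridge
      tail(deviance(fit), 1)                          # one summary value per chunk
    }

With .combine = c and one scalar returned per chunk, results ends up as a single vector of length 4, one entry per core.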

Python scikit-learn n_jobs

Submitted by 孤街浪徒 on 2020-01-22 05:57:08
Question: This is not a real issue, but I'd like to understand: running sklearn from the Anaconda distribution on a Windows 7 system with 4 cores and 8 GB of RAM, fitting a KMeans model on a table of 200,000 samples × 200 values.

Running with n_jobs = -1 (after adding the if __name__ == '__main__': line to my script), I see the script start 4 processes with 10 threads each. Each process uses about 25% of the CPU (total: 100%). That seems to work as expected.

Running with n_jobs = 1: it stays in a single process (not a surprise), with 20

Most appropriate MPI_Datatype for “block decomposition”?

Submitted by 此生再无相见时 on 2020-01-22 03:00:29
Question: With the help from Jonathan Dursi and osgx, I've now done the "row decomposition" among the processes:

    row: http://img535.imageshack.us/img535/9118/ghostcells.jpg

Now, I'd like to try the "block decomposition" approach (pictured below):

    block: http://img836.imageshack.us/img836/9682/ghostcellsblock.jpg

How should one go about it? This time, an MPI_Datatype will be necessary, right? Which datatype would be most appropriate/easy to use? Or can it plausibly be done without a datatype?

Answer 1: You

parfor with Matlab “the variable __ in a parfor cannot be classified”

Submitted by ぐ巨炮叔叔 on 2020-01-22 02:54:06
Question: So I am trying to call this function using a parfor (basically curve fitting with a Fourier series over a vector inside a parfor loop):

    function [coefnames,coef] = fourier_regression(vect_waves,n)
    coef = zeros(length(vect_waves)-n,18);
    current_coef = zeros(18,1); % All the terms of the Fourier series
    x = 1:n;
    parpool_obj = parpool;
    parfor i = n:length(vect_waves)
        take_fourier = vect_waves(i-n+1:i);
        f = fit(x,take_fourier,'fourier8');
        current_coef = coeffvalues(f);
        coef(i,1:length(current_coef)) =

Purpose of multiprocessing.Pool.apply and multiprocessing.Pool.apply_async

Submitted by 耗尽温柔 on 2020-01-21 19:33:09
Question: See the example and execution result below:

    #!/usr/bin/env python3.4
    from multiprocessing import Pool
    import time
    import os

    def initializer():
        print("In initializer pid is {} ppid is {}".format(os.getpid(), os.getppid()))

    def f(x):
        print("In f pid is {} ppid is {}".format(os.getpid(), os.getppid()))
        return x*x

    if __name__ == '__main__':
        print("In main pid is {} ppid is {}".format(os.getpid(), os.getppid()))
        with Pool(processes=4, initializer=initializer) as pool:  # start 4 worker processes
            result =

Fastest way to read large Excel xlsx files? To parallelize or not?

Submitted by 冷暖自知 on 2020-01-21 11:34:10
Question: My questions are: What is the fastest way to read large(ish) .xlsx Excel files into R? The files are 10 to 200 MB, with multiple sheets. Can some kind of parallel processing be used, e.g. each core reading a separate sheet of a multi-sheet Excel file? Is there any other kind of optimisation that can be performed? What I have understood (and what I haven't) so far: if reading from spinning disks, as I will, parallel processing may actually slow down the reading as multiple processes try to read
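For the sheet-level parallelism part of the question, one way to try it is sketched below with readxl and a PSOCK cluster; the file name is a placeholder, and whether this beats a plain sequential loop depends heavily on the disk, so both variants are worth benchmarking:

    library(readxl)
    library(parallel)

    path <- "big_workbook.xlsx"          # placeholder file name
    sheets <- excel_sheets(path)

    cl <- makeCluster(min(length(sheets), detectCores()))
    clusterEvalQ(cl, library(readxl))
    clusterExport(cl, "path")

    # one worker per sheet, results returned as a list of data frames
    data_list <- parLapply(cl, sheets, function(s) read_xlsx(path, sheet = s))
    stopCluster(cl)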

MPI not running in parallel in a FORTRAN code

Submitted by 这一生的挚爱 on 2020-01-21 09:52:13
Question: I am trying to install OpenMPI on my Ubuntu (14.04) machine, and I thought I had succeeded, because I can run codes with mpirun, but I have recently noticed that it is not truly running in parallel. I installed OpenMPI with the following options:

    ./configure CXX=g++ CC=gcc F77=gfortran \
                F90=gfortran \
                FC=gfortran \
                --enable-mpi-f77 \
                --enable-mpi-f90 \
                --prefix=/opt/openmpi-1.6.5
    make all
    sudo make install

As I said, I have run a code (not written by myself) and it seemed to work