parallel-processing

Make use of all CPUs on SLURM

空扰寡人 submitted on 2021-01-27 19:52:00
Question: I would like to run a job on the cluster. Different nodes have different numbers of CPUs, and I have no idea which nodes will be assigned to me. What are the proper options so that the job can create as many tasks as there are CPUs across all nodes?

    #!/bin/bash -l
    #SBATCH -p normal
    #SBATCH -N 4
    #SBATCH -t 96:00:00
    srun -n 128 ./run

Answer 1: One dirty hack to achieve the objective is to use the environment variables provided by SLURM. For a sample sbatch file:

    #!/bin/bash
    #SBATCH --job-name=test
    …
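
The sample sbatch file above is cut off. As a separate illustration (my own sketch, not the answer's code), a Python payload started by such a job could derive the total CPU count of the allocation from the SLURM_JOB_CPUS_PER_NODE environment variable, whose value typically looks like "16(x2),8" (16 CPUs on each of two nodes plus 8 on a third); the parsing below assumes that format.

    import os
    import re

    def total_allocated_cpus():
        # Fall back to 1 when not running under SLURM.
        spec = os.environ.get("SLURM_JOB_CPUS_PER_NODE")
        if not spec:
            return 1
        total = 0
        for chunk in spec.split(","):
            # Each chunk is "<cpus>" or "<cpus>(x<nodes>)".
            m = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", chunk)
            cpus, repeats = int(m.group(1)), int(m.group(2) or 1)
            total += cpus * repeats
        return total

    # The resulting number could be passed to srun -n or used to size a worker pool.
    print(total_allocated_cpus())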

multidplyr : assign functions to cluster

北慕城南 submitted on 2021-01-27 16:36:12
Question: (see working solution below) I want to use multidplyr to parallelize a function:

calculs.R

    f <- function(x){
      return(x+1)
    }

main.R

    library(dplyr)
    library(multidplyr)
    source("calculs.R")

    d <- data.frame(a = 1:1000, b = sample(1:2, 1000, replace = TRUE))
    result <- d %>% partition(b) %>% do(f(.)) %>% collect()

I then get:

    Initialising 3 core cluster.
    Error in checkForRemoteErrors(lapply(cl, recvResult)) :
      2 nodes produced errors; first error: could not find function "f"
    In addition: Warning message:
    group…

Kill an MPI process

狂风中的少年 submitted on 2021-01-27 14:51:37
Question: I would like to know if there is a way for an MPI process to send a kill signal to another MPI process. Or, put differently, is there a way to exit from an MPI environment gracefully while one of the processes is still active? (i.e. mpi_abort() prints an error message). Thanks

Answer 1: No, this is not possible within an MPI application using the MPI library. Individual processes would not be aware of the location of the other processes, nor of the process IDs of the other processes - and there is nothing…
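
The answer above is cut off; its point is that MPI itself offers no way for one rank to kill another cleanly. A cooperative pattern that is often suggested instead is for the finishing rank to send a "stop" message that the other ranks poll for and then shut down on their own. The sketch below is my own illustration of that pattern using mpi4py, not code from the answer; the work function and tag value are placeholders.

    from mpi4py import MPI

    STOP_TAG = 99                      # arbitrary tag reserved for shutdown
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def work_step(step):
        # Placeholder for one unit of real work.
        pass

    if rank == 0:
        # Rank 0 decides the job is done and asks every other rank to stop.
        for dest in range(1, comm.Get_size()):
            comm.send(None, dest=dest, tag=STOP_TAG)
    else:
        step = 0
        while True:
            # Poll for a pending stop message without blocking.
            if comm.Iprobe(source=MPI.ANY_SOURCE, tag=STOP_TAG):
                comm.recv(source=MPI.ANY_SOURCE, tag=STOP_TAG)
                break
            work_step(step)
            step += 1

    # All ranks fall through here and exit cleanly; mpi4py finalizes MPI at exit.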

Compiling Rcpp functions using clusterEvalQ

 ̄綄美尐妖づ submitted on 2021-01-27 14:43:24
Question: I am working on a project that requires parallel processing in R, and I am new to the doParallel package. What I would like to do is use a parallelized foreach loop. Due to the nature of the problem, this foreach loop will need to be executed many times. The problem I am having is that I use cppFunction and cfunction within the loop. The current workaround is to call clusterEvalQ() on the cluster to compile the relevant functions. However, this is extremely slow (~10 seconds for 4 cores…

How to join a list of multiprocessing.Process() at the same time?

独自空忆成欢 submitted on 2021-01-27 13:05:15
Question: Given a list() of running multiprocessing.Process instances, how can I join on all of them and return as soon as one exits, without a Process.join timeout and looping?

Example:

    from multiprocessing import Process
    from random import randint
    from time import sleep

    def run():
        sleep(randint(0, 5))

    running = [Process(target=run) for i in range(10)]
    for p in running:
        p.start()

How can I block until at least one Process in running exits? What I don't want to do is:

    exit = False
    while not exit:
        for p in…
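
One way to get exactly this behaviour (a sketch of my own, not taken from the truncated post) is to wait on the processes' sentinel handles with multiprocessing.connection.wait, which blocks until at least one of the watched processes has exited:

    from multiprocessing import Process
    from multiprocessing.connection import wait
    from random import randint
    from time import sleep

    def run():
        sleep(randint(0, 5))

    if __name__ == "__main__":
        running = [Process(target=run) for _ in range(10)]
        for p in running:
            p.start()

        # Each Process exposes a sentinel that becomes ready when the process
        # ends; wait() blocks until at least one of these handles is ready.
        ready = wait([p.sentinel for p in running])
        exited = [p for p in running if p.sentinel in ready]
        print(f"{len(exited)} process(es) exited first")

        # Clean up the remaining workers.
        for p in running:
            p.join()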

Python read .json files from GCS into pandas DF in parallel

浪尽此生 submitted on 2021-01-27 11:26:45
Question: TL;DR: asyncio vs. multiprocessing vs. threading vs. some other solution for parallelizing a for loop that reads files from GCS, appends the data together into a pandas DataFrame, and then writes it to BigQuery...

I'd like to parallelize a Python function that reads hundreds of thousands of small .json files from a GCS directory, converts those .jsons into pandas DataFrames, and then writes the DataFrames to a BigQuery table. Here is a non-parallel version of the function:

    import gcsfs
    …
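
The non-parallel version is cut off after the import. Since reading many small objects is mostly network-bound, one approach (a sketch of my own under assumed names, not the post's code) is to fan the reads out over a thread pool with concurrent.futures and then concatenate the per-file DataFrames; the bucket path and the BigQuery write are placeholders:

    import concurrent.futures as cf

    import gcsfs
    import pandas as pd

    def read_one(fs, path):
        # Each worker opens one small .json object and parses it into a DataFrame.
        with fs.open(path, "rb") as f:
            return pd.read_json(f)

    def load_directory(bucket_dir, max_workers=32):
        fs = gcsfs.GCSFileSystem()
        paths = [p for p in fs.ls(bucket_dir) if p.endswith(".json")]

        # Threads suffice here: the loop is dominated by I/O, not CPU.
        with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
            frames = list(pool.map(lambda p: read_one(fs, p), paths))

        return pd.concat(frames, ignore_index=True)

    # df = load_directory("my-bucket/some/prefix")          # placeholder path
    # df.to_gbq("dataset.table", project_id="my-project")   # requires pandas-gbq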

Nested parallelism: why does only the main thread run and execute the parallel for loop four times?

时光怂恿深爱的人放手 submitted on 2021-01-27 10:50:37
Question: My code:

    #include <cstdio>
    #include "omp.h"

    int main() {
        omp_set_num_threads(4);
        #pragma omp parallel
        {
            #pragma omp parallel for // Adding "parallel" is the cause of the problem, but I don't know how to explain it.
            for (int i = 0; i < 6; i++) {
                printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
            }
        }
        return 0;
    }

The output that I am getting:

    i = 0, I am Thread 0
    i = 1, I am Thread 0
    i = 2, I am Thread 0
    i = 0, I am Thread 0
    i = 1, I am Thread 0
    i = 0, I am Thread 0
    i = 1, I am Thread…