hpc

Optimization: correct way to pass large array to map using pathos multiprocessing

Submitted by ぐ巨炮叔叔 on 2019-12-11 17:09:01
Question: I need to apply a function to the elements of a large array, and I want to optimize the code with multiprocessing so that I can use all the cores on a supercomputer. This is a follow-up to the question I asked here. I used the code:

    import numpy as np
    from scipy import misc, ndimage
    import itertools
    from pathos.multiprocessing import ProcessPool
    import time

    start = time.time()
    # define the original array a as
    a = np.load('100by100by100array.npy')
    n = a.ndim  # number of dimensions
    imx, imy, imz

Does Python's datatable package support out-of-memory datasets?

Submitted by ぐ巨炮叔叔 on 2019-12-11 15:56:12
Question: datatable is a relatively new, high-performance DataFrame/data.table alternative for Python. The datatable documentation states that it focuses on: big data support, high performance, both in-memory and out-of-memory datasets, and multi-threaded algorithms. Still, I haven't found any operations related to caching or to keeping part of the data out of memory. In what sense does it support out-of-memory datasets?

Source: https://stackoverflow.com/questions/56572117/does-pythons-datatable-package-support-out
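For context on what "out-of-memory" support typically means here: datatable can persist frames in its .jay format and open them memory-mapped, so pages are loaded on demand rather than the whole dataset being read into RAM (the exact API is worth confirming in the datatable docs). The underlying mechanism can be sketched with numpy.memmap; the file path and shape below are illustrative:

```python
import os
import tempfile

import numpy as np


def write_demo(path, shape):
    # create a file-backed array and fill it; the data lives on
    # disk, not (necessarily) in RAM
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    arr[:] = 1.0
    arr.flush()


def memmap_sum(path, shape, dtype=np.float64):
    # reopen the file as a read-only memory map: the OS pages data
    # in on demand, so the array never has to fit in RAM at once
    arr = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    return float(arr.sum())
```

The same idea, applied inside a columnar frame format, is what lets a library operate on datasets larger than physical memory without an explicit caching API.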

Single R script on multiple nodes

Submitted by 两盒软妹~` on 2019-12-11 12:25:47
Question: I would like to use CPU cores from multiple nodes to execute a single R script. Each node contains 16 cores, and nodes are assigned to me via Slurm. So far my code looks like the following:

    ncores <- 16
    List_1 <- list(...)
    List_2 <- list(...)
    cl <- makeCluster(ncores)
    registerDoParallel(cl)
    getDoParWorkers()
    foreach(L_1=List_1) %:%
      foreach(L_2=List_2) %dopar% {
        ...
      }
    stopCluster(cl)

I execute it via the following command in a UNIX shell: mpirun -np 1 R --no-save < file_path_R_script.R >
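Whatever the language, the multi-node pattern under Slurm is the same: each launched task reads its rank and the total task count from the environment and claims a disjoint share of the work, so no cross-node cluster object is needed for embarrassingly parallel sweeps. A Python illustration of that pattern (SLURM_NTASKS and SLURM_PROCID are Slurm's standard variables; the striped split is one choice among several):

```python
import os


def my_share(items, nprocs=None, rank=None):
    # under Slurm, SLURM_NTASKS is the number of launched tasks and
    # SLURM_PROCID is this task's rank; each task takes a disjoint
    # stripe of the work (defaults allow single-process testing)
    nprocs = nprocs or int(os.environ.get("SLURM_NTASKS", "1"))
    rank = int(os.environ.get("SLURM_PROCID", "0")) if rank is None else rank
    return items[rank::nprocs]


if __name__ == "__main__":
    print(my_share(list(range(10))))
```

Launched as one task per core across the allocation (e.g. with srun), every process computes only its stripe, and results are combined afterwards.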

Optimization: alternatives to passing large array to map in ipyparallel?

Submitted by 纵然是瞬间 on 2019-12-11 07:56:20
Question: I originally wrote a nested for loop over a test 3D array in Python. Since I wanted to apply it to a larger array, which would take much more time, I decided to parallelise it using ipyparallel, writing it as a function and using bview.map. This way I could take advantage of multiple cores/nodes on a supercomputer. However, the code is actually slower when sent to the supercomputer. When I profiled it, it appears the time is spent mostly in method 'acquire' of 'thread.lock' objects, which from other
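Time dominated by 'acquire' of 'thread.lock' is typically the client blocking while it waits on communication, which suggests serialization traffic is swamping the computation. The usual remedy is to scatter the array to the engines once rather than embedding it in every mapped task. A back-of-the-envelope comparison of the bytes serialized under each scheme, using plain pickle as a stand-in for ipyparallel's serializer:

```python
import pickle


def bytes_naive_map(arr, ntasks):
    # naive map with the array captured in every task: the whole
    # array is pickled into each of the ntasks messages
    return ntasks * len(pickle.dumps(arr))


def bytes_scatter(arr, ntasks):
    # scatter: each engine receives only its contiguous slice, once
    step = -(-len(arr) // ntasks)  # ceiling division
    return sum(len(pickle.dumps(arr[i:i + step]))
               for i in range(0, len(arr), step))
```

For a large array the naive scheme moves roughly ntasks times more data, so coarser tasks plus a one-time scatter (ipyparallel's DirectView supports this) usually restores the expected speedup.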

How to run Catalyst/Paraview code examples?

Submitted by 守給你的承諾、 on 2019-12-11 07:09:17
Question: Hi, I have been trying to figure out Catalyst and ParaView for a while. I tried to run these examples with my ParaView installation, but without success: https://github.com/Kitware/ParaViewCatalystExampleCode I imagined that at least the Python code would run in the Python shell, but it doesn't seem to work either. I have viewed all the Kitware tutorials and some others online, but still no progress. Any help is appreciated.

Answer 1: You should be able to run all of the non-Python examples with CTest (i.e. the ctest executable). I

Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK:invalid rank

Submitted by 荒凉一梦 on 2019-12-11 05:24:33
Question: I am trying to learn MPI. When I send data from one processor to another, I can successfully send it and receive it in a variable on the other side. But when I try to send and receive on both processors, I get the invalid rank error. Here is the code for the program:

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        int world_size;
        int rank;
        char hostname[256];
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int name_len;
        int tag = 4;
        int
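MPI_ERR_RANK almost always means the dest or source passed to MPI_Send/MPI_Recv lies outside 0..world_size-1, most often because the code assumes two (or more) processes but was launched with too few, e.g. mpirun -np 1. A small Python helper illustrating the check worth making before any send (the even/odd pairing mirrors a common two-process exchange and is only one possible scheme):

```python
def partner(rank, world_size):
    # pair rank 0 with 1, 2 with 3, ...; an exchange written this
    # way must be launched with an even number of processes
    p = rank + 1 if rank % 2 == 0 else rank - 1
    if not 0 <= p < world_size:
        raise ValueError(
            f"rank {p} is invalid for world size {world_size}; "
            "MPI would report MPI_ERR_RANK here")
    return p
```

Validating the computed rank (or guarding the exchange with `if world_size >= 2`) turns the opaque runtime abort into an explicit, debuggable error.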

Red-Black Gauss Seidel and OpenMP

Submitted by ≡放荡痞女 on 2019-12-10 21:05:36
Question: I was trying to prove a point about OpenMP compared to MPICH, and I cooked up the following example to demonstrate how easy it is to get high performance with OpenMP. The Gauss-Seidel iteration is split into two separate sweeps such that, within each sweep, every operation can be performed in any order, and there should be no dependencies between tasks. So in theory no processor should ever have to wait for another process to perform any kind of synchronization. The problem I am encountering,
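The red-black idea in one dimension: grid points of one color depend only on neighbours of the other color, so every update within a single-color sweep is independent and can run in any order, which is exactly what makes the OpenMP loop safe. A serial Python sketch of one sweep for u'' = f on a uniform grid (the parallel pragma is omitted; the update is the standard second-order stencil):

```python
def red_black_sweep(u, f, h2):
    # one Gauss-Seidel sweep for u'' = f in red-black order: points
    # of color 0 read only color-1 neighbours and vice versa, so
    # each inner loop is embarrassingly parallel
    n = len(u)
    for color in (0, 1):
        for i in range(1, n - 1):
            if i % 2 == color:
                u[i] = 0.5 * (u[i - 1] + u[i + 1] - h2 * f[i])
    return u
```

In OpenMP the inner loop over one color becomes a `parallel for` with no ordering constraint; the only synchronization needed is the implicit barrier between the two color sweeps.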

Infiniband in Java

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-10 15:59:23
Question: As you all know, OFED's Sockets Direct Protocol is deprecated, and the OFED 3.x releases do not come with SDP at all. Hence, Java's SDP support also fails to work. I was wondering what the proper way to program InfiniBand in Java is. Is there any portable solution other than writing JNI code? My requirement is to achieve RDMA among a collection of InfiniBand-powered machines.

Answer 1: jVerbs might be what you're looking for. Here's a little bit of documentation.

Answer 2: jVerbs looks interesting otherwise you

Operating in parallel on a large constant datastructure in Julia

Submitted by 旧时模样 on 2019-12-10 15:53:28
Question: I have a large vector of vectors of strings: there are around 50,000 vectors of strings, each of which contains 2-15 strings of 1-20 characters. MyScoringOperation is a function which operates on a vector of strings (the datum) and returns an array of 10100 scores (as Float64s). It takes about 0.01 seconds to run MyScoringOperation (depending on the length of the datum):

    function MyScoringOperation(state::State, datum::Vector{String})
        ...
        score::Vector{Float64}  # Size of score = 10000

I

“WindowsError: [Error 206] The filename or extension is too long” after running a program very many times with subprocess

Submitted by 感情迁移 on 2019-12-10 13:17:32
Question: My Python program prepares inputs, runs an external FORTRAN code, and processes the outputs in a Windows HPC 2008 environment. It works great, unless the code executes the external program 1042-1045 times (usually the problem converges earlier). In those situations, I get an exception:

    WindowsError: [Error 206] The filename or extension is too long

However, the path to the filename is not growing over time; the program just cleans the directory and runs again. Here's the code:

    inpF =
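One well-known way to hit Error 206 only after many iterations is an environment block (or command line) that grows slightly on each run, for example by prepending to os.environ["PATH"] inside the loop, until CreateProcess rejects it. Whether that is the cause here would need checking against the actual code, but the safe pattern is to build each child's environment from a fresh copy; the command and extra path below are placeholders:

```python
import os
import subprocess
import sys


def run_external(cmd, extra_path):
    # build the child environment from a copy of the parent's;
    # mutating os.environ in a loop makes every subsequent child's
    # environment block grow without bound
    env = os.environ.copy()
    env["PATH"] = extra_path + os.pathsep + env.get("PATH", "")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_external([sys.executable, "-c", "print('ok')"], os.getcwd())
    print(result.stdout.strip())
```

Because the copy is rebuilt from the unchanged parent environment each call, the size passed to CreateProcess stays constant no matter how many times the external program runs.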