hpc

MVAPICH2 buffer aliasing

南楼画角 submitted on 2019-12-07 05:00:57
Question: I launched an MPI program with MVAPICH2 and got this error:

    Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
    PMPI_Gather(923): MPI_Gather() failed
    PMPI_Gather(857): Buffers must not be aliased

There are two ways I think I could solve this:
1. Rewrite my MPI program (use different buffers)
2. Disable the buffer aliasing check
Does someone know how I could do this with MVAPICH2? Some compiler option, parameter, environment variable, etc.? Something like MV2_NO_BUFFER_ALIAS_CHECK, but
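
Option 1 can usually be satisfied without a second buffer by passing MPI_IN_PLACE at the root, which is the standard-conforming way to gather into a buffer that already holds the root's own contribution. A minimal sketch, not the poster's code; the one-int-per-rank layout and values are assumed for illustration:

    /* Sketch: avoid the "Buffers must not be aliased" error by using
     * MPI_IN_PLACE at the root instead of passing the same pointer twice. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            int *data = malloc(size * sizeof(int));
            data[0] = 42;   /* root's own contribution is already in place */
            /* MPI_IN_PLACE tells MPI the root's element is already in the
             * receive buffer, so send and receive buffers never alias. */
            MPI_Gather(MPI_IN_PLACE, 1, MPI_INT, data, 1, MPI_INT, 0, MPI_COMM_WORLD);
            for (int i = 0; i < size; i++)
                printf("data[%d] = %d\n", i, data[i]);
            free(data);
        } else {
            int value = rank * 10;
            /* On non-root ranks the receive arguments are ignored. */
            MPI_Gather(&value, 1, MPI_INT, NULL, 0, MPI_INT, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }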

Can I emulate MS Compute Cluster Server on my dev machine?

大兔子大兔子 submitted on 2019-12-07 04:17:10
Question: I have a project for a client that will consist of managing jobs on an MS Compute Cluster. I will be developing the application outside of their network, and would like a way to develop/debug my app without needing to be on their network. I am developing the app in C#, and all I have so far is the Microsoft Compute Cluster Pack SDK. Answer 1: Maybe this webcast can help you out: Event link Answer 2: The webcast was helpful, in that it led me to the MPI.Net API. MPI.Net will allow me to write an

Dearth of CUDA 5 Dynamic Parallelism Examples

十年热恋 submitted on 2019-12-07 04:05:33
Question: I've been googling around and have only been able to find a trivial example of the new dynamic parallelism in Compute Capability 3.0, in one of their Tech Briefs linked from here. I'm aware that the HPC-specific cards probably won't be available until this time next year (after the national labs get theirs). And yes, I realize that the simple example they gave is enough to get you going, but the more the merrier. Are there other examples I've missed? To save you the trouble, here is the entire

Can you transpose array when sending using MPI_Type_create_subarray?

江枫思渺然 submitted on 2019-12-07 04:01:24
Question: I'm trying to transpose a matrix using MPI in C. Each process has a square submatrix, and I want to send that to the right process (the 'opposite' one on the grid), transposing it as part of the communication. I'm using MPI_Type_create_subarray, which has an argument for the order, either MPI_ORDER_C or MPI_ORDER_FORTRAN for row-major and column-major respectively. I thought that if I sent as one of these, and received as the other, then my matrix would be transposed as part of the
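
For the transpose-in-transit part, a commonly used alternative is to describe column order with a derived datatype on one side and receive contiguously on the other. The sketch below uses MPI_Type_vector rather than MPI_Type_create_subarray, assumes a 4x4 double matrix and exactly two ranks, and is not the poster's code:

    /* Rank 0 sends its row-major matrix as N "columns"; rank 1 receives a
     * plain contiguous stream, so the data arrives transposed. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 4

    int main(int argc, char **argv)
    {
        int rank;
        double a[N][N], b[N][N];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = i * N + j;

            /* One column of a row-major NxN matrix: N elements with stride N. */
            MPI_Datatype col, coltype;
            MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
            /* Shrink the extent to one double so N columns pack back to back. */
            MPI_Type_create_resized(col, 0, sizeof(double), &coltype);
            MPI_Type_commit(&coltype);

            /* Sending N columns puts the elements on the wire in column-major order. */
            MPI_Send(a, N, coltype, 1, 0, MPI_COMM_WORLD);
            MPI_Type_free(&coltype);
            MPI_Type_free(&col);
        } else if (rank == 1) {
            /* Receiving contiguously stores the stream row-major: b = transpose(a). */
            MPI_Recv(b, N * N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) printf("%5.1f ", b[i][j]);
                printf("\n");
            }
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two ranks; rank 1 prints the transpose of rank 0's matrix.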

SunGridEngine, Condor, Torque as Resource Managers for PVM

你说的曾经没有我的故事 submitted on 2019-12-07 02:36:55
Question: Does anyone have any idea which resource manager is good for PVM? Or should I not have used PVM and instead relied on MPI (or any version of it, such as MPICH-2; are there any others that are better?)? The main reason for using PVM was that the person before me who started this project assumed the use of PVM. However, now that this project is mine (he hasn't done any significant work that relies on PVM), this can easily be changed, preferably to something that is easy to install, because

MPI_Isend and MPI_Irecv seem to be causing a deadlock

為{幸葍}努か submitted on 2019-12-06 14:11:37
I'm using non-blocking communication in MPI to send various messages between processes. However, I appear to be getting a deadlock. I have used PADB (see here) to look at the message queues and have got the following output:

    1:msg12: Operation 1 (pending_receive) status 0 (pending)
    1:msg12: Rank local 4 global 4
    1:msg12: Size desired 4
    1:msg12: tag_wild 0
    1:msg12: Tag desired 16
    1:msg12: system_buffer 0
    1:msg12: Buffer 0xcaad32c
    1:msg12: 'Receive: 0xcac3c80'
    1:msg12: 'Data: 4 * MPI_FLOAT'
    --
    1:msg32: Operation 0 (pending_send) status 2 (complete)
    1:msg32: Rank local 4 global 4
    1:msg32:
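
For reference, a non-blocking exchange that completes cleanly usually follows the pattern sketched below: every MPI_Isend/MPI_Irecv request is kept and finished with MPI_Waitall before any buffer is reused. This is a generic ring exchange, not the poster's program; the tag and message size merely mirror the queue dump above:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        float sendbuf[4] = { (float)rank, 0.0f, 0.0f, 0.0f };
        float recvbuf[4];
        MPI_Request reqs[2];

        /* Post the receive first, then the send; neither buffer is touched
         * again until MPI_Waitall says both requests have completed. */
        MPI_Irecv(recvbuf, 4, MPI_FLOAT, left,  16, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, 4, MPI_FLOAT, right, 16, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        printf("rank %d received %.1f from rank %d\n", rank, recvbuf[0], left);
        MPI_Finalize();
        return 0;
    }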

Ensure hybrid MPI / OpenMP runs each OpenMP thread on a different core

女生的网名这么多〃 submitted on 2019-12-06 10:55:46
I am trying to get a hybrid OpenMP/MPI job to run so that OpenMP threads are separated by core (only one thread per core). I have seen other answers which use numactl and bash scripts to set environment variables, and I don't want to do this. I would like to be able to do this only by setting OMP_NUM_THREADS and/or OMP_PROC_BIND and mpiexec options on the command line. I have tried the following - let's say I want 2 MPI processes that each have 2 OpenMP threads, and each of the threads are run on separate cores, so I want 4 cores total.

    OMP_PROC_BIND=true OMP_PLACES=cores OMP_NUM_THREADS=2
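
One way to check whether a given combination of OMP_* variables and mpiexec options actually separated the threads is a small report program like the sketch below. It is not from the question, and sched_getcpu() is Linux-specific:

    /* Each OpenMP thread of each MPI rank reports which core it is running on. */
    #define _GNU_SOURCE
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            /* If binding worked, every (rank, thread) pair prints a different core. */
            printf("rank %d  thread %d/%d  core %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
        }

        MPI_Finalize();
        return 0;
    }

Built with something like mpicc -fopenmp and launched with 2 ranks and OMP_NUM_THREADS=2, four distinct core numbers in the output indicate that each thread ended up on its own core.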

How to use multiple nodes/cores on a cluster with parallelized Python code

允我心安 submitted on 2019-12-06 05:32:53
Question: I have a piece of Python code where I use joblib and multiprocessing to make parts of the code run in parallel. I have no trouble running this on my desktop, where I can use Task Manager to see that it uses all four cores and runs the code in parallel. I recently learnt that I have access to an HPC cluster with 100+ 20-core nodes. The cluster uses SLURM as the workload manager. The first question is: is it possible to run parallelized Python code on a cluster? If it is possible, does the Python

Unable to use all cores with mpirun

僤鯓⒐⒋嵵緔 submitted on 2019-12-06 04:17:46
Question: I'm testing a simple MPI program on my desktop (Ubuntu 16.04 LTS / Intel® Core™ i3-6100U CPU @ 2.30GHz × 4 / gcc 4.8.5 / OpenMPI 3.0.0) and mpirun won't let me use all of the cores on my machine (4). When I run:

    $ mpirun -n 4 ./test2

I get the following error:

    --------------------------------------------------------------------------
    There are not enough slots available in the system to satisfy the 4 slots
    that were requested by the application:
      ./test2
    Either request fewer slots for your

MPI + GPU : how to mix the two techniques

我们两清 submitted on 2019-12-05 21:35:40
Question: My program is well-suited for MPI. Each CPU does its own, specific (sophisticated) job, produces a single double, and then I use an MPI_Reduce to multiply the result from every CPU. But I repeat this many, many times (>100,000). Thus, it occurred to me that a GPU would dramatically speed things up. I have googled around but can't find anything concrete. How do you go about mixing MPI with GPUs? Is there a way for the program to query and verify "oh, this rank is the GPU, all other are
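
The multiply-reduce step described in the question stays on the host side regardless of how each rank produces its double; a GPU would only replace the per-rank computation feeding it. A hedged sketch of that structure, with made-up function and variable names:

    #include <mpi.h>
    #include <stdio.h>

    /* Stand-in for the "specific (sophisticated) job" each rank performs;
     * this is the part a GPU kernel (or an accelerated library call) would replace. */
    static double compute_local_result(int rank)
    {
        return 1.0 + 0.001 * rank;
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int iter = 0; iter < 100000; iter++) {
            double local = compute_local_result(rank);
            double product = 0.0;
            /* Multiply the per-rank results together; 'product' is valid on rank 0 only. */
            MPI_Reduce(&local, &product, 1, MPI_DOUBLE, MPI_PROD, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }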