hpc

How to append a sparse domain in Chapel

谁都会走 submitted on 2019-12-05 20:11:04
I'm populating a sparse array in Chapel with a loop that is reading over a CSV. I'm wondering what the best pattern is.

    var dnsDom = {1..n_dims, 1..n_dims};
    var spsDom: sparse subdomain(dnsDom);
    for line in file_reader.lines() {
      var i = line[1]:int;
      var j = line[2]:int;
      spsDom += (i,j);
    }

Is this an efficient way of doing it? Should I create a temporary array of tuples and append to spsDom every (say) 10,000 rows? Thanks!

The way you show in the snippet will expand the internal arrays of the sparse domain at every += operation. As you suggested, somehow buffering the read indices, then adding

Error building a C/C++ application with COMPSs: Hardcoded path

妖精的绣舞 submitted on 2019-12-05 19:17:42
I am trying to build a COMPSs application developed with the C/C++ binding. When I build the application, I get the following error. Do you have an idea about how I can solve this issue?

    xxxx:~/xxx/c/increment> buildapp increment
    *---------------------------------------------------------------------*
    *              BSC - Barcelona Supercomputing Center                   *
    *                       COMP Superscalar                               *
    *                                                                      *
    *              C/C++ Applications - BUILD SCRIPT                       *
    *                                                                      *
    *  More information at COMP Superscalar Website: http://compss.bsc.es  *
    *                                                                      *
    *              Support: support-compss@bsc.es                          *
    *                                                                      *
    *              Dependencies: csh (sudo apt-get install csh)            *
    *                                                                      *
    *-------

STL containers speed vs. arrays

家住魔仙堡 submitted on 2019-12-05 13:27:38
I just started working on a scientific project where speed really matters (HPC). I'm currently designing the data structures. The core of the project is a 3D grid of double values, used to solve a partial differential equation. Since speed here is probably a bigger concern than simplicity of the code, I'd like to know how the STL performs compared to usual C-style arrays. In my case, since it's a 3D grid, I was thinking of:

a) a one-dimensional vector with linear indexing,
b) a vector of 3 vectors,
c) a one-dimensional C-style array, or
d) a three-dimensional C-style array.

I looked up older
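For reference, here is a minimal sketch of option a), a flat std::vector with manual linear indexing into the 3D grid. The names (Grid3D, nx, ny, nz) and the row-major index formula are my own illustration, not from the post:

    #include <cstddef>
    #include <vector>

    // Option a): one flat std::vector<double> with manual 3D -> 1D indexing.
    struct Grid3D {
        std::size_t nx, ny, nz;
        std::vector<double> data;

        Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
            : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}

        // Row-major linear index: k varies fastest for fixed (i, j).
        double& operator()(std::size_t i, std::size_t j, std::size_t k) {
            return data[(i * ny + j) * nz + k];
        }
    };

    // Usage: Grid3D u(64, 64, 64); u(1, 2, 3) = 4.2;

Because the storage is a single contiguous block, access through an index function like this is typically on par with a plain C array once the compiler optimizes it.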

Submit job with python code (mpi4py) on HPC cluster

a 夏天 submitted on 2019-12-05 12:11:29
I am working on a Python code with MPI (mpi4py) and I want to run my code across many nodes (each node has 16 processors) in a queue on an HPC cluster. My code is structured as below:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()

    count = 0
    for i in range(1, size):
        if rank == i:
            for j in range(5):
                res = some_function(some_argument)
                comm.send(res, dest=0, tag=count)

I am able to run this code perfectly fine on the head node of the cluster using the command

    $ mpirun -np 48 python codename.py

Here "code" is the name of the python script and in the

SLURM Submit multiple tasks per node?

你说的曾经没有我的故事 submitted on 2019-12-05 11:16:00
I found some very similar questions which helped me arrive at a script which seems to work; however, I'm still unsure if I fully understand why, hence this question.

My problem (example): on 3 nodes, I want to run 12 tasks on each node (so 36 tasks in total). Each task uses OpenMP and should use 2 CPUs. In my case a node has 24 CPUs and 64 GB of memory. My script would be:

    #SBATCH --nodes=3
    #SBATCH --ntasks=36
    #SBATCH --cpus-per-task=2
    #SBATCH --mem-per-cpu=2000

    export OMP_NUM_THREADS=2

    for i in {1..36}; do
        srun -N 1 -n 1 ./program input${i} >& out${i} &
    done
    wait

This seems to work as I

Dearth of CUDA 5 Dynamic Parallelism Examples

北慕城南 submitted on 2019-12-05 10:54:09
I've been googling around and have only been able to find a trivial example of the new dynamic parallelism in Compute Capability 3.0 in one of their Tech Briefs linked from here. I'm aware that the HPC-specific cards probably won't be available until this time next year (after the nat'l labs get theirs). And yes, I realize that the simple example they gave is enough to get you going, but the more the merrier.

Are there other examples I've missed? To save you the trouble, here is the entire example given in the tech brief:

    __global__ void ChildKernel(void* data) {
        // Operate on data
    }

    __global__

Mvapich2 buffer aliasing

本秂侑毒 submitted on 2019-12-05 10:53:31
I launched an MPI program with MVAPICH2 and got this error:

    Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
    PMPI_Gather(923): MPI_Gather() failed
    PMPI_Gather(857): Buffers must not be aliased

There are two ways I think I could solve this:

1. Rewrite my MPI program (use different buffers)
2. Disable the buffer aliasing check

Does anyone know how I could do this with MVAPICH2? Some compiler option, parameter, environment variable, etc.? Something like MV2_NO_BUFFER_ALIAS_CHECK, but it does not work.

What you're doing is an incorrect program, and you should rewrite your code to use
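The answer is cut off above; one standard way to rewrite the code so the send and receive buffers are no longer aliased at the root is MPI_IN_PLACE. A minimal C++ sketch against the MPI C API (my own example, not the answerer's code):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int value = rank;                 // each rank contributes one int
        std::vector<int> gathered;

        if (rank == 0) {
            gathered.resize(size);
            gathered[0] = value;          // root's contribution is already in place
            // MPI_IN_PLACE: the root passes no separate send buffer,
            // so nothing aliases the receive buffer.
            MPI_Gather(MPI_IN_PLACE, 1, MPI_INT,
                       gathered.data(), 1, MPI_INT, 0, MPI_COMM_WORLD);
        } else {
            MPI_Gather(&value, 1, MPI_INT,
                       nullptr, 0, MPI_INT, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Non-root ranks ignore the receive arguments of MPI_Gather, so passing a null receive buffer there is fine.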

Can I emulate MS Compute Cluster Server on my dev machine?

孤街浪徒 submitted on 2019-12-05 10:45:37
I have a project for a client that will consist of managing jobs on an MS Compute Cluster. I will be developing the application outside of their network, and would like a way to develop/debug my app without the need to be on their network. I am developing the app in C#, and all I have so far is the Microsoft Compute Cluster Pack SDK.

Maybe this webcast can help you out: Event link

The webcast was helpful, in that it led me to the MPI.Net API. MPI.Net will allow me to write an executable that can be launched via mpiexec.exe, and can manage the process of creating and monitoring parallel tasks.

Can you transpose array when sending using MPI_Type_create_subarray?

て烟熏妆下的殇ゞ submitted on 2019-12-05 08:00:06
I'm trying to transpose a matrix using MPI in C. Each process has a square submatrix, and I want to send that to the right process (the 'opposite' one on the grid), transposing it as part of the communication. I'm using MPI_Type_create_subarray, which has an argument for the order: either MPI_ORDER_C or MPI_ORDER_FORTRAN, for row-major and column-major respectively. I thought that if I sent as one of these and received as the other, then my matrix would be transposed as part of the communication. However, this doesn't seem to happen - it just stays non-transposed. The important part of the code
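The code itself is truncated above. For context: the order flag only tells MPI how the array you are describing is laid out in memory; it does not ask MPI to reorder anything, and for a subarray describing the same block the elements are packed and unpacked in memory order either way, so the block arrives untransposed. What does transpose data in flight is receiving into a datatype whose type map is strided, for example a "column" type built with MPI_Type_vector. A minimal sketch (my own, not the poster's code; N, the partner rank, and contiguous N x N buffers are assumptions):

    #include <mpi.h>

    // Exchange a row-major N x N block with `partner` so that it arrives transposed:
    // send it as N*N contiguous doubles, receive it as N strided "columns".
    void exchange_transposed(double* sendblock, double* recvblock, int N,
                             int partner, MPI_Comm comm) {
        // One column of a row-major N x N matrix: N doubles with stride N.
        MPI_Datatype col, coltype;
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
        // Resize the extent to one double so consecutive copies start one column over.
        MPI_Type_create_resized(col, 0, sizeof(double), &coltype);
        MPI_Type_commit(&coltype);

        // Incoming row i fills column i of recvblock, i.e. the block is transposed.
        MPI_Sendrecv(sendblock, N * N, MPI_DOUBLE, partner, 0,
                     recvblock, N, coltype, partner, 0,
                     comm, MPI_STATUS_IGNORE);

        MPI_Type_free(&coltype);
        MPI_Type_free(&col);
    }

The MPI_Type_create_resized step matters: without shrinking the extent, the N received "columns" would not interleave and the unpacking would run past the end of the buffer.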

SunGridEngine, Condor, Torque as Resource Managers for PVM

半世苍凉 submitted on 2019-12-05 06:07:40
Does anyone have any idea which resource manager is good for PVM? Or should I not have used PVM and instead relied on MPI (or any version of it, such as MPICH-2 [are there any other ones that are better?])? The main reason for using PVM was that the person before me who started this project assumed the use of PVM. However, now that this project is mine (he hasn't done any significant work that relies on PVM), this can be easily changed, preferably to something that is easy to install, because installing and setting up PVM was a big hassle. I'm leaning towards SunGridEngine, seeing as how I have