hpc

What is the analogue of an NDIS filter in Linux?

Submitted by 与世无争的帅哥 on 2019-12-22 20:30:19
Question: I am working on a system as close to real-time as possible in Linux and need to send about 600-800 bytes in a TCP packet as soon as I receive a specific packet. For the best possible latency I want this packet to be sent directly from the kernel, instead of the received packet going all the way up to userspace and the application and then making its way back. If I were on Windows I'd have written an NDIS filter, which I would cache the packet to be sent with and the matching parameters so

STL containers speed vs. arrays

Submitted by 北城以北 on 2019-12-22 09:36:21
Question: I just started working on a scientific project where speed really matters (HPC). I'm currently designing the data structures. The core of the project is a 3D grid of double values, used to solve a partial differential equation. Since speed here is probably a bigger concern than simplicity of the code, I'd like to know how the STL performs compared to usual C-style arrays. In my case, since it's a 3D grid, I was thinking of a) a one-dimensional vector with linear indexing b) a vector of 3
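
For reference, option a) from the question is commonly written along the lines of the sketch below: one contiguous std::vector<double> with manual index arithmetic. With optimizations enabled this generally produces the same access pattern as a raw C array; the struct and member names here are illustrative, not taken from the question.

#include <cstddef>
#include <vector>

// Minimal sketch: a 3D grid stored in a single contiguous std::vector<double>.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}

    // Linear indexing: the innermost index (k) is contiguous in memory.
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
    const double& operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * ny + j) * nz + k];
    }
};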

OpenMP and C++11 multithreading

Submitted by 别说谁变了你拦得住时间么 on 2019-12-22 08:20:59
Question: I am currently working on a project that mixes high-performance computing (HPC) and interactivity. As such, the HPC part relies on OpenMP (mainly for-loops with lots of identical computations), but it is included in a larger framework with a GUI and multithreading, currently achieved with C++11 threads (std::thread and std::async). I have read in "Does OpenMP play nice with C++ promises and futures?" and "Why do C++11 threads become unjoinable when using nested OpenMP pragmas?" that it is no good
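
One pattern that is often suggested for this mix is to confine the OpenMP region to a self-contained worker function and hand that function to std::async, so OpenMP threads are created and joined inside a single asynchronous call while the GUI thread stays responsive. The sketch below only illustrates that idea; whether OpenMP behaves well when driven from a non-master C++11 thread is implementation-dependent, and all names here are illustrative.

#include <cstddef>
#include <future>
#include <vector>

// Sketch: the OpenMP parallel region lives entirely inside this worker.
double heavy_computation(const std::vector<double>& in) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(in.size()); ++i)
        sum += in[i] * in[i];
    return sum;
}

int main() {
    std::vector<double> data(1 << 20, 1.0);
    // The GUI/main thread stays free; the HPC part runs asynchronously.
    std::future<double> result =
        std::async(std::launch::async, heavy_computation, std::cref(data));
    double value = result.get();   // join point
    return value > 0.0 ? 0 : 1;
}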

SLURM Submit multiple tasks per node?

Submitted by 房东的猫 on 2019-12-22 06:16:15
Question: I found some very similar questions which helped me arrive at a script that seems to work; however, I'm still unsure whether I fully understand why, hence this question. My problem (example): on 3 nodes, I want to run 12 tasks on each node (so 36 tasks in total). Each task also uses OpenMP and should use 2 CPUs. In my case a node has 24 CPUs and 64 GB of memory. My script would be:
#SBATCH --nodes=3
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2000
export OMP_NUM_THREADS=2
for i

openmp - while loop for text file reading and using a pipeline

Submitted by 为君一笑 on 2019-12-18 03:04:27
Question: I discovered that OpenMP doesn't support while loops (or at least doesn't like them too much). It also doesn't like the '!=' operator. I have this bit of code:
int count = 1;
#pragma omp parallel for
while ( fgets(buff, BUFF_SIZE, f) != NULL ) {
    len = strlen(buff);
    int sequence_counter = segment_read(buff, len, count);
    if (sequence_counter == 1) {
        count_of_reads++;
        printf("\n Total No. of reads: %d \n", count_of_reads);
    }
    count++;
}
Any clues as to how to manage this? I read somewhere (
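
The worksharing for construct requires a canonical loop form, which is why the while loop and the != condition are rejected. A workaround that is often suggested is to keep the file reading sequential and offload the per-line work with OpenMP tasks, roughly as in the sketch below; f, BUFF_SIZE and segment_read() are assumed to exist exactly as in the question.

// Sketch: one thread reads lines sequentially and spawns an OpenMP task per
// line; the canonical-loop restriction of "omp for" does not apply to tasks.
int count_of_reads = 0;
int count = 1;
#pragma omp parallel
{
    #pragma omp single
    {
        char buff[BUFF_SIZE];
        while (fgets(buff, BUFF_SIZE, f) != NULL) {
            #pragma omp task firstprivate(buff, count)
            {
                int len = strlen(buff);
                int sequence_counter = segment_read(buff, len, count);
                if (sequence_counter == 1) {
                    #pragma omp atomic
                    count_of_reads++;
                }
            }
            count++;
        }
    }
}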

Netlogo HPC CPU Percentage Use Increase

Submitted by 旧街凉风 on 2019-12-13 14:06:05
Question: I submit jobs using headless NetLogo to an HPC server with the following code:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
  --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
  --experiment test \
  --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is a snapshot of the cluster queue, obtained with: qstat -g c. I wish to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't

How to successfully compile mpi4py using MS HPC Server 2008 R2's MPI stack?

Submitted by 戏子无情 on 2019-12-13 13:13:59
Question: So the story goes: I need an MPI wrapper for Python. I know there's mpi4py. For the current work I (mostly) use Python and Windows, and I'd like to use the Microsoft HPC Cluster Pack, having access to a few pretty "strong" machines running Win 2008 Server. Just to mention, besides Windows experience, I do have a bit of *nix experience with MPI and the like, but that's a pretty moot point for this problem. My interest in mpi4py was renewed when I ran into Python Tools for Visual Studio. That's some

Running NetLogo headless on HPC, how to increase CPU usage?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-13 09:52:07
Question: I was running NetLogo headless on an HPC cluster using BehaviorSpace. Another (non-NetLogo) user on the HPC complained to me that I am utilizing the CPU cores to only a very small extent and should increase usage. I don't know exactly how to do so; please help. I am guessing renice won't be of any help. Code:
#!/bin/bash
#$ -N NewPara3-d
#$ -q all.q
#$ -pe mpi 30
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
  --model /home/abhishekb/models/Test_results3-d.nlogo \
  --experiment 3-d \
  -

How to write/read a single float value(buffer) from OpenCL device

Submitted by 此生再无相见时 on 2019-12-13 03:49:33
Question: There are lots of questions about how to read an array from the device, but I only want to read a single float value from the device. Or can one only read arrays from the device? I create a buffer for (float) sum like below:
ocl.sum = clCreateBuffer(context, CL_MEM_READ_WRITE, 1, NULL, &err);
Set the args like this:
clSetKernelArg(kernel, 0, sizeof(cl_mem), &ocl.arr);
clSetKernelArg(kernel, 1, sizeof(cl_float), &ocl.sum);
In the kernel, I calculate the sum.
kernel calculate(global arr, float
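
Reading a single float back works the same way as reading an array of length one: the buffer needs sizeof(cl_float) bytes (the question's size of 1 allocates only one byte), the kernel argument for a buffer is passed as the cl_mem handle, and the value is fetched with clEnqueueReadBuffer. A rough sketch, reusing the question's names and assuming a command queue called queue:

// Sketch: read one float back from the device. "queue", "context", "kernel"
// and "ocl" are assumed to exist as in the question's host code.
cl_int err = CL_SUCCESS;
ocl.sum = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_float), NULL, &err);

clSetKernelArg(kernel, 0, sizeof(cl_mem), &ocl.arr);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &ocl.sum);   // pass the buffer handle, not a float

// ... enqueue the kernel ...

cl_float sum = 0.0f;
err = clEnqueueReadBuffer(queue, ocl.sum, CL_TRUE /* blocking */, 0,
                          sizeof(cl_float), &sum, 0, NULL, NULL);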

Writing distributed arrays using MPI-IO and Cartesian topology

Submitted by 纵饮孤独 on 2019-12-13 02:36:50
Question: I have an MPI code that implements 2D domain decomposition to compute numerical solutions to a PDE. Currently I write certain 2D distributed arrays out for each process (e.g. array_x --> proc000x.bin). I want to reduce that to a single binary file.
array_0, array_1,
array_2, array_3,
Suppose the above illustrates a Cartesian topology with 4 processes (2x2). Each 2D array has dimension (nx + 2, nz + 2); the +2 signifies "ghost" layers added to all sides for communication purposes. I would like
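
A common approach is to describe each rank's interior block (ghost layers stripped) as a subarray of the global array with MPI_Type_create_subarray, set that type as the file view, and issue a collective write. A rough sketch, assuming a 2x2 Cartesian communicator, an nx x nz interior per rank in row-major order, and an output file name chosen here purely for illustration:

#include <mpi.h>

// Sketch: write each rank's interior (nx x nz, ghosts removed) block into
// one shared file via a subarray filetype and a collective write.
// Assumes "cart" is the 2x2 Cartesian communicator and coords[2] comes
// from MPI_Cart_coords; "array_global.bin" is a hypothetical file name.
void write_global(MPI_Comm cart, const int coords[2], int nx, int nz,
                  const double* interior /* nx*nz values, ghosts removed */)
{
    int gsizes[2] = { 2 * nx, 2 * nz };                 // global array extents
    int lsizes[2] = { nx, nz };                          // this rank's block
    int starts[2] = { coords[0] * nx, coords[1] * nz };  // block offset in the global array

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(cart, "array_global.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, interior, nx * nz, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}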