hpc

What is the analogue of an NDIS filter in Linux?

Submitted by 与世无争的帅哥 on 2019-12-22 20:30:19
Question: I am working on a system as close to real-time as possible in Linux and need to send about 600-800 bytes in a TCP packet as soon as I receive a specific packet. For the best possible latency I want this packet to be sent directly from the kernel, instead of the received packet going all the way up to userspace and the application and then making its way back. If I were on Windows I'd have written an NDIS filter, which I would cache the packet to be sent with and the matching parameters so

STL containers speed vs. arrays

Submitted by 北城以北 on 2019-12-22 09:36:21
Question: I just started working on a scientific project where speed really matters (HPC). I'm currently designing the data structures. The core of the project is a 3D grid of double values, used to solve a partial differential equation. Since speed here is probably a bigger concern than simplicity of the code, I'd like to know how the STL performs compared to usual C-style arrays. In my case, since it's a 3D grid, I was thinking of a) a one-dimensional vector with linear indexing b) a vector of 3
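
For reference, option a) from the question is commonly written along the lines of the sketch below: one contiguous std::vector<double> with manual index arithmetic. With optimizations enabled this generally produces the same access pattern as a raw C array; the struct and member names here are illustrative, not taken from the question.

#include <cstddef>
#include <vector>

// Minimal sketch: a 3D grid stored in a single contiguous std::vector<double>.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}

    // Linear indexing: the innermost index (k) is contiguous in memory.
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
    const double& operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * ny + j) * nz + k];
    }
};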

OpenMP and C++11 multithreading

Submitted by 别说谁变了你拦得住时间么 on 2019-12-22 08:20:59
Question: I am currently working on a project that mixes high-performance computing (HPC) and interactivity. As such, the HPC part relies on OpenMP (mainly for-loops with lots of identical computations), but it is included in a larger framework with a GUI and multithreading, currently achieved with C++11 threads (std::thread and std::async). I have read in "Does OpenMP play nice with C++ promises and futures?" and "Why do C++11 threads become unjoinable when using nested OpenMP pragmas?" that it is no good
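
One pattern that is often suggested for this mix is to confine the OpenMP region to a self-contained worker function and hand that function to std::async, so OpenMP threads are created and joined inside a single asynchronous call while the GUI thread stays responsive. The sketch below only illustrates that idea; whether OpenMP behaves well when driven from a non-master C++11 thread is implementation-dependent, and all names here are illustrative.

#include <cstddef>
#include <future>
#include <vector>

// Sketch: the OpenMP parallel region lives entirely inside this worker.
double heavy_computation(const std::vector<double>& in) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(in.size()); ++i)
        sum += in[i] * in[i];
    return sum;
}

int main() {
    std::vector<double> data(1 << 20, 1.0);
    // The GUI/main thread stays free; the HPC part runs asynchronously.
    std::future<double> result =
        std::async(std::launch::async, heavy_computation, std::cref(data));
    double value = result.get();   // join point
    return value > 0.0 ? 0 : 1;
}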

SLURM Submit multiple tasks per node?

Submitted by 房东的猫 on 2019-12-22 06:16:15
Question: I found some very similar questions which helped me arrive at a script that seems to work; however, I'm still unsure whether I fully understand why, hence this question. My problem (example): on 3 nodes, I want to run 12 tasks on each node (so 36 tasks in total). Each task also uses OpenMP and should use 2 CPUs. In my case a node has 24 CPUs and 64 GB of memory. My script would be:
#SBATCH --nodes=3
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2000
export OMP_NUM_THREADS=2
for i

openmp - while loop for text file reading and using a pipeline

Submitted by 为君一笑 on 2019-12-18 03:04:27
Question: I discovered that OpenMP doesn't support while loops (or at least doesn't like them too much). It also doesn't like the '!=' operator. I have this bit of code:
int count = 1;
#pragma omp parallel for
while ( fgets(buff, BUFF_SIZE, f) != NULL ) {
    len = strlen(buff);
    int sequence_counter = segment_read(buff, len, count);
    if (sequence_counter == 1) {
        count_of_reads++;
        printf("\n Total No. of reads: %d \n", count_of_reads);
    }
    count++;
}
Any clues as to how to manage this? I read somewhere (
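
The worksharing for construct requires a canonical loop form, which is why the while loop and the != condition are rejected. A workaround that is often suggested is to keep the file reading sequential and offload the per-line work with OpenMP tasks, roughly as in the sketch below; f, BUFF_SIZE and segment_read() are assumed to exist exactly as in the question.

// Sketch: one thread reads lines sequentially and spawns an OpenMP task per
// line; the canonical-loop restriction of "omp for" does not apply to tasks.
int count_of_reads = 0;
int count = 1;
#pragma omp parallel
{
    #pragma omp single
    {
        char buff[BUFF_SIZE];
        while (fgets(buff, BUFF_SIZE, f) != NULL) {
            #pragma omp task firstprivate(buff, count)
            {
                int len = strlen(buff);
                int sequence_counter = segment_read(buff, len, count);
                if (sequence_counter == 1) {
                    #pragma omp atomic
                    count_of_reads++;
                }
            }
            count++;
        }
    }
}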

Netlogo HPC CPU Percentage Use Increase

Submitted by 旧街凉风 on 2019-12-13 14:06:05
Question: I submit jobs using headless NetLogo to an HPC server with the following code:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
  --model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
  --experiment test \
  --table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is a snapshot of the cluster queue, obtained with: qstat -g c. I wish to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't

How to successfully compile mpi4py using MS HPC Server 2008 R2's MPI stack?

Submitted by 戏子无情 on 2019-12-13 13:13:59
Question: So the story goes: I need an MPI wrapper for Python. I know there's mpi4py. For the current work I (mostly) use Python and Windows, and I'd like to use the Microsoft HPC Cluster Pack, having access to a few pretty "strong" machines running Win 2008 Server. Just to mention, besides Windows experience, I do have a bit of *nix experience with MPI and the like, but that's a pretty moot point for this problem. My interest in mpi4py was renewed when I ran into Python Tools for Visual Studio. That's some

Running NetLogo headless on HPC, how to increase CPU usage?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-13 09:52:07
Question: I was running NetLogo headless on an HPC cluster using BehaviorSpace. Another (non-NetLogo) user on the HPC complained to me that I am utilizing the CPU cores to only a very small extent and should increase usage. I don't know exactly how to do so; please help. I am guessing renice won't be of any help. Code:
#!/bin/bash
#$ -N NewPara3-d
#$ -q all.q
#$ -pe mpi 30
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
  --model /home/abhishekb/models/Test_results3-d.nlogo \
  --experiment 3-d \
  -

How to write/read a single float value(buffer) from OpenCL device

Submitted by 此生再无相见时 on 2019-12-13 03:49:33
Question: There are lots of questions about how to read an array from the device, but I only want to read a single float value from the device. Or can one only read arrays from the device? I create a buffer for (float) sum like below:
ocl.sum = clCreateBuffer(context, CL_MEM_READ_WRITE, 1, NULL, &err);
Set the args like this:
clSetKernelArg(kernel, 0, sizeof(cl_mem), &ocl.arr);
clSetKernelArg(kernel, 1, sizeof(cl_float), &ocl.sum);
In the kernel, I calculate the sum.
kernel calculate(global arr, float
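
Reading a single float back works the same way as reading an array of length one: the buffer needs sizeof(cl_float) bytes (the question's size of 1 allocates only one byte), the kernel argument for a buffer is passed as the cl_mem handle, and the value is fetched with clEnqueueReadBuffer. A rough sketch, reusing the question's names and assuming a command queue called queue:

// Sketch: read one float back from the device. "queue", "context", "kernel"
// and "ocl" are assumed to exist as in the question's host code.
cl_int err = CL_SUCCESS;
ocl.sum = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_float), NULL, &err);

clSetKernelArg(kernel, 0, sizeof(cl_mem), &ocl.arr);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &ocl.sum);   // pass the buffer handle, not a float

// ... enqueue the kernel ...

cl_float sum = 0.0f;
err = clEnqueueReadBuffer(queue, ocl.sum, CL_TRUE /* blocking */, 0,
                          sizeof(cl_float), &sum, 0, NULL, NULL);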

Writing distributed arrays using MPI-IO and Cartesian topology

Submitted by 纵饮孤独 on 2019-12-13 02:36:50
Question: I have an MPI code that implements 2D domain decomposition to compute numerical solutions to a PDE. Currently I write certain 2D distributed arrays out for each process (e.g. array_x --> proc000x.bin). I want to reduce that to a single binary file.
array_0, array_1,
array_2, array_3,
Suppose the above illustrates a Cartesian topology with 4 processes (2x2). Each 2D array has dimension (nx + 2, nz + 2); the +2 signifies "ghost" layers added to all sides for communication purposes. I would like
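
A common approach is to describe each rank's interior block (ghost layers stripped) as a subarray of the global array with MPI_Type_create_subarray, set that type as the file view, and issue a collective write. A rough sketch, assuming a 2x2 Cartesian communicator, an nx x nz interior per rank in row-major order, and an output file name chosen here purely for illustration:

#include <mpi.h>

// Sketch: write each rank's interior (nx x nz, ghosts removed) block into
// one shared file via a subarray filetype and a collective write.
// Assumes "cart" is the 2x2 Cartesian communicator and coords[2] comes
// from MPI_Cart_coords; "array_global.bin" is a hypothetical file name.
void write_global(MPI_Comm cart, const int coords[2], int nx, int nz,
                  const double* interior /* nx*nz values, ghosts removed */)
{
    int gsizes[2] = { 2 * nx, 2 * nz };                 // global array extents
    int lsizes[2] = { nx, nz };                          // this rank's block
    int starts[2] = { coords[0] * nx, coords[1] * nz };  // block offset in the global array

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(cart, "array_global.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, interior, nx * nz, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}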