parallel-processing

Improving memory layout for parallel computing

断了今生、忘了曾经 submitted on 2020-01-29 09:45:08
Question: I'm trying to optimize an algorithm (Lattice Boltzmann) for parallel computing using C++ AMP, and I'm looking for suggestions to optimize the memory layout. I just found out that moving one parameter out of the structure into another vector (the blocked vector) gave a performance increase of about 10%. Does anyone have tips that could improve this further, or something I should take into consideration? Below is the most time-consuming function, which is executed for each timestep, and the structure used for

Python: Running nested loop, 2D moving window, in Parallel

 ̄綄美尐妖づ submitted on 2020-01-29 09:15:47
Question: I work with topographic data. For one particular problem, I have written a function in Python that moves a window of a given size across a matrix (a grid of elevations). I then perform an analysis on each window and set the cell at the center of the window to the resulting value. My final output is a matrix the same size as my original matrix, altered according to my analysis. This problem takes 11 hours to run on a small area, so I thought parallelizing the
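One way to picture the moving-window problem described above: each output cell only reads the input grid, so the output rows are independent and can be computed by separate worker processes. The following is a minimal sketch, not the asker's code. The names `window_mean`, `process_row` and `moving_window` are hypothetical, the grid is assumed to be a list of lists, and the window mean stands in for the unspecified analysis.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def window_mean(grid, half, r, c):
    """Placeholder analysis (an assumption): mean of the window centred on
    (r, c), with the window clipped at the grid edges."""
    rows = range(max(0, r - half), min(len(grid), r + half + 1))
    cols = range(max(0, c - half), min(len(grid[0]), c + half + 1))
    values = [grid[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def process_row(r, grid, half):
    """Compute one output row. Only the input grid is read, never the
    output, so rows do not depend on each other."""
    return [window_mean(grid, half, r, c) for c in range(len(grid[0]))]


def moving_window(grid, half, map_fn=map):
    """Apply the window analysis to every cell.

    map_fn can be the built-in map (serial) or a pool's map (parallel);
    both return the rows in order.
    """
    worker = partial(process_row, grid=grid, half=half)
    return list(map_fn(worker, range(len(grid))))


if __name__ == "__main__":
    # Small synthetic elevation grid, for illustration only.
    elevations = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    with ProcessPoolExecutor() as pool:
        smoothed = moving_window(elevations, half=1, map_fn=pool.map)
    for row in smoothed:
        print(row)
```

In this sketch the whole grid is pickled and sent along with every row task, which is wasteful for large grids. Passing a `chunksize` to `pool.map` or handing each worker a block of rows would likely reduce that overhead, at the cost of slightly more bookkeeping.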

Jenkins - java.lang.IllegalArgumentException: Last unit does not have enough valid bits & Gradle error: Task 'null' not found in root project

无人久伴 submitted on 2020-01-27 08:45:56
Question: Jenkins 2.176.4-3 rolling, Gradle 4.3.1. Issue area: parallel run of a given single Gradle task (or it could be any simple action), especially when running concurrent runs of Jenkinsfile-based pipelines. All of a sudden I got this error on the Jenkins log page. I have never seen this error before (and found no Stack Overflow posts about this error in Jenkins either). Error: java.lang.IllegalArgumentException: Last unit does not have enough valid bits For some reason the previous build failed and automatically

Why does OpenMP not support reduction for arrays in C?

久未见 submitted on 2020-01-25 18:10:08
Question: In OpenMP 3.0, reduction on arrays is supported in Fortran with a special construct, while in C/C++ it is left to the programmer. I was wondering if there is a special reason for that, because OpenMP 3.0 came out in 2008, so I thought there had been enough time to implement it for C/C++ as well. Is there any particular technical reason, specific to C/C++, why it is still not supported? Answer 1: As was mentioned in the comments, the reason for OpenMP not supporting reduction by default for arrays is

Delphi - OmniThreadLibrary Parallel.ForEach with Records

非 Y 不嫁゛ submitted on 2020-01-25 14:32:32
Question: I am running Delphi XE2 and trying to get familiar with the OmniThreadLibrary; I have 3.03b installed. I have been looking at the Parallel.ForEach examples and am not sure what's going on in the background (this may well be obvious later, sorry). Any information you can offer to help me better understand how to achieve my goal will be much appreciated. Suppose I have some record that is just a container for 2 related values, a and b. I then want to run a parallel loop that returns an

Speed-up nested cross-validation

大城市里の小女人 submitted on 2020-01-25 07:19:28
Question: In order to speed up nested cross-validation with sklearn, is it better to set n_jobs=-1 in the inner or the outer loop, since nested parallelism is not allowed? Answer 1: This seems to be an open question; see e.g. this open issue on scikit-learn's GitHub page. Another approach is to use the Message Passing Interface (MPI) to exploit multiple processors; see e.g. this blog post using mpi4py. Source: https://stackoverflow.com/questions/49629112/speed-up-nested-cross-validation

Replace Task.WhenAll with PLinq

混江龙づ霸主 submitted on 2020-01-25 06:39:08
Question: I have a method which calls a WCF service multiple times in parallel. To prevent an overload on the target system, I want to use PLinq's ability to limit the number of parallel executions. Now I wonder how I could rewrite my method in an efficient way. Here's my current implementation: private async Task RunFullImport(IProgress<float> progress) { var dataEntryCache = new ConcurrentHashSet<int>(); using var client = new ESBClient(); // WCF // Progress counters helpers float totalSteps = 1f

Use cpu function in cuda

我的梦境 submitted on 2020-01-25 06:12:25
Question: I would like to include a C++ function in a CUDA kernel, but this function is written for the CPU, like this: inline float random(int rangeMin,int rangeMax){ return rand(rangeMin,rangeMax); } Assume that the rand() function uses either curand.h or the Thrust CUDA library. I thought of using a kernel function (with only one GPU thread) that would include this function as inline, so that the CUDA compiler would generate the binary for the GPU. Is this possible? If so I would like to include another inlines

Parallel.For System.OutOfMemoryException

半腔热情 submitted on 2020-01-25 04:30:14
Question: We have a fairly simple program that's used for creating backups. I'm attempting to parallelize it but am getting an OutOfMemoryException within an AggregateException. Some of the source folders are quite large, and the program doesn't crash until about 40 minutes after it starts. I don't know where to start looking, so the code below is a near-exact dump of all the code, sans directory structure and exception-logging code. Any advice as to where to start looking? using System; using System