Thread IDs with PPL and Parallel Memory Allocation

Submitted by 落爺英雄遲暮 on 2019-12-07 09:30:20

Question


I have a question about the Microsoft PPL library, and about parallel programming in general. I am using FFTW to perform a large set (100,000) of 64 x 64 x 64 FFTs and inverse FFTs. In my current implementation, I use a parallel for loop and allocate the storage arrays within the loop. I have noticed that my CPU usage tops out at only about 60-70% in these cases. (Note that this is still better utilization than the built-in threaded FFTs provided by FFTW, which I have tested.) Since I am using fftw_malloc, is it possible that excessive locking is occurring and preventing full CPU usage?

In light of this, is it advisable to preallocate the storage arrays for each thread before the main processing loop, so that no locks are required within the loop itself? And if so, how is this possible with the Microsoft PPL library? I have used OpenMP before; in that case it is simple enough to get a thread ID using the supplied functions. However, I have not seen a similar function in the PPL documentation.


Answer 1:


I am just answering this because nobody has posted anything yet.

Mutexes can wreak havoc on performance if heavy locking is required. In addition, if a lot of memory (re)allocation is needed, that can also decrease performance and leave you limited by memory bandwidth. As you said, preallocating buffers that the threads later operate on can be useful. However, this requires a fixed thread count and a workload that is spread evenly across all threads.
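To make the preallocation idea concrete, here is a minimal sketch using OpenMP, which the question mentions as the familiar case. Everything named here is an assumption for illustration: the function run_jobs_preallocated is hypothetical, the std::vector scratch buffers stand in for arrays obtained from fftw_malloc, and the fill-and-sum loop stands in for the actual FFT. The `_OPENMP` guards only exist so the sketch still builds serially when OpenMP is switched off.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs `jobs` work items, each using a scratch buffer
// that was allocated once per thread BEFORE the parallel loop.
double run_jobs_preallocated(int jobs, std::size_t buffer_len) {
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();
#else
    const int max_threads = 1;  // serial fallback when OpenMP is disabled
#endif
    // One buffer per potential thread, so nothing is allocated inside the loop.
    std::vector<std::vector<double>> scratch(
        static_cast<std::size_t>(max_threads), std::vector<double>(buffer_len));
    std::vector<double> results(static_cast<std::size_t>(jobs > 0 ? jobs : 0));

    #pragma omp parallel for
    for (int i = 0; i < jobs; ++i) {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();  // index of the calling thread
#else
        const int tid = 0;
#endif
        assert(tid < max_threads);
        std::vector<double>& buf = scratch[tid];

        // Stand-in for the real work (e.g. an FFT executed in `buf`).
        double sum = 0.0;
        for (double& x : buf) {
            x = i;
            sum += x;
        }
        results[i] = sum;  // each iteration writes only its own slot
    }

    double total = 0.0;
    for (double r : results) total += r;
    return total;
}
```

The buffer count is tied to omp_get_max_threads(), which is where the fixed-thread-count requirement shows up: if the runtime could hand out a thread index beyond that bound, the indexing would break.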

Concerning a thread ID function in PPL, I can only speak about Intel TBB, which should however be pretty similar to PPL. TBB, and I suppose PPL as well, does not speak of threads directly but of tasks. The aim of TBB was to abstract these underlying details away from the user, so it does not provide a thread_id function.




Answer 2:


Using PPL, I have had good performance in an application that does a lot of allocations by using a Concurrency::combinable to hold a structure containing memory allocated per thread.

In fact, you don't have to preallocate: you can check the value of your combinable variable with ->local() and allocate it if it is null. The next time this thread is called, it will already be allocated.

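Of course, you have to free the memory when all tasks are done, which can be done by calling combine_each on the combinable object (named myCombinable here purely for illustration), with something like:

myCombinable.combine_each([](MyPtr* p){ delete p; });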
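Putting the pieces of this answer together, a rough sketch of the lazy per-thread allocation pattern might look like the following. The Scratch struct, the function run_jobs_combinable and the fill loop are hypothetical stand-ins (a real version would hold FFTW buffers and run the transform). On MSVC the code uses the real Concurrency::combinable and Concurrency::parallel_for from <ppl.h>; the small serial stand-in in the `#else` branch is not part of PPL and is only there so the sketch builds with other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER)
#include <ppl.h>  // Concurrency::combinable, Concurrency::parallel_for
namespace par = Concurrency;
#else
// Serial stand-in (an assumption, NOT part of PPL) for non-MSVC builds.
namespace par {
template <typename T>
class combinable {
    T value_;
public:
    template <typename F> explicit combinable(F init) : value_(init()) {}
    T& local() { return value_; }
    template <typename F> void combine_each(F f) const { f(value_); }
};
template <typename F>
void parallel_for(int first, int last, F body) {
    for (int i = first; i < last; ++i) body(i);
}
}  // namespace par
#endif

// Hypothetical per-thread scratch data; stands in for FFTW work arrays.
struct Scratch {
    std::vector<double> buffer;
    explicit Scratch(std::size_t n) : buffer(n) {}
};

// Runs `jobs` work items and returns how many Scratch objects were created.
std::size_t run_jobs_combinable(int jobs, std::size_t buffer_len) {
    assert(jobs >= 0);

    // Each thread's slot starts out as a null pointer.
    par::combinable<Scratch*> scratch(
        [] { return static_cast<Scratch*>(nullptr); });

    par::parallel_for(0, jobs, [&](int i) {
        Scratch*& mine = scratch.local();
        if (mine == nullptr) {
            mine = new Scratch(buffer_len);  // first visit by this thread
        }
        // Stand-in for the real work on the thread's own buffer.
        for (double& x : mine->buffer) x = static_cast<double>(i);
    });

    // After all tasks are done, free every slot that was allocated.
    std::size_t allocated = 0;
    scratch.combine_each([&](Scratch* p) {
        if (p != nullptr) {
            ++allocated;
            delete p;
        }
    });
    return allocated;
}
```

The return value is the number of threads that actually touched the combinable, so allocations scale with the worker count rather than with the number of loop iterations.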


Source: https://stackoverflow.com/questions/9990363/thread-ids-with-ppl-and-parallel-memory-allocation
