Can multithreading speed up memory allocation?

感情败类 2020-12-05 03:29

I'm working with an 8 core processor, and am using Boost threads to run a large program. Logically, the program can be split into groups, where each group is run by a thread.

10 answers
  • 2020-12-05 03:53

    Standard CRT

    While with older versions of Visual Studio the default CRT allocator was blocking, this is no longer true, at least for Visual Studio 2010 and newer, which call the corresponding OS functions directly. The Windows heap manager was blocking until Windows XP; in XP the optional Low Fragmentation Heap (LFH) is not blocking, while the default one is, and newer OSes (Vista/Win7) use the LFH by default. The performance of recent (Windows 7) allocators is very good, comparable to the scalable replacements listed below (you still might prefer those if targeting older platforms or when you need some other features they provide). There exist several "scalable allocators", with different licenses and different drawbacks. I think on Linux the default runtime library already uses a scalable allocator (some variant of PTMalloc).

    Scalable replacements

    I know about:

    • HOARD (GNU + commercial licenses)
    • MicroQuill SmartHeap for SMP (commercial license)
    • Google Perf Tools TCMalloc (BSD license)
    • NedMalloc (BSD license)
    • JemAlloc (BSD license)
    • PTMalloc (GNU, no Windows port yet?)
    • Intel Thread Building Blocks (GNU, commercial)

    You might want to check Scalable memory allocator experiences for my experiences with trying to use some of them in a Windows project.

    In practice most of them work by keeping a per-thread cache and per-thread pre-allocated regions, which means that small allocations are most often served entirely within the context of a single thread, and OS services are called only infrequently.
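    A rough way to see whether your current allocator scales is to time the same allocation churn on one thread versus several. The sketch below (all names such as `churn` and `run` are invented for this example, and the pointer batching exists only to keep allocations live so the compiler cannot elide the new/delete pairs) compares a single-threaded run against an 8-thread run doing 8x the total work; with a scalable allocator the wall-clock times stay close, while a serializing allocator pushes the 8-thread time toward 8x the single-thread time.

    ```cpp
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Allocate and free many small blocks, keeping a batch of 64
    // alive at a time so the compiler cannot optimize the heap
    // traffic away.
    static void churn(int iters) {
        std::vector<char*> ptrs;
        ptrs.reserve(64);
        for (int i = 0; i < iters; ++i) {
            ptrs.push_back(new char[64]);
            if (ptrs.size() == 64) {
                for (char* p : ptrs) delete[] p;
                ptrs.clear();
            }
        }
        for (char* p : ptrs) delete[] p;
    }

    // Run `threads` churn loops concurrently and return elapsed seconds.
    static double run(int threads, int iters) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t)
            pool.emplace_back(churn, iters);
        for (auto& th : pool) th.join();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        const int iters = 1000000;
        double t1 = run(1, iters);
        double t8 = run(8, iters);  // 8x the total work
        std::printf("1 thread: %.3fs, 8 threads: %.3fs\n", t1, t8);
        return 0;
    }
    ```

    Re-running the same binary with a scalable allocator swapped in (e.g. linked against tcmalloc or jemalloc) makes the difference directly visible for your workload.
    
    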

  • 2020-12-05 03:54

    You may want to take a look at The Hoard Memory Allocator, which "is a drop-in replacement for malloc() that can dramatically improve application performance, especially for multithreaded programs running on multiprocessors."

  • 2020-12-05 03:57

    There are 2 scalable drop-in replacements for malloc that I know of:

    • Google's tcmalloc
    • Facebook's jemalloc (link to a performance study comparing to tcmalloc)

    I don't have any experience with Hoard (which performed poorly in the study), but Emery Berger lurks on this site and was astonished by the results. He said he would have a look and I surmise there might have been some specifics to either the test or implementation that "trapped" Hoard as the general feedback is usually good.

    One word of caution with jemalloc: it can waste a bit of space when you rapidly create and then discard threads (since it creates a new pool for each thread you allocate from). If your threads are stable, this should not be an issue.

  • 2020-12-05 03:58

    Dynamic memory allocation uses the heap of the application/module/process (not of the thread). The heap can only handle one allocation request at a time. If you try to allocate memory from "parallel" threads, the requests are handled in order by the heap. You will not get behaviour like: one thread waiting for its memory while another can ask for some and a third is receiving some. The threads will have to line up in a queue to get their chunks of memory.

    What you would need is a pool of heaps. Use whichever heap is not busy at the moment to allocate the memory. But then, you have to watch out throughout the life of this variable such that it does not get de-allocated on another heap (that would cause a crash).

    I know that the Win32 API has functions such as GetProcessHeap(), HeapCreate(), HeapAlloc() and HeapFree(), which allow you to create a new heap and allocate/deallocate memory from a specific heap HANDLE. I don't know of an equivalent in other operating systems (I have looked for one, but to no avail).

    You should, of course, try to avoid frequent dynamic allocations. But if you can't, you might consider (for portability) creating your own "heap" class (it doesn't have to be a heap per se, just a very efficient allocator) that manages a large chunk of memory, together with a smart-pointer class that holds a reference to the heap it came from. This would enable you to use multiple heaps (just make sure they are thread-safe).
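    A minimal, portable sketch of such a per-thread "heap" (the class name `ThreadArena` and the sizes are invented for this example): each thread bump-allocates out of its own buffer, so the hot path needs no lock, and cross-thread frees are impossible by construction. A real implementation would need growth, alignment requests, and per-object deallocation; this sketch frees everything at once with `reset()`.

    ```cpp
    #include <cstddef>
    #include <new>

    class ThreadArena {
    public:
        explicit ThreadArena(std::size_t bytes)
            : buf_(new char[bytes]), cap_(bytes), used_(0) {}
        ~ThreadArena() { delete[] buf_; }

        ThreadArena(const ThreadArena&) = delete;
        ThreadArena& operator=(const ThreadArena&) = delete;

        // Only the owning thread allocates from this arena,
        // so no lock is needed here.
        void* allocate(std::size_t n) {
            const std::size_t a = alignof(std::max_align_t);
            std::size_t aligned = (n + a - 1) & ~(a - 1);
            if (used_ + aligned > cap_) throw std::bad_alloc{};
            void* p = buf_ + used_;
            used_ += aligned;
            return p;
        }

        // Bump allocators release everything at once, which suits
        // workloads where a whole batch of objects dies together.
        void reset() { used_ = 0; }

    private:
        char* buf_;
        std::size_t cap_, used_;
    };

    // One arena per thread: no contention, no cross-heap frees.
    thread_local ThreadArena tls_arena(1 << 20);
    ```

    The smart-pointer idea from the answer would then store a reference to the owning arena so a block is never handed back to the wrong heap.
    
    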

  • 2020-12-05 04:02

    You need to check your compiler documentation to see whether it makes the allocator thread-safe or not. If it does not, then you will need to overload the new operator and make it thread-safe. Otherwise it will result in either a segfault or UB.
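    For illustration only: on modern platforms the CRT's malloc is already thread-safe, but if you did need to serialize an unsafe allocator, replacing the global operator new/delete with a mutex-guarded wrapper would look roughly like this sketch (the mutex name is invented; this is the naive global-lock approach, which is exactly what the scalable allocators in other answers avoid).

    ```cpp
    #include <cstdlib>
    #include <mutex>
    #include <new>

    // Global lock serializing all allocation; constexpr-constructible,
    // so it is safe to use even during early static initialization.
    static std::mutex g_alloc_mutex;

    void* operator new(std::size_t n) {
        std::lock_guard<std::mutex> lock(g_alloc_mutex);
        if (void* p = std::malloc(n ? n : 1)) return p;
        throw std::bad_alloc{};
    }

    void operator delete(void* p) noexcept {
        std::lock_guard<std::mutex> lock(g_alloc_mutex);
        std::free(p);
    }
    ```

    Note that this makes every `new` in every thread queue on one mutex, which is why it fixes safety but not scalability.
    
    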

  • 2020-12-05 04:03
    1. The best you can hope for is ~8 memory allocations running in parallel (since you have 8 physical cores), not 10000 as you wrote.

    2. Standard malloc uses a mutex, and the standard STL allocator does the same. Therefore it will not speed up automatically when you introduce threading. Nevertheless, you can use another malloc library (google for e.g. "ptmalloc") which does not use global locking. If you allocate using the STL (e.g. allocating strings or vectors) you have to write your own allocator.
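    Writing your own STL allocator is mostly boilerplate; the sketch below shows the minimal shape required since C++11. Here `CountingAlloc` (an invented name) just forwards to malloc and counts calls with a lock-free atomic, but the same skeleton is where you would route container storage through ptmalloc, tcmalloc, or a per-thread pool instead.

    ```cpp
    #include <atomic>
    #include <cstdlib>
    #include <new>
    #include <vector>

    static std::atomic<std::size_t> g_allocs{0};

    template <class T>
    struct CountingAlloc {
        using value_type = T;
        CountingAlloc() = default;
        template <class U> CountingAlloc(const CountingAlloc<U>&) {}

        T* allocate(std::size_t n) {
            ++g_allocs;  // lock-free bookkeeping; swap malloc for your pool
            if (void* p = std::malloc(n * sizeof(T)))
                return static_cast<T*>(p);
            throw std::bad_alloc{};
        }
        void deallocate(T* p, std::size_t) noexcept { std::free(p); }
    };

    // All instances are interchangeable (stateless allocator).
    template <class T, class U>
    bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

    // Usage: a vector whose storage goes through the custom allocator.
    using CountedVec = std::vector<int, CountingAlloc<int>>;
    ```

    A stateful variant (e.g. holding a pointer to a per-thread pool) additionally needs meaningful equality comparison, since containers use it to decide whether memory can be exchanged.
    
    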

    Rather interesting article: http://developers.sun.com/solaris/articles/multiproc/multiproc.html
