I\'m working with an 8 core processor, and am using Boost threads to run a large program. Logically, the program can be split into groups, where each group is run by a threa
The best what you can try to reach ~8 memory allocation in parallel (since you have 8 physical cores), not 10000 as you wrote
standard malloc uses mutex and standard STL allocator does the same. Therefore it will not speed up automatically when you introduce threading. Nevertheless, you can use another malloc library (google for e.g. "ptmalloc") which does not use global locking. if you allocate using STL (e.g. allocate strings, vectors) you have to write your own allocator.
Rather interesting article: http://developers.sun.com/solaris/articles/multiproc/multiproc.html