I\'ve been running some experiments with the openmp framework and found some odd results I\'m not sure I know how to explain.
My goal is to create this huge matrix a
Technically, the STL vector uses the std::allocator which eventually calls new. new in its turn calls the libc's malloc (for your Linux system).
This malloc implementation is quite efficient as a general purpose allocator, is thread-safe, however it is not scalable (the GNU libc's malloc derives from Doug Lea's dlmalloc). There are numerous allocators and papers that improve upon dlmalloc to provide scalable allocation.
I would suggest that you take a look at Hoard from Dr. Emery Berger, tcmalloc from Google and Intel Threading Building Blocks scalable allocator.