I\'ve been running some experiments with the openmp framework and found some odd results I\'m not sure I know how to explain.
My goal is to create this huge matrix a
Memory allocators are definitely a possible contention point for multiple threads.
Fundamentally, the heap is a shared data structure, since it is possible to allocate memory on one thread, and de-allocate it on another. In fact, your example does exactly that - the "resize" will free memory on each of the worker threads, which was initially allocated elsewhere.
Typical implementations of malloc included with gcc and other compilers use a shared global lock and work reasonably well across threads if memory allocation pressure is relatively low. Above a certain allocation level, however, threads will begin to serialize on the lock, you'll get excessive context switching and cache trashing, and performance will degrade. Your program is an example of something which is allocation heavy, with an alloc + dealloc in the inner loop.
I'm surprised that an OpenMP compatible compiler doesn't have a better threaded malloc implementation? They certainly exist - take a look at this question for a list.