http://pastebin.com/YMS4ehRj
^ This is my implementation of parallel merge sort. Basically, for every split, the first half is handled by a thread where
You are creating a large number of threads, each of which does very little work. To sort 25000 ints you create about 12500 threads that spawn other threads and merge their results, and about 12500 threads that each sort only two ints.
The overhead from creating all those threads far outweighs the gains you get from parallel processing.
To avoid this, make sure that each thread has a reasonable amount of work to do. For example, if a thread finds that it has fewer than 10000 numbers to sort, it can simply sort them itself with a normal sequential merge sort instead of spawning new threads.
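Here is a rough sketch of that idea in Java. It is not based on the pastebin code; the class name ThresholdMergeSort, the constant THRESHOLD, and the helper methods are all made up for illustration, and the cutoff of 10000 is just the example figure from above, so you would want to measure and tune it. Ranges below the cutoff are sorted in the calling thread, and only larger ranges start a new thread for their first half.

```java
class ThresholdMergeSort {
    // Hypothetical cutoff: ranges smaller than this are sorted without new threads.
    static final int THRESHOLD = 10_000;

    public static void sort(int[] a) {
        int[] tmp = new int[a.length]; // scratch buffer shared by all merges
        sort(a, tmp, 0, a.length);
    }

    // Sorts the half-open range a[lo, hi).
    private static void sort(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < THRESHOLD) {
            // Too little work to justify a thread: plain merge sort.
            sequentialSort(a, tmp, lo, hi);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        // One new thread for the first half; this thread takes the second half.
        Thread left = new Thread(() -> sort(a, tmp, lo, mid));
        left.start();
        sort(a, tmp, mid, hi);
        try {
            left.join(); // wait for the first half before merging
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        }
        merge(a, tmp, lo, mid, hi);
    }

    // Ordinary single-threaded top-down merge sort on a[lo, hi).
    private static void sequentialSort(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) return;
        int mid = lo + (hi - lo) / 2;
        sequentialSort(a, tmp, lo, mid);
        sequentialSort(a, tmp, mid, hi);
        merge(a, tmp, lo, mid, hi);
    }

    // Merges the sorted ranges a[lo, mid) and a[mid, hi) via tmp[lo, hi).
    private static void merge(int[] a, int[] tmp, int lo, int mid, int hi) {
        System.arraycopy(a, lo, tmp, lo, hi - lo);
        int i = lo, j = mid;
        for (int k = lo; k < hi; k++) {
            if (j >= hi || (i < mid && tmp[i] <= tmp[j])) {
                a[k] = tmp[i++];
            } else {
                a[k] = tmp[j++];
            }
        }
    }
}
```

With this cutoff, sorting 25000 ints starts only a few threads instead of thousands. The threads can share one scratch buffer because each one only touches its own index range, and a merge runs only after both halves are finished. If you don't want to manage threads by hand, a fixed-size thread pool or the fork/join framework in java.util.concurrent is worth a look as well.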