Question
I've been reading the book C++ Concurrency in Action; the example below, taken from the book, uses futures to implement a parallel quick sort.
But I found that this function is more than twice as slow as a single-threaded quick sort that uses none of the asynchronous facilities in the C++ standard library. Tested with g++ 4.8 and Visual C++ 2012.
I tested with 10M random integers, and with Visual C++ 2012 this function spawned 6 threads in total to perform the operation on my quad-core PC.
I am really confused about the performance. Can anybody tell me why?
template<typename T>
std::list<T> parallel_quick_sort(std::list<T> input)
{
    if(input.empty())
    {
        return input;
    }
    std::list<T> result;
    result.splice(result.begin(),input,input.begin());
    T const& pivot=*result.begin();
    auto divide_point=std::partition(input.begin(),input.end(),
        [&](T const& t){return t<pivot;});
    std::list<T> lower_part;
    lower_part.splice(lower_part.end(),input,input.begin(),
        divide_point);
    std::future<std::list<T> > new_lower(
        std::async(&parallel_quick_sort<T>,std::move(lower_part)));
    auto new_higher(
        parallel_quick_sort(std::move(input)));
    result.splice(result.end(),new_higher);
    result.splice(result.begin(),new_lower.get());
    return result;
}
Answer 1:
The code is just horribly sub-optimal. For example, why not std::list<T> result(input)? Why not parallel_quick_sort(const std::list<T>& input)? Profile it and I bet you'll find all kinds of horrible things. Before you can make any sense of the code's performance, you have to make sure it's spending its time doing what you think it's doing!
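One way to act on the profiling advice is to time both versions on identical input before drawing any conclusions. The following is only a minimal sketch, not code from the book or from this answer. The names sequential_quick_sort, make_random_list, and timed_sort_ms are hypothetical, and the single-threaded baseline is assumed to mirror the parallel version with the std::async call replaced by a direct recursive call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <random>
#include <utility>

// Hypothetical single-threaded baseline (an assumption, not the asker's code):
// same splice/partition structure as parallel_quick_sort, but both halves
// are sorted on the calling thread.
template <typename T>
std::list<T> sequential_quick_sort(std::list<T> input)
{
    if (input.empty())
    {
        return input;
    }
    std::list<T> result;
    // Move the first element into result and use it as the pivot.
    result.splice(result.begin(), input, input.begin());
    T const& pivot = *result.begin();
    auto divide_point = std::partition(input.begin(), input.end(),
        [&](T const& t) { return t < pivot; });
    std::list<T> lower_part;
    lower_part.splice(lower_part.end(), input, input.begin(), divide_point);
    auto new_lower(sequential_quick_sort(std::move(lower_part)));
    auto new_higher(sequential_quick_sort(std::move(input)));
    result.splice(result.end(), new_higher);
    result.splice(result.begin(), new_lower);
    return result;
}

// Hypothetical helper: n pseudo-random ints from a fixed seed,
// so that every run sorts the same data.
std::list<int> make_random_list(std::size_t n, unsigned seed = 42)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::list<int> out;
    for (std::size_t i = 0; i < n; ++i)
    {
        out.push_back(dist(gen));
    }
    return out;
}

// Hypothetical helper: runs sort_fn on its own copy of the data and returns
// the elapsed wall-clock time in milliseconds. The copy is made when the
// argument is passed, so it is not part of the measured interval.
template <typename Sort>
double timed_sort_ms(Sort sort_fn, std::list<int> data)
{
    auto start = std::chrono::steady_clock::now();
    std::list<int> sorted = sort_fn(std::move(data));
    auto stop = std::chrono::steady_clock::now();
    // Guard against timing a sort that produces wrong output.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Assuming parallel_quick_sort from the question is in the same file, calling timed_sort_ms(sequential_quick_sort<int>, data) and timed_sort_ms(parallel_quick_sort<int>, data) on the same data gives a like-for-like comparison. Build with optimizations enabled (for example -O2 with g++), since an unoptimized build can distort the ratio.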
Source: https://stackoverflow.com/questions/16248321/parallel-quick-sort-outdone-by-single-threaded-quicksort