Question
I'm just starting to learn about parallel programming, and I'm looking at binary search.
This can't really be optimized by throwing more processors at it right? I know it's supposedly dividing and conquering, but you're really "decreasing and conquering" (from Wikipedia).
Or could you possibly parallelize the comparisons? (if X is less than array[mid], search from low to mid - 1; else if X is greater than array[mid], search from mid + 1 to high; else return mid, the index of X)
Or how about you give half of the array to one processor to do binary search on, and the other half to another? Wouldn't that be wasteful though? Because it's decreasing and conquering rather than simply dividing and conquering? Thoughts?
Answer 1:
You can easily use parallelism. For k < n processors, split the array into k groups of n/k elements and assign a processor to each group. Run a binary search on each group; the time is now log(n/k). There is also a CREW method that achieves log(n)/log(k+1).
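A minimal sketch of the splitting idea described here (my own illustration, not code from the answer), using Python threads and the standard-library bisect module:

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def parallel_binary_search(arr, x, k):
    """Split sorted arr into k chunks and binary-search each chunk
    in its own thread. Returns the index of x in arr, or -1."""
    n = len(arr)
    size = (n + k - 1) // k  # chunk size, rounded up

    def search_chunk(start):
        chunk = arr[start:start + size]
        i = bisect.bisect_left(chunk, x)
        if i < len(chunk) and chunk[i] == x:
            return start + i  # translate back to a global index
        return -1

    with ThreadPoolExecutor(max_workers=k) as pool:
        for result in pool.map(search_chunk, range(0, n, size)):
            if result != -1:
                return result
    return -1
```

Note that in CPython the GIL prevents these comparisons from truly overlapping; the sketch only shows the work decomposition the answer describes.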
Answer 2:
I would think it certainly qualifies for parallelisation. At least, across two threads. Have one thread do a depth-first search, and the other do a breadth-first search. The winner is the algorithm that performs the fastest, which may be different from data-set to data-set.
Answer 3:
I don't have much experience in parallel programming, but I doubt this is a good candidate for parallel processing. Each step of the algorithm depends on performing one comparison, and then proceeding down a set "path" based on this comparison (you either found your value, or now have to keep searching in a set "direction" based on the comparison). Two separate threads performing the same comparison won't get you anywhere any faster, and separate threads will both need to rely on the same comparison to decide what to do next, so they can't really do any useful, divided work on their own.
As far as your idea of splitting the array, I think you are just negating the benefit of binary search in this case. Your value (assuming it's in your array) will be in either the top or the bottom half of your array. The first comparison (at the midpoint) in a binary search is going to tell you which half you should be looking in. If you take that even further, consider breaking an array of N elements into N different binary searches (a naive attempt to parallelize). You are now doing N comparisons when you don't need to. You are losing the power of binary search, in that each comparison narrows your search to the appropriate subset.
Hope that helps. Comments welcome.
Answer 4:
Yes, in the classical sense of parallelization (multi-core), binary search and BST lookups do not benefit much.
There are techniques like having multiple copies of the BST on L1 cache for each processor. Only one processor is active but the gains from having multiple L1 caches can be great (4 cycles for L1 vs 14 cycles for L2).
In real world problems you are often searching multiple keys at the same time.
Now, there is another kind of parallelization that can help: SIMD! Check out "Fast architecture sensitive tree search on modern CPUs and GPUs" by a team from Intel/UCSC/Oracle (SIGMOD 2010). It's very cool. BTW I'm basing my current research project on this very paper.
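To make the multiple-keys point concrete, here is a small sketch of batching lookups (my own illustration, not the paper's SIMD technique): sorting the query batch lets each search start where the previous one left off, the same amortize-across-keys idea the paper pushes down to the instruction level.

```python
import bisect

def batch_search(sorted_arr, keys):
    """Look up many keys against one sorted array. Queries are
    processed in sorted order, so each bisect can start at the
    previous hit position. Returns a list of indices (-1 if absent)."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    out = [-1] * len(keys)
    lo = 0  # later (larger) keys cannot lie to the left of here
    for i in order:
        j = bisect.bisect_left(sorted_arr, keys[i], lo)
        if j < len(sorted_arr) and sorted_arr[j] == keys[i]:
            out[i] = j
        lo = j
    return out
```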
Answer 5:
A parallel implementation can speed up a binary search, but the improvement is not particularly significant. In the worst case, the time required for a binary search is log_2(n), where n is the number of elements in the list. A simple parallel implementation breaks the master list into k sub-lists to be bin-searched by parallel threads. The resulting worst-case time is log_2(n/k), realizing a theoretical decrease in the search time.
Example:
A list of 1024 entries takes as many as 10 cycles to binary search using a single thread. Using 4 threads, each thread would only take 8 cycles to complete the search, and using 8 threads, each thread takes 7 cycles. Thus, an 8-threaded parallel binary search could be up to 30% faster than the single-threaded model.
However, this speed-up should not be confused with an improvement in efficiency: the 8-threaded model actually executes 8 * 7 = 56 comparisons to complete the search, compared to the 10 comparisons executed by the single-threaded binary search. It is up to the discretion of the programmer whether the marginal gain in speed from a parallel binary search is appropriate or advantageous for their application.
Answer 6:
I am pretty sure binary search can be sped up by a factor of log(M), where M is the number of processors: log(n/M) = log(n) - log(M) > log(n)/log(M) for constant M. I do not have a proof of a tight lower bound, but if M = n, the execution time is O(1), which cannot be improved upon. An algorithm sketch follows.
Parallel_Binary_Search(sorted_arraylist)
- Divide your sorted_arraylist into M chunks of size n/M.
- Apply one step of comparison to the middle element of each chunk.
- If a comparator signals equality, return the address and terminate.
- Otherwise, identify both adjacent chunks where comparators signaled (>) and (<), respectively.
- Form a new Chunk starting from the element following the one that signaled (>) and ending at the element preceding the one that signaled (<).
- If they are the same element, return fail and terminate.
- Otherwise, Parallel_Binary_Search(Chunk)
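The sketch above can be rendered as runnable Python (my own rendering; the M probes per round are the comparisons that would execute concurrently, done sequentially here for illustration):

```python
def parallel_binary_search(arr, x, m=4):
    """M-way narrowing search over sorted arr. Each round probes one
    element per chunk, evenly spaced across the current range, then
    shrinks the range to the gap between the last (<) and first (>)
    signal -- roughly a factor of m+1 per round."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        size = hi - lo + 1
        step = size / (m + 1)
        # One probe per chunk; duplicates collapse when size < m.
        probes = sorted({lo + int(step * (i + 1)) for i in range(m)})
        new_lo, new_hi = lo, hi
        for p in probes:
            if arr[p] == x:
                return p                     # comparator signaled equality
            if arr[p] < x:
                new_lo = p + 1               # signaled (<): answer lies right
            else:
                new_hi = min(new_hi, p - 1)  # signaled (>): answer lies left
        lo, hi = new_lo, new_hi              # the new, smaller chunk
    return -1                                # fail: x is not present
```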
Source: https://stackoverflow.com/questions/8423873/parallel-binary-search