Question
I have a problem and want to check whether I am solving it as efficiently as possible. I have an array A of N float values, all between 0 and 1.
I have to find the top k values, where each value can be a product of at most three numbers from A. So the top-k list can contain individual numbers from A, products of two numbers, or products of three numbers from A.
This is how I am doing it now. I can get the top k numbers in descending order in O(N log k) time. I then create a max-heap and initialize it with the best value of each size up to 3, i.e. if B is the descending sorted array of those k values and I refer to numbers by their index in B, I insert the entries (0), (0,1) and (0,1,2). Next, I extract from the heap, and whenever I extract a size-z value (a product of z numbers), I replace it with the next possible size-z entries, e.g. if (2,4) is extracted, I replace it with (3,4) and (2,5). I extract k times to get the result.
I'd welcome better ideas if you have any. Thanks all.
Answer 1:
If I understand you correctly, you need to find the k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, and all the values are floating point numbers between 0 and 1.
It is clear that you only need to consider the k highest numbers from the list. The rest can be discarded straight away. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn't preordered). To simplify the problem, you can now take their logarithms and maximize sums of the numbers instead of maximizing products as in the original problem. Since the logarithm is monotonic, the ordering is the same, and this might speed things up a little.
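As a rough sketch of this preprocessing step (the helper name `top_k_logs` is made up for illustration, and the inputs are assumed to be strictly positive so that the logarithm is defined):

```python
import heapq
import math

def top_k_logs(values, k):
    """Keep the k largest values (descending) and return their logarithms."""
    # heapq.nlargest runs in O(n log k) and returns the result sorted descending.
    top = heapq.nlargest(k, values)
    # Assumption: every value is > 0; a value of exactly 0 would need special handling.
    return [math.log(v) for v in top]

logs = top_k_logs([0.5, 0.25, 1.0, 0.125], 2)
print(logs)  # logs of 1.0 and 0.5, largest first
```

Because every value is at most 1, every logarithm is at most 0, so a larger product always corresponds to a sum closer to zero.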
Now (considering the logarithmic representation), all your numbers are negative or zero, so adding more of them together can only make the sum more negative.
Let's call the k highest numbers A1...Ak. We can reduce the problem further by adding a number A0 that has the value 0 in the log representation and 1 in the original representation; the problem is then to enumerate the first k 3-tuples (x, y, z in {A0,...,Ak}) with the constraint that x ≥ y ≥ z and that at least one element is not A0. Let's denote a 3-tuple by its indices [i,j,n], with i ≤ j ≤ n, and the sum of the elements in this tuple by S[i,j,n]. The first element to be reported is obviously [0,0,1], i.e. (A0, A0, A1) with S[0,0,1] = A1, which corresponds in the original problem formulation to the single highest value on the list.
We use a max-heap as in the original formulation; we push the triples onto the heap, using their sums (S[...]) as the ordering key. The algorithm starts by pushing [0,0,0] onto the heap. Then:
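answer = []
for m in 0 .. k:                     # k + 1 iterations
    top = heap.pop()
    (i,j,n) = top                    # explode the tuple
    answer.append(S[i,j,n])          # the key of the popped tuple
    if (n < k):                      # advance the last index
        heap.push((i,j,n+1))
    if (j == n and j < k):           # advance the last two indices together
        heap.push((i,j+1,j+1))
        if (i == j):                 # advance all three indices together
            heap.push((i+1,i+1,i+1))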
At the end, answer contains k + 1 elements; the first of them comes from [0,0,0] and must be discarded.
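The scheme above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name `top_k_products` is invented here, the same element may be used more than once in a product (as the x ≥ y ≥ z tuples allow), and the heap is keyed on the products directly rather than on their logarithms, which gives the same ordering and avoids taking log of 0.

```python
import heapq

def top_k_products(values, k):
    """Top-k products of 1, 2 or 3 elements, largest first (repetition allowed)."""
    best = heapq.nlargest(k, values)      # only the k largest inputs matter
    m = len(best)
    p = [1.0] + best                      # p[0] plays the role of A0 (value 1)

    def entry(i, j, n):
        # heapq is a min-heap, so the product is negated to pop the largest first.
        return (-(p[i] * p[j] * p[n]), (i, j, n))

    heap = [entry(0, 0, 0)]
    out = []
    while heap and len(out) < k + 1:
        neg_prod, (i, j, n) = heapq.heappop(heap)
        out.append(-neg_prod)
        if n < m:                         # advance the last index
            heapq.heappush(heap, entry(i, j, n + 1))
        if j == n and j < m:              # advance the last two indices together
            heapq.heappush(heap, entry(i, j + 1, j + 1))
            if i == j:                    # advance all three indices together
                heapq.heappush(heap, entry(i + 1, i + 1, i + 1))
    return out[1:]                        # drop the [0,0,0] entry (product 1)

print(top_k_products([0.9, 0.5, 0.8, 0.1], 4))
```

Each tuple has exactly one predecessor under these three rules, which is why no duplicate ever enters the heap and no visited set is needed.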
Let A1...A4 be given as -1, -3, -8, -9 (with A0 = 0). Showing each tuple by its values, the algorithm proceeds like this:
                Heap
Top           Rest (shown in order)
[ 0, 0, 0] |
[ 0, 0,-1] | [ 0,-1,-1] [-1,-1,-1]
[ 0,-1,-1] | [-1,-1,-1] [ 0, 0,-3]
[-1,-1,-1] | [ 0, 0,-3] [ 0,-1,-3] [ 0,-3,-3]
[ 0, 0,-3] | [ 0,-1,-3] [-1,-1,-3] [ 0,-3,-3] [-1,-3,-3] [-3,-3,-3]
[ 0,-1,-3] | [-1,-1,-3] [ 0,-3,-3] [-1,-3,-3] [ 0, 0,-8] [-3,-3,-3]
[-1,-1,-3] | [ 0,-3,-3] [-1,-3,-3] [ 0, 0,-8] [-3,-3,-3] [ 0,-1,-8]
[ 0,-3,-3] | [-1,-3,-3] [ 0, 0,-8] [-3,-3,-3] [ 0,-1,-8] [-1,-1,-8]
...
etc.
The nice thing about this algorithm is that it doesn't enumerate duplicates and the heap size is O(k); to see why, observe that every iteration removes one element from the heap and adds at most three (often fewer), a net gain of at most two, so after k iterations there cannot be more than about 2k elements in the heap.
This gives a running time of O(n log k + k log k) = O((n + k) log k).
Answer 2:
I certainly see an optimization you could make.
Let M be the highest number from A.
Let M2 be M * M.
Let setMM2 consist of all x from A such that M2 < x <= M.
If size(setMM2) >= k,
    then your top-k consists of the k highest elements.
Else
    all x in setMM2 are in your top-k and your search becomes smaller.
You can repeat this method with max(secondHighestNumber^2,M^3) and generalize the algorithm.
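A minimal sketch of the first round of this check, assuming elements may be reused in a product (so M * M is the largest possible non-single product); the function name `singles_above_square` is hypothetical:

```python
import heapq

def singles_above_square(values, k):
    """Return (done, singles), where singles are the elements x with M*M < x <= M."""
    m = max(values)
    # No product of two or three elements can exceed M*M, so every element
    # strictly above M*M outranks every such product.
    singles = heapq.nlargest(k, (x for x in values if x > m * m))
    # done is True when these singles already fill the whole top-k.
    return len(singles) >= k, singles

print(singles_above_square([0.9, 0.85, 0.82, 0.5], 3))
print(singles_above_square([0.9, 0.85, 0.82, 0.5], 4))
```

When `done` is false, the returned singles are still guaranteed to be in the top-k, and only the remaining k - len(singles) slots need the more expensive search.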
Answer 3:
Since the numbers are between 0 and 1, the more numbers you multiply, the smaller the product gets, so the problem only becomes hard with a big k, for instance k = N^2.
First try with single numbers and push them onto a heap: O(N*log(k)).
Then use the numbers from that heap to build another heap B with products of 2 numbers => O(k*log(k)) at worst, but you can do some speedups if you sort the numbers in case k > N.
Now you have heap B with pairs of numbers and their products; build a third heap C from heap B the same way you built B, but from a much bigger heap.
I think this will give O(k*log(k)).
Source: https://stackoverflow.com/questions/5577206/generate-top-k-values