generate top k values

Submitted by ≯℡__Kan透↙ on 2019-12-24 09:01:22

Question


I have a problem and I want to make sure I am solving it as efficiently as possible. I have an array A of N float values, all between 0 and 1.

I have to find the top k values that can be formed as a product of at most three numbers from A. So the top-k list can contain individual numbers from A, products of two numbers, or products of three numbers from A.
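A brute-force reference makes the problem statement concrete and is handy for checking faster methods. This sketch assumes that a product uses distinct positions of A (the question's index tuples such as (0,1,2) suggest this); the function name is hypothetical, and at O(N^3) it is only meant for small inputs.

```python
import itertools
import math
from typing import List


def top_k_products_brute_force(a: List[float], k: int) -> List[float]:
    """Every product of 1, 2 or 3 distinct positions of a, largest first; O(N^3)."""
    # Assumption: a position of A is never reused inside one product.
    products = [
        math.prod(combo)
        for size in (1, 2, 3)
        for combo in itertools.combinations(a, size)
    ]
    return sorted(products, reverse=True)[:k]
```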

This is how I am doing it now. I can get the top k numbers in descending order in O(N log k) time. Call this sorted (descending) array of k values B, and refer to each number by its index in B. I then create a max-heap and initialize it with the best candidate of each size up to 3, i.e. the entries at indices (0), (0,1) and (0,1,2). Next, I repeatedly extract from the heap: whenever I extract a size-z value (a product of z numbers), I replace it with the next possible size-z candidates; e.g. if (2,4) is extracted, I replace it with (3,4) and (2,5). I extract k times to get the results.
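The scheme described above can be sketched as follows, assuming a product uses distinct positions of B; top_k_products_distinct is a hypothetical name. Because the successor rule can reach the same tuple from two parents ((2,5) comes from both (2,4) and (1,5)), the sketch keeps a seen set so that no tuple is reported twice.

```python
import heapq
import math
from typing import List, Set, Tuple


def top_k_products_distinct(values: List[float], k: int) -> List[float]:
    """Top-k products of 1, 2 or 3 distinct elements, via the question's heap scheme."""
    b = heapq.nlargest(k, values)              # B: the k largest values, descending
    heap: List[Tuple[float, Tuple[int, ...]]] = []
    seen: Set[Tuple[int, ...]] = set()

    def push(idx: Tuple[int, ...]) -> None:
        # heapq is a min-heap, so the product is stored negated.
        if idx not in seen:
            seen.add(idx)
            heapq.heappush(heap, (-math.prod(b[i] for i in idx), idx))

    for size in (1, 2, 3):                     # seeds (0), (0, 1), (0, 1, 2)
        if size <= len(b):
            push(tuple(range(size)))

    result: List[float] = []
    while heap and len(result) < k:
        neg, idx = heapq.heappop(heap)
        result.append(-neg)
        # Next candidates of the same size: advance one index while keeping the
        # indices strictly increasing, e.g. (2, 4) -> (3, 4) and (2, 5).
        for p in range(len(idx)):
            limit = idx[p + 1] if p + 1 < len(idx) else len(b)
            if idx[p] + 1 < limit:
                push(idx[:p] + (idx[p] + 1,) + idx[p + 1:])
    return result
```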

If you have better ideas, I would like to hear them. Thanks, all.


Answer 1:


If I understand you correctly, you need to find the k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, where all the values are floating-point numbers between 0 and 1.

It is clear that you only need to consider the k highest numbers from the list; the rest can be discarded straight away. Any product that uses a number x outside the top k is at most x (all factors are at most 1), and each of the k highest numbers on its own is already at least x. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn't pre-sorted). To simplify the problem, you can now take their logarithms and maximize sums of numbers instead of the original problem of maximizing products. This may speed things up a little.
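The selection-plus-logarithm step can be sketched with the standard library. top_k_logs is a hypothetical helper name, and mapping an input of exactly 0 to -inf is an assumption added here (math.log(0.0) raises ValueError), not something the answer specifies.

```python
import heapq
import math
from typing import List


def top_k_logs(values: List[float], k: int) -> List[float]:
    """Logs of the k largest values, largest first.

    heapq.nlargest keeps only a size-k heap while scanning, which is the
    O(N log k) selection step. log is increasing, so the order is kept,
    and it turns products into sums: log(x * y) == log(x) + log(y).
    Assumption: an input of exactly 0 is mapped to -inf.
    """
    return [math.log(v) if v > 0 else -math.inf for v in heapq.nlargest(k, values)]
```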

Now, in the logarithmic representation, all your numbers are non-positive, so adding more of them together only makes the sum more negative (or leaves it unchanged).

Let's call the k highest numbers A1...Ak. We can reduce the problem further by assuming that there is also a number A0 whose value is 0 in the log representation (1 in the original representation). The problem is then to enumerate the first k 3-tuples (x, y, z, each drawn from {A0, ..., Ak}), ordered by x + y + z, under the constraint x ≥ y ≥ z and that z is not A0, so at least one real input is used. The same element may appear more than once, so the tuple (A0, A1, A1) stands for A1 + A1, i.e. the largest original number squared. Let's denote a 3-tuple by its indices [i,j,n] with i ≤ j ≤ n, and the sum of its elements by S[i,j,n] = Ai + Aj + An. The first element to be reported is obviously [0,0,1], i.e. A1, which corresponds in the original problem formulation to the singleton #1 value on the list.

We use a max-heap as in the original formulation; we push the triples to the heap, using their sums (S[...]) as the ordering key. The algorithm starts by pushing [0,0,0] to the heap. Then:

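answer = []
for m in 0 .. k:                    # k + 1 iterations
  top = heap.pop()                  # the triple with the largest S
  answer.append(S[top])             # record the sum of its elements
  (i,j,n) = top                     # explode the tuple
  if (n < k):                       # indices run from 0 to k
    heap.push((i,j,n+1))
  if (j == n and j < k):
    heap.push((i,j+1,j+1))
    if (i == j):
      heap.push((i+1,i+1,i+1))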

At the end, answer contains k + 1 sums; the first one, S[0,0,0] = 0 (the empty product), must be discarded.
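The pseudocode above can be turned into runnable Python roughly as follows. This is a sketch: top_k_log_sums is a hypothetical name, the input is assumed to be the logs of the k largest values already sorted in descending order, every index is kept within 0..k, and since heapq is a min-heap the sums are stored negated (ties fall back to comparing index triples).

```python
import heapq
from typing import List, Tuple


def top_k_log_sums(a: List[float], k: int) -> List[float]:
    """k largest sums of 1, 2 or 3 entries of a (an entry may be reused).

    Assumes a holds the logs of the k largest inputs, sorted descending.
    """
    vals = [0.0] + [float(x) for x in a[:k]]   # A0 = 0 (log of 1), then A1..Ak
    last = len(vals) - 1                        # largest valid index

    def s(t: Tuple[int, int, int]) -> float:
        return vals[t[0]] + vals[t[1]] + vals[t[2]]

    heap = [(0.0, (0, 0, 0))]                   # entries are (-S, triple)
    answer: List[float] = []
    while heap and len(answer) < k + 1:
        neg, (i, j, n) = heapq.heappop(heap)
        answer.append(-neg)
        children = []
        if n < last:
            children.append((i, j, n + 1))
        if j == n and j < last:
            children.append((i, j + 1, j + 1))
            if i == j:
                children.append((i + 1, i + 1, i + 1))
        for c in children:
            heapq.heappush(heap, (-s(c), c))
    return answer[1:]                           # drop S[0,0,0], the empty product
```

For the log values -1, -3, -8, -9 and k = 4, top_k_log_sums([-1, -3, -8, -9], 4) returns [-1.0, -2.0, -3.0, -3.0].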

Let A1, ..., A4 be given as -1, -3, -8, -9 (so k = 4; these are already logarithms). Then the algorithm proceeds like this, writing each triple as its indices [i,j,n] followed by its sum S[i,j,n]:

Heap
Top          Rest (shown in order; ties go to the smaller index triple)

[0,0,0]  0 |
[0,0,1] -1 | [0,1,1] -2  [1,1,1] -3
[0,1,1] -2 | [0,0,2] -3  [1,1,1] -3
[0,0,2] -3 | [1,1,1] -3  [0,1,2] -4  [0,2,2] -6
[1,1,1] -3 | [0,1,2] -4  [0,2,2] -6  [0,0,3] -8

These are the k + 1 = 5 pops. After discarding [0,0,0], the answer is -1, -2, -3, -3: A1 alone, A1 + A1, then the tie between A2 alone and A1 + A1 + A1.

The nice thing about this algorithm is that it never enumerates duplicates, because every triple other than [0,0,0] has exactly one parent that pushes it, and the heap size is O(k): each iteration pops one triple and pushes at most three, so the heap grows by at most two per iteration and after k iterations it cannot hold more than 2k + 1 triples.

This gives a total running time of O(n log k + k log k) = O((n + k) log k).
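The no-duplicates claim can be checked mechanically for small index ranges. The sketch below (children and push_counts are hypothetical helper names) applies the push rules, with every index kept within 0..last, to every triple i ≤ j ≤ n ≤ last and counts how often each triple would be pushed; every triple other than [0,0,0] should come out with a count of exactly 1.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import List, Tuple

Triple = Tuple[int, int, int]


def children(t: Triple, last: int) -> List[Triple]:
    """Triples pushed after popping t, with every index kept within 0..last."""
    i, j, n = t
    out = []
    if n < last:
        out.append((i, j, n + 1))
    if j == n and j < last:
        out.append((i, j + 1, j + 1))
        if i == j:
            out.append((i + 1, i + 1, i + 1))
    return out


def push_counts(last: int) -> Counter:
    """For each triple, how many triples i <= j <= n <= last would push it."""
    triples = combinations_with_replacement(range(last + 1), 3)
    return Counter(child for t in triples for child in children(t, last))
```

Since no triple ever has more than three children, the same rules also give the "at most two net pushes per iteration" bound on the heap.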




Answer 2:


I certainly see an optimization you could make.

Let M be the highest number from A.
Let M2 be M * M.
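Let setMM2 consist of all x from A such that M2 < x < M.
If size(setMM2) >= k,
    then your top-k consists of the highest k elements.
Else
    all x in setMM2 are in your top-k and your search becomes smaller.

This works because every product of two or more numbers from A is at most M2, so every x in setMM2 beats all of them.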

You can repeat this method with max(secondHighestNumber^2,M^3) and generalize the algorithm.
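The first round of this pruning can be sketched as below. prune_by_square is a hypothetical name; it relies on the bound that every product of two or more numbers from A is at most M * M, which holds whether or not a number may be reused. Unlike setMM2, this sketch also counts M itself, and it covers only the first round, not the generalization.

```python
from typing import List, Tuple


def prune_by_square(values: List[float], k: int) -> Tuple[List[float], bool]:
    """First pruning round, as a sketch.

    Returns (guaranteed, complete): guaranteed holds inputs strictly greater
    than M * M (largest first), which beat every product of 2 or 3 numbers;
    complete is True when they already fill all k slots of the answer.
    """
    if not values or k <= 0:
        return [], True
    m = max(values)
    # Inputs above M * M; M itself is included here (setMM2 excludes it).
    above = sorted((x for x in values if x > m * m), reverse=True)
    return above[:k], len(above) >= k
```

For example, with values 0.9, 0.85, 0.5, 0.2 and k = 2, M * M is about 0.81, so 0.9 and 0.85 already fill the answer.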




Answer 3:


Since the numbers are between 0 and 1, the more numbers you multiply, the worse the product gets. The problem is a big k, for instance k = N^2.

First try with single numbers and push them into a heap: O(N log k).

Then use these numbers from the heap to make another heap B with products of 2 numbers: O(k log k) at worst, but you can get some speedups if you sort the numbers in case k > N.

Then you have heap B with pairs of numbers and their products; make a third heap C from heap B the same way you made B, but starting from a much bigger heap.

I think that this will give O(k log k).
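One possible reading of this staged idea, written as a sketch: heap B holds the top-k pairwise products built from the top-k singles, and heap C combines entries of B with the singles again. The helper names are hypothetical, reuse of a number is allowed (as in Answer 1), and because several (pair, single) cells describe the same three numbers, results are de-duplicated by their sorted index tuple. Each search pops at most about 3k cells, so the work should stay within O(N log k + k log k).

```python
import heapq
from typing import List, Tuple

# (product, sorted indices into the list of top-k singles)
Item = Tuple[float, Tuple[int, ...]]


def top_k_combined(left: List[Item], right: List[Item], k: int) -> List[Item]:
    """Top-k distinct products left[a] * right[b]; both lists sorted descending.

    Values lie in [0, 1] and both lists are sorted, so stepping from cell
    (a, b) to (a + 1, b) or (a, b + 1) never increases the product, and a
    best-first search pops cells in non-increasing order of product.
    """
    if not left or not right or k <= 0:
        return []
    heap = [(-left[0][0] * right[0][0], 0, 0)]   # min-heap of (-product, a, b)
    seen_cells = {(0, 0)}
    seen_combos = set()
    out: List[Item] = []
    while heap and len(out) < k:
        neg, a, b = heapq.heappop(heap)
        combo = tuple(sorted(left[a][1] + right[b][1]))
        if combo not in seen_combos:              # same numbers, different cell
            seen_combos.add(combo)
            out.append((-neg, combo))
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(left) and nb < len(right) and (na, nb) not in seen_cells:
                seen_cells.add((na, nb))
                heapq.heappush(heap, (-left[na][0] * right[nb][0], na, nb))
    return out


def top_k_products_staged(values: List[float], k: int) -> List[float]:
    """Singles, then pairs (heap B), then triples built from B (heap C)."""
    singles: List[Item] = [(v, (i,)) for i, v in enumerate(heapq.nlargest(k, values))]
    pairs = top_k_combined(singles, singles, k)    # B: products of 2 numbers
    triples = top_k_combined(pairs, singles, k)    # C: a pair from B times a single
    merged = sorted(singles + pairs + triples, key=lambda item: item[0], reverse=True)
    return [value for value, _ in merged[:k]]
```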



Source: https://stackoverflow.com/questions/5577206/generate-top-k-values
