Indexing count of buckets

Posted by 删除回忆录丶 on 2019-12-05 11:53:48

I have understood your problem as:

Each bucket has an internal order and buckets themselves have an order, so all the elements have some ordering and you need the ith element in that ordering.

To solve that:

You can maintain a 'cumulative value' tree whose leaf nodes (x1, x2, ..., xn) are the bucket sizes. The value of each internal node is the sum of the values of its immediate children. Keeping n a power of 2 makes this simple (you can always pad with zero-size buckets at the end), and the tree is then a complete binary tree.

Corresponding to each bucket you will maintain a pointer to the corresponding leaf node.

E.g., say the bucket sizes are 2, 1, 4, 8.

The tree will look like

     15
    /  \
   3    12
  / \  / \
 2  1  4  8

If you want the total count, read the value of the root node.
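As a rough sketch of the construction above, here is one way it might look in Python. It stores the tree in a flat list (root at index 1, children of node i at 2i and 2i+1) instead of using explicit nodes and pointers; that layout and the name `build` are my own assumptions, not part of the description.

```python
def build(sizes):
    """Return (tree, n): tree[1] is the root, the leaves are tree[n:2*n]."""
    n = 1
    while n < len(sizes):
        n *= 2                       # round the leaf count up to a power of two
    tree = [0] * (2 * n)             # index 0 is unused
    tree[n:n + len(sizes)] = sizes   # leaves hold bucket sizes; padding stays 0
    for i in range(n - 1, 0, -1):    # each internal node = sum of its two children
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    return tree, n


tree, n = build([2, 1, 4, 8])
total = tree[1]                      # the root holds the total count
```

With this layout the "pointer to the leaf" for bucket k is just the index n + k.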

If you want to modify some xk (i.e. change the corresponding bucket size), you start at its leaf and walk up the tree following parent pointers, updating the value of each node on the way.

For instance if you add 4 items to the second bucket it will be (the nodes marked with * are the ones that changed)

     19*
    /   \
   7*    12
  / \   / \
 2  5*  4  8
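The update can be sketched like this, assuming the tree is stored in a flat list with the root at index 1 and the leaves at indices n to 2n-1 (so the parent of node i is i // 2). The function name `add_to_bucket` is made up for the example.

```python
def add_to_bucket(tree, n, k, delta):
    """Add delta to bucket k (0-based) and to every ancestor of its leaf."""
    i = n + k                # index of the leaf for bucket k
    while i >= 1:
        tree[i] += delta     # this node's sum includes bucket k
        i //= 2              # move to the parent; the root's parent is index 0


# Tree for bucket sizes 2, 1, 4, 8 (index 0 unused).
tree = [0, 15, 3, 12, 2, 1, 4, 8]
n = 4
add_to_bucket(tree, n, 1, 4)     # add 4 items to the second bucket
```

Only the leaf and its ancestors are touched, which is where the O(log n) update cost comes from.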

If you want to find the ith element, you walk down the above tree, effectively doing a binary search. At each node you already know the counts of the left and right children. If i > the left child's value, you subtract the left child's value from i and recurse into the right subtree. If i <= the left child's value, you go left and recurse again.

Say you wanted to find the 9th element in the above tree:

Since the left child of the root is 7 < 9, you subtract 7 from 9 (to get 2) and go right.

Since 2 <= 4 (the left child of 12), you go left.

You are at the leaf node corresponding to the third bucket. You now need to pick the second element in that bucket.
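A minimal sketch of that descent, again assuming a flat-list layout (root at index 1, children of node i at 2i and 2i+1, leaves at n to 2n-1). `find` is a hypothetical name, and i is 1-based as in the walkthrough.

```python
def find(tree, n, i):
    """Return (bucket, offset) for the i-th element overall.

    bucket is 0-based; i and offset are 1-based.
    """
    if not 1 <= i <= tree[1]:
        raise IndexError("i is out of range")
    node = 1
    while node < n:                  # stop once we reach a leaf
        left = tree[2 * node]
        if i > left:
            i -= left                # skip everything under the left child
            node = 2 * node + 1
        else:
            node = 2 * node
    return node - n, i


# Tree for bucket sizes 2, 5, 4, 8 (index 0 unused).
tree = [0, 19, 7, 12, 2, 5, 4, 8]
bucket, offset = find(tree, 4, 9)    # third bucket (index 2), second element
```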

If you have to add a new bucket, you double the size of your tree (if needed) by adding a new root, making the existing tree the left child, and adding as the right child a new tree with all zero buckets except the one you added (which will be the leftmost leaf of the new tree). This will be amortized O(1) time for adding a new value to the tree. The caveat is that you can only add a bucket at the end, not anywhere in the middle.

Getting the total count is O(1). Updating a single bucket and looking up an item are both O(log n).

Adding new bucket is amortized O(1).

Space usage is O(n).

Instead of a binary tree, you can probably do the same with a B-Tree.

I still hope for answers; however, here is what I could come up with so far, following @Moron's suggestion.

Apparently my little Fenwick Tree idea cannot be easily adapted. It's easy to append new buckets at the end of the Fenwick tree, but not in the middle, so it's kind of a lost cause.

We're left with 2 data structures: Binary Indexed Trees (ironically the very name Fenwick used to describe his structure) and Ranked Skip List.

Typically, these structures do not separate the data from the index; however, we can get this behavior by:

  1. Use indirection: the element held by the node is a pointer to a bucket, not the bucket itself
  2. Use pool allocation so that the index elements, even though allocated independently from one another, are still close in memory, which should help the cache

I tend to prefer Skip Lists to Binary Trees because they are self-organizing, so I'm spared the trouble of constantly re-balancing my tree.

These structures would allow getting to the ith element in O(log N); I don't know if it's possible to get faster asymptotic performance.

Another interesting implementation detail: I have a pointer to an element, but other elements might have been inserted or removed since, so how do I know the rank of my element now?

It's possible if the bucket points back to the node that owns it. But this means that either the node should not move or it should update the bucket's pointer when moved around.
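To illustrate the back-pointer idea, here is a small sketch that uses the cumulative-sum tree from the first answer rather than a skip list, stored as a flat list (root at index 1, leaves at n to 2n-1). The bucket's index k stands in for the pointer back to its node; `rank` is a hypothetical name.

```python
def rank(tree, n, k, offset):
    """Global 1-based rank of element `offset` (1-based) in bucket k (0-based)."""
    r = offset
    i = n + k                    # start at the leaf the bucket points back to
    while i > 1:
        if i % 2 == 1:           # right child: the whole left sibling precedes us
            r += tree[i - 1]
        i //= 2                  # move to the parent
    return r


# Tree for bucket sizes 2, 5, 4, 8 (index 0 unused).
tree = [0, 19, 7, 12, 2, 5, 4, 8]
r = rank(tree, 4, 2, 2)          # second element of the third bucket
```

This is the reverse of the top-down lookup: walking up from the node and summing the left siblings gives the current rank in O(log N), no matter how many elements were inserted or removed elsewhere in the meantime.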
