Get number of elements greater than a number

I am trying to solve the following problem: Numbers are being inserted into a container. Each time a number is inserted I need to know how many elements are in the container that are greater than or equal to the current number being inserted. I believe both operations can be done in logarithmic complexity.

My question: Are there standard containers in a C++ library that can solve the problem? I know that std::multiset can insert elements in logarithmic time, but how can you query it? Or should I implement a data structure (e.x. a binary search tree) to solve it?

ondrejdee

Great question. I do not think there is anything in STL which would suit your needs (provided you MUST have logarithmic times). I think the best solution then, as aschepler says in comments, is to implement a RB tree. You may have a look at STL source code, particularly on stl_tree.h to see whether you could use bits of it.

Better still, look at : (Rank Tree in C++)

Which contains link to implementation:

(http://code.google.com/p/options/downloads/list)

You should use a multiset for logarithmic complexity, yes. But computing the distance is the problem, as set/map iterators are Bidirectional, not RandomAccess, std::distance has an O(n) complexity on them:

multiset<int> my_set;
...
auto it = my_map.lower_bound(3);
size_t count_inserted = distance(it, my_set.end()) // this is definitely O(n)
my_map.insert(make_pair(3);

Your complexity-issue is complicated. Here is a full analysis:

If you want a O(log(n)) complexity for each insertion, you need a sorted structure as a set. If you want the structure to not reallocate or move items when adding a new item, the insertion point distance computation will be O(n). If know the insertion size in advance, you do not need logarithmic insertion time in a sorted container. You can insert all the items then sort, it is as much O(n.log(n)) as n * O(log(n)) insertions in a set. The only alternative is to use a dedicated container like a weighted RB-tree. Depending on your problem this may be the solution, or something really overkill.

Use multiset and distance, you are O(n.log(n)) on insertion (yes, n insertions * log(n) insertion time for each one of them), O(n.n) on distance computation, but computing distances is very fast.
If you know the inserted data size (n) in advance : Use a vector, fill it, sort it, return your distances, you are O(n.log(n)), and it is easy to code.
If you do not know n in advance, your n is likely huge, each item is memory-heavy so you can not have O(n.log(n)) reallocation : then you have time to re-encode or re-use some non-standard code, you really have to meet these complexity expectations, use a dedicated container. Also consider using a database, you will probably have issues maintaining this in memory.

Sounds like a case for count_if - although I admit this doesn't solve it at logarithmic complexity, that would require a sorted type.

vector<int> v = { 1, 2, 3, 4, 5 };
int some_value = 3;

int count = count_if(v.begin(), v.end(), [some_value](int n) { return n > some_value; } );

Edit done to fix syntactic problems with lambda function

Jørgen Fogh

If the whole range of numbers is sufficiently small (on the order of a few million), this problem can be solved relatively easily using a Fenwick tree.

Although Fenwick trees are not part of the STL, they are both very easy to implement and time efficient. The time complexity is O(log N) for both updates and queries and the constant factors are low.

You mention in a comment on another question, that you needed this for a contest. Fenwick trees are very popular tools in competitive programming and are often useful.

来源：https://stackoverflow.com/questions/17428841/get-number-of-elements-greater-than-a-number

标签

c++

algorithm

containers