Fastest way to find smallest missing integer from list of integers

问题

I have a list of 100 random integers. Each random integer has a value from 0 to 99. Duplicates are allowed, so the list could be something like

56, 1, 1, 1, 1, 0, 2, 6, 99...

I need to find the smallest integer (>= 0) is that is not contained in the list.

My initial solution is this:

vector<int> integerList(100); //list of random integers
...
vector<bool> listedIntegers(101, false);
for (int theInt : integerList)
{
    listedIntegers[theInt] = true;
}
int smallestInt;
for (int j = 0; j < 101; j++)
{
    if (!listedIntegers[j])
    {
        smallestInt = j;
        break;
    }
}

But that requires a secondary array for book-keeping and a second (potentially full) list iteration. I need to perform this task millions of times (the actual application is in a greedy graph coloring algorithm, where I need to find the smallest unused color value with a vertex adjacency list), so I'm wondering if there's a clever way to get the same result without so much overhead?

回答1:

I believe there is no faster way to do it. What you can do in your case is to reuse vector<bool>, you need to have just one such vector per thread.

Though the better approach might be to reconsider the whole algorithm to eliminate this step at all. Maybe you can update least unused color on every step of the algorithm?

回答2:

It's been a year, but ...

One idea that comes to mind is to keep track of the interval(s) of unused values as you iterate the list. To allow efficient lookup, you could keep intervals as tuples in a binary search tree, for example.

So, using your sample data:

56, 1, 1, 1, 1, 0, 2, 6, 99...

You would initially have the unused interval [0..99], and then, as each input value is processed:

56: [0..55][57..99]
1: [0..0][2..55][57..99]
1: no change
1: no change
1: no change
0: [2..55][57..99]
2: [3..55][57..99]
6: [3..5][7..55][57..99]
99: [3..5][7..55][57..98]

Result (lowest value in lowest remaining interval): 3

回答3:

Since you have to scan the whole list no matter what, the algorithm you have is already pretty good. The only improvement I can suggest without measuring (that will surely speed things up) is to get rid of your vector<bool>, and replace it with a stack-allocated array of 4 32-bit integers or 2 64-bit integers.

Then you won't have to pay the cost of allocating an array on the heap every time, and you can get the first unused number (the position of the first 0 bit) much faster. To find the word that contains the first 0 bit, you only need to find the first one that isn't the maximum value, and there are bit twiddling hacks you can use to get the first 0 bit in that word very quickly.

回答4:

You program is already very efficient, in O(n). Only marginal gain can be found. One possibility is to divide the number of possible values in blocks of size block, and to register not in an array of bool but in an array of int, in this case memorizing the value modulo block.
In practice, we replace a loop of size N by a loop of size N/block plus a loop of size block.
Theoretically, we could select block = sqrt(N) = 12 in order to minimize the quantity N/block + block.
In the program hereafter, block of size 8 are selected, assuming that dividing integers by 8 and calculating values modulo 8 should be fast.
However, it is clear that a gain, if any, can be obtained only for a minimum value rather large!

constexpr int N = 100;
int find_min1 (const std::vector<int> &IntegerList) {
    constexpr int Size = 13;    //N / block
    constexpr int block = 8;
    constexpr int Vmax = 255;   // 2^block - 1

    int listedBlocks[Size] = {0};
    for (int theInt : IntegerList) {
        listedBlocks[theInt / block] |= 1 << (theInt % block);
    }
    for (int j = 0; j < Size; j++) {
        if (listedBlocks[j] == Vmax) continue;
        int &k = listedBlocks[j];
        for (int b = 0; b < block; b++) {
            if ((k%2) == 0) return block * j + b;
            k /= 2;
        }
    }
    return -1;
}

回答5:

Potentially you can reduce the last step to O(1) by using some bit manipulation, in your case __int128, set the corresponding bits in loop one and call something like __builtin_clz or use the appropriate bit hack

回答6:

The best solution I could find for finding smallest integer from a set is https://codereview.stackexchange.com/a/179042/31480

Here are c++ version.

int solution(std::vector<int>& A)
{
    for (std::vector<int>::size_type i = 0; i != A.size(); i++) 
    {
        while (0 < A[i] && A[i] - 1 < A.size()
            && A[i] != i + 1
            && A[i] != A[A[i] - 1]) 
        {
            int j = A[i] - 1;

            auto tmp = A[i];
            A[i] = A[j];
            A[j] = tmp;
        }
    }

    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        if (A[i] != i+1)
        {
            return i + 1;
        }
    }
    return A.size() + 1;
}

来源：https://stackoverflow.com/questions/53466817/fastest-way-to-find-smallest-missing-integer-from-list-of-integers

标签

c++

arrays

algorithm

vector

time-complexity