Pair with Maximum AND value

Question

Given a very large array of integers, where each element can be up to 10^9, how do I find a pair with the maximum AND value? My current approach computes the AND of every possible pair and takes the maximum, but it is very slow. Is there a better approach?

Answer 1:

As long as you can find at least two numbers with the same most significant bit set, the solution will involve two of them.

Next, discard all other elements, clear this MSB and everything to the left of it in the numbers that remain, and solve the same problem again. Repeat until only two numbers remain or there is nothing left to do.

For example:

 input  || first iteration | second iteration
=================================================================
1110010 ||       x         |        x
0110111 ||   discarded     |    discarded        
1000000 ||       x         |    discarded
1000010 ||       x         |        x
=================================================================
=> solution: first and last

This is O(n log max_element).
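A minimal Java sketch of this filtering, assuming non-negative values below 2^31 and at least two elements (`MaxAndFilter` and `maxPairAnd` are hypothetical names). Rather than physically clearing the processed bits, it keeps the bits fixed so far in `result` and retains only candidates that contain all of them, which amounts to the same thing; 31 passes over the array give the O(n log max_element) bound.

```java
import java.util.ArrayList;
import java.util.List;

final class MaxAndFilter {
    // Maximum of a[i] & a[j] over i != j; assumes a.length >= 2 and 0 <= a[k] < 2^31.
    static int maxPairAnd(int[] a) {
        List<Integer> candidates = new ArrayList<>();
        for (int x : a) candidates.add(x);
        int result = 0;                              // bits the best pair is known to share
        for (int bit = 30; bit >= 0; --bit) {
            int want = result | (1 << bit);          // try to extend the shared bits with this one
            List<Integer> next = new ArrayList<>();
            for (int x : candidates)
                if ((x & want) == want) next.add(x);
            if (next.size() >= 2) {                  // at least two numbers have it: keep them
                result = want;
                candidates = next;                   // and discard all others
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {0b1110010, 0b0110111, 0b1000000, 0b1000010};
        System.out.println(maxPairAnd(a));           // 66, i.e. 0b1000010
    }
}
```

On the example from the table it ends up with the first and last numbers and returns their AND, 66.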

Answer 2:

Look at the highest (leftmost) bit position where the bit patterns of two consecutive numbers n - 1 and n differ. This bit must be 1 for n and 0 for n - 1. To the left of that bit everything is equal, and to the right of it the bigger number n has only 0s while the smaller number n - 1 has only 1s. It cannot be otherwise: adding 1 to n - 1 turns its trailing 1s into 0s and the 0 just above them into a 1.

Hence we have:

n           -> (common prefix) 1 0*
n - 1       -> (common prefix) 0 1*
-----------------------------------
n & (n - 1) -> (common prefix) 0 0*

Example:

88          -> 101 1 000
87          -> 101 0 111
------------------------
88 & 87     -> 101 0 000

Both the shared prefix and the starred tails of repeated bits (the 0* and 1* parts) may be empty.
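A quick Java check of the diagram on the 88/87 example, plus a brute-force confirmation over a small range that n & (n - 1) simply clears the lowest set bit of n (the `1 0*` tail becoming `0 0*`). `PrefixDemo` is a hypothetical name; `Integer.toBinaryString` and `Integer.lowestOneBit` are standard `java.lang.Integer` methods.

```java
final class PrefixDemo {
    public static void main(String[] args) {
        int n = 88;
        System.out.println(Integer.toBinaryString(n));           // 1011000
        System.out.println(Integer.toBinaryString(n - 1));       // 1010111
        System.out.println(Integer.toBinaryString(n & (n - 1))); // 1010000 (80)
        // The "1 0*" -> "0 0*" change means n & (n - 1) is n minus its lowest set bit.
        for (int k = 1; k < (1 << 16); ++k)
            if ((k & (k - 1)) != k - Integer.lowestOneBit(k))
                throw new AssertionError(k);
    }
}
```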

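The tail always turns into all zeroes, so if it is present then (n - 1) & (n - 2) is at least as large as n & (n - 1): n - 1 has all 1s in the tail positions, and n - 2 is n - 1 with only its last bit cleared, so (n - 1) & (n - 2) = n - 2. (The two are equal when the tail is a single bit, e.g. 6 & 5 = 5 & 4 = 4.) It is also at least as large as every other pair AND in the range up to n, since a pair AND never exceeds its smaller operand and the only pair whose smaller operand exceeds n - 2 is (n - 1, n).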

If the tail is not present, that is, if n is odd, then n & (n - 1) = n - 1, which is the maximum AND value for the whole range up to n. In case it's not obvious: 'the tail is present' is just another way of saying 'n is even'.

For a range [A, B] with A < B, one case needs special handling: B = A + 1, i.e. the shortest possible range. In that case, if B is even, there is no B - 2 in the range to fall back on.

Hence the maximum AND for the range [A, B] with A < B is computed as:

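static int max_and_in_range (int A, int B)  // illustrative name; requires A < B
{
    if ((B & 1) == 1 || A > B - 2)  // B odd, or B - 2 lies outside [A, B]
        return B & (B - 1);
    else
        return (B - 1) & (B - 2);   // B even: this equals B - 2
}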
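To gain some confidence in the range formula, the sketch below compares it with the quadratic all-pairs scan for every range with 0 <= A < B < 128. The class and method names are hypothetical, and the bound 128 is an arbitrary choice that keeps the brute force fast.

```java
final class RangeMaxAnd {
    // Range formula from the text: max P & Q over A <= P < Q <= B, assuming 0 <= A < B.
    static int formula(int a, int b) {
        if ((b & 1) == 1 || a > b - 2)
            return b & (b - 1);
        return (b - 1) & (b - 2);
    }

    // Quadratic reference: AND every pair in the range.
    static int bruteForce(int a, int b) {
        int best = 0;
        for (int p = a; p <= b; ++p)
            for (int q = p + 1; q <= b; ++q)
                best = Math.max(best, p & q);
        return best;
    }

    public static void main(String[] args) {
        final int limit = 128;
        for (int a = 0; a < limit; ++a)
            for (int b = a + 1; b < limit; ++b)
                if (formula(a, b) != bruteForce(a, b))
                    throw new AssertionError("mismatch on [" + a + ", " + b + "]");
        System.out.println("formula matches brute force for all 0 <= A < B < " + limit);
    }
}
```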

For the full Monty, have a look at my submission at HackerEarth.

Only after posting the answer did I discover that the problem under discussion in this topic is slightly different from Maximum AND at HackerEarth: here the processing is done on a given vector of inputs instead of on a contiguous range of numbers.

The above scheme could still supply an early-out condition should suitable successive values be discovered during the downward scan of the sorted sequence, but the likelihood of that being helpful is negligible.

Here is a function that identifies the highest shared bit in a sorted sequence by scanning backwards from the end. It usually needs to scan only a few elements before it is done. Since it is impossible to select more than w + 1 values of at most w bits without two of them sharing a set bit, the function finds the shared high bit of a sorted sequence by examining only logarithmically few values at its upper end.

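// v[lo..hi) must be sorted in ascending order and hold non-negative values.
// Returns the highest bit set in at least two of them, or 0 if there is none.
static int common_high_bit (int[] v, int lo, int hi)
{
    int shared = 0;  // highest shared bit found so far, smeared into all lower bits

    // Values below 'shared' have no bit above the current shared bit, so stop there.
    for (int seen = 0, i = hi - 1; i >= lo && v[i] >= shared; --i)
    {
        int m = shared | seen & v[i];  // & binds tighter than |

        if (m > shared)  // discovered a higher shared bit
        {
            // Smear the top bit of m into every lower position.
            m |= m >>  1;
            m |= m >>  2;
            m |= m >>  4;
            m |= m >>  8;
            m |= m >> 16;

            shared = m;
        }

        seen |= v[i];  // OR of all values scanned so far
    }

    return shared - (shared >> 1);  // isolate the top bit of the mask
}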
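A sketch of how the function might be exercised in Java, assuming it is ported as `commonHighBit` (a hypothetical name) and checked against a simple per-bit count on random sorted arrays with values up to 10^9. Note that the input must be sorted in ascending order first.

```java
import java.util.Arrays;
import java.util.Random;

final class CommonHighBitDemo {
    // Java port of common_high_bit; v[lo..hi) sorted ascending, non-negative.
    static int commonHighBit(int[] v, int lo, int hi) {
        int shared = 0;
        for (int seen = 0, i = hi - 1; i >= lo && v[i] >= shared; --i) {
            int m = shared | (seen & v[i]);
            if (m > shared) {
                m |= m >> 1; m |= m >> 2; m |= m >> 4; m |= m >> 8; m |= m >> 16;
                shared = m;
            }
            seen |= v[i];
        }
        return shared - (shared >> 1);
    }

    // Reference: highest bit set in at least two values, found by counting per bit.
    static int bruteForce(int[] v) {
        for (int bit = 1 << 30; bit != 0; bit >>= 1) {
            int count = 0;
            for (int x : v) if ((x & bit) != 0) ++count;
            if (count >= 2) return bit;
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] v = {0b1110010, 0b0110111, 0b1000000, 0b1000010};
        Arrays.sort(v);                                      // ascending order is required
        System.out.println(commonHighBit(v, 0, v.length));   // 64
        Random rnd = new Random(1);
        for (int t = 0; t < 10000; ++t) {
            int[] w = new int[1 + rnd.nextInt(20)];
            for (int i = 0; i < w.length; ++i) w[i] = rnd.nextInt(1_000_000_001);
            Arrays.sort(w);
            if (commonHighBit(w, 0, w.length) != bruteForce(w)) throw new AssertionError();
        }
    }
}
```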

It isn't necessary to sort the whole sequence, though. It is sufficient to perform the 'heapify' step of heapsort (which can be done in O(n)) and then extract the biggest values one by one and feed them into the above algorithm until it bails out or the input is exhausted.
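One possible reading of the heap variant in Java, sketched under the assumption of non-negative inputs; `HeapSharedBit`, `siftDown` and `highestSharedBit` are hypothetical names. It builds a max-heap bottom-up in O(n), then pops the current maximum and feeds it through the same `seen`/`shared` logic as common_high_bit, stopping as soon as the maximum falls below `shared`.

```java
final class HeapSharedBit {
    // Restore the max-heap property for the subtree rooted at i within h[0..n).
    static void siftDown(int[] h, int i, int n) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = l + 1;
            if (l < n && h[l] > h[largest]) largest = l;
            if (r < n && h[r] > h[largest]) largest = r;
            if (largest == i) return;
            int t = h[i]; h[i] = h[largest]; h[largest] = t;
            i = largest;
        }
    }

    // Highest bit set in at least two of the (non-negative) values, or 0.
    static int highestSharedBit(int[] values) {
        int[] h = values.clone();
        int n = h.length;
        for (int i = n / 2 - 1; i >= 0; --i) siftDown(h, i, n);  // O(n) heapify
        int shared = 0, seen = 0;
        while (n > 0 && h[0] >= shared) {
            int v = h[0];                  // current maximum
            h[0] = h[--n];                 // pop it off the heap
            siftDown(h, 0, n);
            int m = shared | (seen & v);
            if (m > shared) {              // a higher shared bit: smear it downward
                m |= m >> 1; m |= m >> 2; m |= m >> 4; m |= m >> 8; m |= m >> 16;
                shared = m;
            }
            seen |= v;
        }
        return shared - (shared >> 1);
    }

    public static void main(String[] args) {
        System.out.println(highestSharedBit(new int[] {114, 55, 64, 66}));  // 64
    }
}
```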

Once the high bit has been identified, all values with this bit set need to be extracted from the sequence, that bit and everything to the left of it zeroed, and the result subjected to the same process. Once the candidate set has shrunk to a manageable size, the simple quadratic algorithm can handle the rest more efficiently than the big machinery.
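A sketch of the whole procedure in Java, assuming non-negative inputs and at least two values. For brevity it re-sorts the candidate set each round instead of using the heap-based extraction, and the cut-off `SMALL = 8` for switching to the quadratic scan is an arbitrary, hypothetical choice, as are the class and method names. Each round adds the shared high bit to the result, keeps only values with that bit set, and zeroes that bit and everything to its left.

```java
import java.util.Arrays;
import java.util.Random;

final class MaxAndByHighBits {
    static final int SMALL = 8;  // hypothetical cut-off for the quadratic fallback

    // Java port of common_high_bit for a whole array (sorted ascending, non-negative).
    static int commonHighBit(int[] v) {
        int shared = 0;
        for (int seen = 0, i = v.length - 1; i >= 0 && v[i] >= shared; --i) {
            int m = shared | (seen & v[i]);
            if (m > shared) {
                m |= m >> 1; m |= m >> 2; m |= m >> 4; m |= m >> 8; m |= m >> 16;
                shared = m;
            }
            seen |= v[i];
        }
        return shared - (shared >> 1);
    }

    // Maximum a[i] & a[j] over i != j; assumes at least two non-negative values.
    static int maxPairAnd(int[] values) {
        int[] v = values.clone();
        int result = 0;                          // shared high bits fixed so far
        while (v.length > SMALL) {
            Arrays.sort(v);                      // masking can scramble the order, so re-sort
            int bit = commonHighBit(v);
            if (bit == 0) return result;         // no two candidates share a further bit
            result |= bit;
            int below = bit - 1;                 // zero this bit and everything left of it
            v = Arrays.stream(v).filter(x -> (x & bit) != 0).map(x -> x & below).toArray();
        }
        int best = 0;                            // small candidate set: plain quadratic scan
        for (int i = 0; i < v.length; ++i)
            for (int j = i + 1; j < v.length; ++j)
                best = Math.max(best, v[i] & v[j]);
        return result | best;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        for (int t = 0; t < 2000; ++t) {
            int[] a = new int[2 + rnd.nextInt(60)];
            int bound = 1 << (1 + rnd.nextInt(30));
            for (int i = 0; i < a.length; ++i) a[i] = rnd.nextInt(bound);
            int expected = 0;
            for (int i = 0; i < a.length; ++i)
                for (int j = i + 1; j < a.length; ++j)
                    expected = Math.max(expected, a[i] & a[j]);
            if (maxPairAnd(a) != expected) throw new AssertionError(Arrays.toString(a));
        }
        System.out.println("agrees with brute force on 2000 random arrays");
    }
}
```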

Note: the shared high bit is not necessarily the MSB as implied by IVlad's answer. E.g. consider:

101xxxxx
 11xxxxx

As can be seen, the highest shared bit is not the MSB of either number.
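Filling every x with 0 (an arbitrary choice), a short Java check confirms the point: neither number's MSB is shared, yet their AND keeps bit 5. `SharedBitNotMsb` is a hypothetical name; `Integer.highestOneBit` is a standard `java.lang.Integer` method.

```java
final class SharedBitNotMsb {
    public static void main(String[] args) {
        int a = 0b10100000, b = 0b01100000;                // 101xxxxx and 11xxxxx, every x = 0
        System.out.println(Integer.highestOneBit(a));      // 128: MSB of a
        System.out.println(Integer.highestOneBit(b));      // 64: MSB of b
        System.out.println(Integer.highestOneBit(a & b));  // 32: highest shared bit
    }
}
```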

For similar reasons we cannot assume a whole lot about the order of values in the remaining candidate set after masking out the shared high bit and whatever is left of it.

However, the situation is not bleak since aberrant, inconvenient constellations cannot be numerous, and once we're so far down in the sequence that the shared high bit is the MSB of the values there, only logarithmically few more values need to be extracted from it to complete the candidate set for the next step.

Source: https://stackoverflow.com/questions/28323686/pair-with-maximum-and-value

Tags

algorithm

bitwise-operators

bitwise-and