Linear time majority algorithm?

Question

Can anyone think of a linear time algorithm for determining a majority element in a list of elements? The algorithm should use O(1) space.

If n is the size of the list, a majority element is an element that occurs at least ceil(n / 2) times.

[Input] 1, 2, 1, 1, 3, 2

[Output] 1

[Editor Note] This question has a technical mistake. I preferred to leave it so as not to spoil the wording of the accepted answer, which corrects the mistake and discusses why. Please check the accepted answer.

Answer 1:

I would guess that the Boyer-Moore algorithm (as linked to by nunes and described by cldy in other answers) is the intended answer to the question; but the definition of "majority element" in the question is too weak to guarantee that the algorithm will work.

"If n is the size of the list, a majority element is an element that occurs at least ceil(n / 2) times."

The Boyer-Moore algorithm finds an element with a strict majority, if such an element exists. (If you don't know in advance that you do have such an element, you have to make a second pass through the list to check the result.)

For a strict majority, you need "... strictly more than floor(n/2) times", not "... at least ceil(n/2) times".

In your example, "1" occurs 3 times, and the other values together occur 3 times:

Example input: 1, 2, 1, 1, 3, 2

Output: 1

but you need 4 elements with the same value for a strict majority.
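The gap between the two definitions can be checked with a little arithmetic. Here is a minimal sketch in Python; the helper names weak_threshold and strict_threshold are made up for illustration and do not come from the question:

```python
def weak_threshold(n):
    """Minimum count under the question's definition: ceil(n / 2)."""
    return (n + 1) // 2  # integer form of ceil(n / 2) for n >= 0


def strict_threshold(n):
    """Minimum count for a strict majority: floor(n / 2) + 1."""
    return n // 2 + 1


for n in (5, 6):
    print(n, weak_threshold(n), strict_threshold(n))
# prints:
# 5 3 3
# 6 3 4
```

For odd n the two thresholds coincide; they differ only when n is even, which is exactly the case in the example.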

It does happen to work in this particular case:

Input: 1, 2, 1, 1, 3, 2
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 1: count != 0, element == candidate (1), so increment count to 2
Read 3: count != 0, element != candidate (1), so decrement count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Result is current candidate: 1

but look what happens if the final "1" and the "2" at the end are swapped over:

Input: 1, 2, 1, 2, 3, 1
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 3: count == 0, so set candidate to 3, and set count to 1
Read 1: count != 0, element != candidate (3), so decrement count to 0
Result is current candidate: 3
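The two traces above can be reproduced with a short Python sketch of the voting pass. The function name majority_candidate is an assumption for illustration, and the function returns only a candidate; it does not verify it:

```python
def majority_candidate(items):
    """One Boyer-Moore voting pass; returns a candidate, not a verified majority."""
    candidate = None
    count = 0
    for item in items:
        if count == 0:
            # No current candidate: adopt this element.
            candidate = item
            count = 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


print(majority_candidate([1, 2, 1, 1, 3, 2]))  # 1
print(majority_candidate([1, 2, 1, 2, 3, 1]))  # 3
```

Both inputs contain the same multiset of values, yet the candidate differs, which shows that the result depends on ordering when there is no strict majority.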

Answer 2:

Boyer-Moore algorithm: http://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html

You scan a list (or stream) and maintain one counter. Initially counter = 0, majority_element = null. As you scan, if the counter is 0, you take the current element as the majority candidate and set the counter to 1. If counter != 0, you increment or decrement the counter according to whether the current element equals the current candidate.

If there is no majority element, the algorithm still returns some candidate. If you want that level of correctness, you have to make one more pass to validate that the candidate is in fact the majority (i.e., it makes up > 50% of the elements).
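A rough sketch of the scan plus the validation pass, in Python. The name strict_majority is hypothetical, and the sketch assumes the input is a list that can be traversed twice and that None never appears as an element (None is used to signal "no majority"):

```python
def strict_majority(items):
    """Return the element occurring more than len(items) / 2 times, else None."""
    # First pass: voting scan with a single counter.
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1

    # Second pass: count the candidate's real occurrences.
    occurrences = sum(1 for item in items if item == candidate)
    if occurrences > len(items) // 2:
        return candidate
    return None


print(strict_majority([1, 1, 2, 1]))        # 1
print(strict_majority([1, 2, 1, 1, 3, 2]))  # None
```

On a true stream the second pass is not available, so the result there can only be treated as a candidate.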

Answer 3:

This is a popular challenge question, and the answer is that it's impossible. The language of strings with majority elements is not regular (this is easily proven by the pumping lemma) so there is no way it can be recognized in constant space.

Of course the trick is that you need a counter variable which takes O(log n) space, but since n is bounded by 2^32 or 2^64 and your computer is really a finite state machine with ~2^(8 * (ramsize + hddsize)) states (sizes in bytes), everything is O(1).

Answer 4:

I think it is possible, using Boyer-Moore, though not directly.

As Matthew stated, Boyer-Moore only guarantees to find the majority element for a slightly different definition of majority, called strict majority. Your definition is slightly weaker, but not by much.

1. Execute Boyer-Moore: O(N) time, O(1) space
2. Check that the candidate fulfills the condition: O(N) time, O(1) space
3. If it doesn't, execute Boyer-Moore again, ignoring the instances of the "failed" candidate: O(N) time, O(1) space
4. Check that the (new) candidate fulfills the condition: O(N) time, O(1) space

Steps 1 and 2 are straightforward. Step 3 works because, once the instances of the failed candidate are removed, an element that occurred at least ceil(N/2) times in the full list is a strict majority among the remaining elements. Step 4 is optional, and is only needed if there is a possibility that no majority element exists.
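The four steps could be sketched in Python roughly as follows. The names vote, weak_majority and _NO_SKIP are hypothetical, and the sketch assumes the input is a list that does not contain None (None is returned when no element qualifies):

```python
_NO_SKIP = object()  # sentinel that compares unequal to every list element


def vote(items, skip=_NO_SKIP):
    """Boyer-Moore voting pass that ignores elements equal to `skip`."""
    candidate, count = None, 0
    for item in items:
        if item == skip:
            continue
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def weak_majority(items):
    """Return an element occurring at least ceil(n / 2) times, or None."""
    n = len(items)
    if n == 0:
        return None
    needed = (n + 1) // 2  # ceil(n / 2)

    def passes(value):
        return sum(1 for item in items if item == value) >= needed

    first = vote(items)                # step 1
    if passes(first):                  # step 2
        return first
    second = vote(items, skip=first)   # step 3
    if passes(second):                 # step 4
        return second
    return None


print(weak_majority([1, 2, 1, 2, 3, 1]))  # 1
```

Each pass walks the list once and keeps a constant number of variables, so the sketch stays within O(N) time and O(1) extra space.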

Answer 5:

If you know that the majority element occupies more than half of the array, then there is such an algorithm. You keep track of the most common element so far and its number of repetitions. At the start, that element is the first one and it has one repetition. If the next element is the same as the current most common one, you add one to the repetitions. If it is different, you subtract one from the repetitions. If the repetitions reach zero, you replace the most common element with the element you are currently observing and set the repetitions to 1.

Answer 6:

I could be wrong, but the combination of an O(n) execution time and constant memory usage seems impossible to me. Not using extra space would require sorting. The fastest comparison sort is O(n log n).

Using Radix sort, you can get a better worst case execution time, but more memory usage.
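For comparison, the sorting route mentioned above might look like the following Python sketch. The name strict_majority_by_sorting is made up for illustration. It relies on the fact that an element occurring more than n / 2 times must sit at index n // 2 once the list is sorted. Note that it reorders the caller's list, and that CPython's list.sort may use O(n) scratch space, so a truly constant-space version would need an in-place sort such as heapsort:

```python
def strict_majority_by_sorting(items):
    """Sort in place, then test the middle element. O(n log n) time."""
    if not items:
        return None
    items.sort()  # comparison sort; modifies the caller's list
    middle = items[len(items) // 2]
    # A strict majority, if any, must cover the middle index after sorting.
    if items.count(middle) > len(items) // 2:
        return middle
    return None


print(strict_majority_by_sorting([1, 2, 1, 1, 3, 1]))  # 1
print(strict_majority_by_sorting([1, 2, 1, 1, 3, 2]))  # None
```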

Answer 7:

Use the preliminary stage of heap sort:

Build a heap from the array elements, which runs in linear time -> O(n).
Then take the (N/2)th element and walk up through its parent nodes, checking whether they are all equal -> O(n/2)

If they are all equal, then the (N/2)th element is the answer.

So overall O(n) + O(n/2) -> O(n)

Source: https://stackoverflow.com/questions/4280450/linear-time-majority-algorithm

Tags

algorithm

language-agnostic

complexity-theory