How can retrieve the two highest item from a list containing 100,000 integers without having to sort the entire list first?
JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.
If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targetted at someone new to programming.
The idea is simple.
largest and second_largest.largest, assign it to largest.second_largest, but less than largest, assign it to second_largest.Let's start.
def two_largest(inlist):
"""Return the two largest items in the sequence. The sequence must
contain at least two items."""
for item in inlist:
if item > largest:
largest = item
elif largest > item > second_largest:
second_largest = item
# Return the results as a tuple
return largest, second_largest
# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
inlist = [3, 2, 1]
print two_largest(inlist)
Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?
Traceback (most recent call last):
File "twol.py", line 10, in
print two_largest(inlist)
File "twol.py", line 3, in two_largest
if item > largest:
UnboundLocalError: local variable 'largest' referenced before assignment
Apparently, we need to set largest before we start the loop. This probably means we should set second_largest too.
Let's set largest and second_largest to 0.
def two_largest(inlist):
"""Return the two largest items in the sequence. The sequence must
contain at least two items."""
largest = 0 # NEW!
second_largest = 0 # NEW!
for item in inlist:
if item > largest:
largest = item
elif largest > item > second_largest:
second_largest = item
# Return the results as a tuple
return largest, second_largest
# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
inlist = [3, 2, 1]
print two_largest(inlist)
Good. Let's run it.
(3, 2)
Great! Now let's test with inlist being [1, 2, 3]
inlist = [1, 2, 3] # CHANGED!
Let's try it.
(3, 0)
...Uh oh.
The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?
Let's work through what the function is doing.
largest is 0 and second_largest is also 0.largest becomes 1.largest becomes 2.But what about second_largest?
When we assign a new value to largest, the largest value actually becomes second-largest. We need to show that in the code.
def two_largest(inlist):
"""Return the two largest items in the sequence. The sequence must
contain at least two items."""
largest = 0
second_largest = 0
for item in inlist:
if item > largest:
second_largest = largest # NEW!
largest = item
elif largest > item > second_largest:
second_largest = item
# Return the results as a tuple
return largest, second_largest
# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
inlist = [1, 2, 3]
print two_largest(inlist)
Let's run it.
(3, 2)
Fantastic.
Now let's try it with a list of negative numbers.
inlist = [-1, -2, -3] # CHANGED!
Let's run it.
(0, 0)
That's not right at all. Where did these zeroes come from?
It turns out that the starting values for largest and second_largest were actually larger than all the items in the list. The first thing you might consider is setting largest and second_largest to the lowest values possible in Python. Unfortunately, Python doesn't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.
So what's the best thing to do? Let's try setting largest and second_largest to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.
def two_largest(inlist):
"""Return the two largest items in the sequence. The sequence must
contain at least two items."""
largest = inlist[0] # CHANGED!
second_largest = inlist[1] # CHANGED!
# Only look at the part of inlist starting with item 2
for item in inlist[2:]: # CHANGED!
if item > largest:
second_largest = largest
largest = item
elif largest > item > second_largest:
second_largest = item
# Return the results as a tuple
return largest, second_largest
# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
inlist = [-1, -2, -3]
print two_largest(inlist)
Let's run it.
(-1, -2)
Great! Let's try with another list of negative numbers.
inlist = [-3, -2, -1] # CHANGED!
Let's run it.
(-1, -3)
Wait, what?
Let's step through our logic again.
largest is set to -3second_largest is set to -2Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.
largest is set to -1; second_largest is set to the old value of largest, which is -3Yes, that looks to be the problem. We need to ensure that largest and second_largest are set correctly.
def two_largest(inlist):
"""Return the two largest items in the sequence. The sequence must
contain at least two items."""
if inlist[0] > inlist[1]: # NEW
largest = inlist[0]
second_largest = inlist[1]
else: # NEW
largest = inlist[1] # NEW
second_largest = inlist[0] # NEW
# Only look at the part of inlist starting with item 2
for item in inlist[2:]:
if item > largest:
second_largest = largest
largest = item
elif largest > item > second_largest:
second_largest = item
# Return the results as a tuple
return largest, second_largest
# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
inlist = [-3, -2, -1]
print two_largest(inlist)
Let's run it.
(-1, -2)
Excellent.
So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.
However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.
Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100 000 items can be processed in 0.27 seconds (using timeit, averaged over 100 runs).