Retrieve the two highest item from a list containing 100,000 integers

后端 未结 15 793
心在旅途
心在旅途 2020-12-13 17:55

How can retrieve the two highest item from a list containing 100,000 integers without having to sort the entire list first?

15条回答
  •  隐瞒了意图╮
    2020-12-13 18:33

    JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.

    If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targetted at someone new to programming.

    The idea

    The idea is simple.

    • Keep two variables: largest and second_largest.
    • Go through the list.
      • If an item is greater than largest, assign it to largest.
      • If an item is greater than second_largest, but less than largest, assign it to second_largest.

    Getting started

    Let's start.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        for item in inlist:
            if item > largest:
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [3, 2, 1]
        print two_largest(inlist)
    

    Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?

    Traceback (most recent call last):
      File "twol.py", line 10, in 
        print two_largest(inlist)
      File "twol.py", line 3, in two_largest
        if item > largest:
    UnboundLocalError: local variable 'largest' referenced before assignment
    

    Apparently, we need to set largest before we start the loop. This probably means we should set second_largest too.

    Initializing variables

    Let's set largest and second_largest to 0.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = 0 # NEW!
        second_largest = 0 # NEW!
        for item in inlist:
            if item > largest:
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [3, 2, 1]
        print two_largest(inlist)
    

    Good. Let's run it.

    (3, 2)
    

    Great! Now let's test with inlist being [1, 2, 3]

        inlist = [1, 2, 3] # CHANGED!
    

    Let's try it.

    (3, 0)
    

    ...Uh oh.

    Fixing the logic

    The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?

    Let's work through what the function is doing.

    • When we start, largest is 0 and second_largest is also 0.
    • The first item in the list we look at is 1, so largest becomes 1.
    • The next item is 2, so largest becomes 2.

    But what about second_largest?

    When we assign a new value to largest, the largest value actually becomes second-largest. We need to show that in the code.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = 0
        second_largest = 0
        for item in inlist:
            if item > largest:
                second_largest = largest # NEW!
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [1, 2, 3]
        print two_largest(inlist)
    

    Let's run it.

    (3, 2)
    

    Fantastic.

    Initializing variables, part 2

    Now let's try it with a list of negative numbers.

        inlist = [-1, -2, -3] # CHANGED!
    

    Let's run it.

    (0, 0)
    

    That's not right at all. Where did these zeroes come from?

    It turns out that the starting values for largest and second_largest were actually larger than all the items in the list. The first thing you might consider is setting largest and second_largest to the lowest values possible in Python. Unfortunately, Python doesn't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.

    So what's the best thing to do? Let's try setting largest and second_largest to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = inlist[0] # CHANGED!
        second_largest = inlist[1] # CHANGED!
        # Only look at the part of inlist starting with item 2
        for item in inlist[2:]: # CHANGED!
            if item > largest:
                second_largest = largest
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [-1, -2, -3]
        print two_largest(inlist)
    

    Let's run it.

    (-1, -2)
    

    Great! Let's try with another list of negative numbers.

        inlist = [-3, -2, -1] # CHANGED!
    

    Let's run it.

    (-1, -3)
    

    Wait, what?

    Initializing variables, part 3

    Let's step through our logic again.

    • largest is set to -3
    • second_largest is set to -2

    Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.

    • largest is set to -1; second_largest is set to the old value of largest, which is -3

    Yes, that looks to be the problem. We need to ensure that largest and second_largest are set correctly.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        if inlist[0] > inlist[1]: # NEW
            largest = inlist[0]
            second_largest = inlist[1]
        else: # NEW
            largest = inlist[1] # NEW
            second_largest = inlist[0] # NEW
        # Only look at the part of inlist starting with item 2
        for item in inlist[2:]:
            if item > largest:
                second_largest = largest
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [-3, -2, -1]
        print two_largest(inlist)
    

    Let's run it.

    (-1, -2)
    

    Excellent.

    Conclusion

    So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.

    However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.


    Efficiency

    Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100 000 items can be processed in 0.27 seconds (using timeit, averaged over 100 runs).

提交回复
热议问题