Python list.pop(i) time complexity?

Submitted by ☆樱花仙子☆ on 2021-02-08 14:43:40

Question


I looked it up online and learned that list.pop() has O(1) time complexity but list.pop(i) has O(n) time complexity. While solving LeetCode problems, I've seen many people use pop(i) in a for loop and claim their solution is O(n), and in fact it runs faster than my code, which uses only one loop but has many lines inside that loop. I wonder why this happens, and whether I should use pop(i) instead of writing many lines to avoid it.
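The difference between the two pops can be seen with a quick micro-benchmark (a sketch; the helper names are made up): pop() removes the last element in constant time, while pop(0) must shift every remaining element one slot to the left.

```python
from timeit import timeit

# Hypothetical helper names, for illustration only.
def pop_all_from_end(n):
    lst = list(range(n))
    while lst:
        lst.pop()       # O(1): removes the last element, nothing shifts

def pop_all_from_front(n):
    lst = list(range(n))
    while lst:
        lst.pop(0)      # O(n): every remaining element shifts one slot left

n = 10_000
t_end = timeit(lambda: pop_all_from_end(n), number=5)
t_front = timeit(lambda: pop_all_from_front(n), number=5)
print(f"pop(): {t_end:.4f}s  pop(0): {t_front:.4f}s")
```

On any reasonable machine the pop(0) loop is dramatically slower, since emptying the list that way costs O(n²) total.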

Example: Leetcode 26. Remove Duplicates from Sorted Array

My code: (faster than 75%)

class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        count = 1
        while right < len(nums)-1:
            if nums[right] == nums[right+1]:
                right += 1
            else:
                nums[left + 1] = nums[right + 1]
                left += 1
                right += 1
                count += 1
        return count

and other people's code, faster than 90% (the author doesn't claim O(n), but why is their O(n^2) solution faster than my O(n)?):

https://leetcode.com/problems/remove-duplicates-from-sorted-array/discuss/477370/python-3%3A-straight-forward-6-lines-solution-90-faster-100-less-memory

My optimized code (faster than 89%)

class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        while right < len(nums)-1:
            if nums[right] != nums[right+1]:
                nums[left + 1] = nums[right + 1]
                left += 1
            right += 1
        return left + 1

Answer 1:


Your algorithm genuinely does take O(n) time and the "pop in reverse order" algorithm genuinely does take O(n²) time. However, LeetCode isn't reporting that your time complexity is better than 89% of submissions; it is reporting that your actual running time is better than 89% of all submissions. The actual running time depends on what inputs the algorithm is tested with, not just their sizes but also the number of duplicates.

It also depends how the running times across multiple test cases are averaged; if most of the test cases are for small inputs where the quadratic solution is faster, then the quadratic solution may come out ahead overall even though its time complexity is higher. @Heap Overflow also points out in the comments that the overhead time of LeetCode's judging system is proportionally large and quite variable compared to the time it takes for the algorithms to run, so the discrepancy could simply be due to random variation in that overhead.

To shed some light on this, I measured running times using timeit. The results have exactly the shapes you'd expect given the time complexities, and the crossover point is somewhere between 8000 < n < 9000 on my machine. The measurements are based on sorted lists where each distinct element appears on average twice. The code I used to generate the times is given below.

Timing code:

def linear_solution(nums):
    left, right = 0, 0
    while right < len(nums)-1:
        if nums[right] != nums[right+1]:
            nums[left + 1] = nums[right + 1]
            left += 1
        right += 1
    return left + 1

def quadratic_solution(nums):
    prev_obj = []
    for i in range(len(nums)-1,-1,-1):
        if prev_obj == nums[i]:
            nums.pop(i)
        prev_obj = nums[i]
    return len(nums)

from random import randint
from timeit import timeit

def gen_list(n):
    max_n = n // 2
    return sorted(randint(0, max_n) for i in range(n))

# I used a step size of 1000 up to 15000, then a step size of 5000 up to 50000
step = 1000
max_n = 15000
reps = 100

print('n', 'linear time (ms)', 'quadratic time (ms)', sep='\t')
for n in range(step, max_n+1, step):
    # generate input lists
    lsts1 = [ gen_list(n) for i in range(reps) ]
    # copy the lists by value, since the algorithms will mutate them
    lsts2 = [ list(g) for g in lsts1 ]
    # use iterators to supply the input lists one-by-one to timeit
    iter1 = iter(lsts1)
    iter2 = iter(lsts2)
    t1 = timeit(lambda: linear_solution(next(iter1)), number=reps)
    t2 = timeit(lambda: quadratic_solution(next(iter2)), number=reps)
    # timeit reports the total time in seconds across all reps
    print(n, 1000*t1/reps, 1000*t2/reps, sep='\t')

The conclusion is that your algorithm is indeed faster than the quadratic solution for large enough inputs, but the inputs LeetCode is using to measure running times are not "large enough" to overcome the variation in the judging overhead, and the fact that the average includes times measured on smaller inputs where the quadratic algorithm is faster.




Answer 2:


Just because the solution isn't O(n), you can't assume it behaves like O(n^2) in practice.

It doesn't quite behave like O(n^2) because he is popping in reverse order, which makes each pop cheaper: pop(i) has to shift every element after index i, and since each loop iteration removes elements from the back, there are fewer and fewer elements left behind the pop point. Using pop(i) in forward order costs more than in reverse, because each pop then shifts a long tail of elements. Try the same solution in forward order and run it a few times to make sure; you'll see the difference.
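The two orders can be sketched as pop-based dedup variants (hypothetical helper names) that you can compare with timeit on large sorted inputs:

```python
def dedup_pop_reverse(nums):
    # Iterate from the back; each pop(i) shifts only the tail after i,
    # which shrinks as duplicates are removed.
    for i in range(len(nums) - 1, 0, -1):
        if nums[i] == nums[i - 1]:
            nums.pop(i)
    return len(nums)

def dedup_pop_forward(nums):
    # Iterate from the front; each pop(i) shifts the whole remaining tail.
    i = 1
    while i < len(nums):
        if nums[i] == nums[i - 1]:
            nums.pop(i)
        else:
            i += 1
    return len(nums)

a = [0, 0, 1, 1, 1, 2, 3, 3]
b = list(a)
print(dedup_pop_reverse(a), a)   # 4 [0, 1, 2, 3]
print(dedup_pop_forward(b), b)   # 4 [0, 1, 2, 3]
```

Both produce the same deduplicated list; only the amount of element shifting per pop differs, which you can verify by timing each on a long sorted list with many duplicates.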

Anyway, regarding why his solution is faster: your if condition involves several variables, while he uses only one variable, prev_obj, and iterating in reverse order is what makes it possible to get by with just one. So each of your loop iterations performs more basic operations; with the same O(n) complexity, each of your n iterations takes longer than his.

Just look at your count variable: in every iteration its value equals left + 1, so you could simply return left + 1. Removing it saves n executions of count += 1.

I just posted this solution, and it is faster than 76%:

class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a = sorted(set(nums))  # a key function is unnecessary here
        for i,v in enumerate(a):
            nums[i]=v
        return len(a)

and this one is faster than 90%:

class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a = {k: 1 for k in nums}  # <-- this is O(n)
        for i, v in enumerate(a.keys()):  # <-- another loop, but over the m unique keys, so O(m)
            nums[i]=v
        return len(a)

You could say both of these are more than O(n) if you only look at the for loop, but since we are working with duplicate members, my loop runs over the reduced set of unique members while your code loops over all members. If the time required to build that unique set/dict is less than the time you spend looping over the extra members and checking your if conditions, then my solution can be faster.
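As a minimal sketch of the same idea (remove_duplicates is a hypothetical name): since Python 3.7, dicts preserve insertion order, so dict.fromkeys gives an order-preserving dedup in one O(n) pass over the input.

```python
def remove_duplicates(nums):
    # dict.fromkeys keeps the first occurrence of each value, in order (Python 3.7+)
    unique = list(dict.fromkeys(nums))
    nums[:len(unique)] = unique   # write the unique values back in place
    return len(unique)

nums = [1, 1, 2, 2, 3]
k = remove_duplicates(nums)
print(k, nums[:k])   # 3 [1, 2, 3]
```

Because the input is sorted, this gives the same result as the two-pointer approach, at the cost of building the intermediate dict.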



来源:https://stackoverflow.com/questions/59744621/python-list-popi-time-complexity
