Fastest way to check if a value exists in a list

猫巷女王i 2020-11-22 00:18

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

I know that all values in the list are unique.

12 answers
  •  深忆病人
    2020-11-22 00:48

    The original question was:

    What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

    Thus there are two things to find:

    1. is an item in the list, and
    2. what is the index (if in the list).

    To that end, I modified @xslittlegrass's code to compute indexes in all cases, and added one more method.

    Results

    Methods are:

    1. in--basically if x in b: return b.index(x)
    2. try--try/except around b.index(x) (skips having to check if x in b)
    3. set--basically if x in set(b): return b.index(x)
    4. bisect--sort b together with its indexes, then binary search for x in the sorted list. (Note the modification from @xslittlegrass's version, which returns the index in the sorted b rather than in the original b.)
    5. reverse--build a reverse lookup dictionary d for b; then d[x] gives the index of x.
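    Method 4 is the least obvious of the five, so here is a minimal sketch of it for a single lookup, separate from the benchmark. The names build_sorted_index and bisect_index are hypothetical helpers introduced only for this illustration.

```python
import bisect

def build_sorted_index(b):
    """Pair each value with its original position, sorted by value."""
    return sorted((x, i) for i, x in enumerate(b))

def bisect_index(sorted_pairs, x):
    """Return the original index of x, or -1 if x is absent."""
    # (x,) sorts before every (x, i), so bisect_left lands on the first pair for x.
    pos = bisect.bisect_left(sorted_pairs, (x,))
    if pos < len(sorted_pairs) and sorted_pairs[pos][0] == x:
        return sorted_pairs[pos][1]
    return -1

b = [40, 10, 30, 20]
pairs = build_sorted_index(b)
print(bisect_index(pairs, 30))  # 2
print(bisect_index(pairs, 25))  # -1
```

    The sort is paid once; each lookup afterwards is a binary search, so this only helps when the same list is queried many times.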

    Results show that method 5 is the fastest.

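    Interestingly, the try and the set methods take about the same time. In this test every value of a is present in b, so both methods end up calling b.index(x) for every element, and that linear scan dominates the cost.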
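    Outside the benchmark, method 5 can be sketched as below. This is a minimal illustration, assuming the values are hashable and unique (as the question states); build_reverse_lookup is a hypothetical helper name, not part of the test code.

```python
def build_reverse_lookup(values):
    """Map each value to its position in the list (one pass over the list)."""
    return {x: i for i, x in enumerate(values)}

values = ["pear", "apple", "fig"]
index_of = build_reverse_lookup(values)

print(index_of.get("fig", -1))   # 2
print(index_of.get("kiwi", -1))  # -1
print("apple" in index_of)       # True
```

    Building the dictionary costs one pass over the list, so it pays off when many lookups follow. If the list did contain duplicates, the dictionary would keep the index of the last occurrence, unlike b.index(x), which returns the first.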


    Test Code

    import random
    import bisect
    import matplotlib.pyplot as plt
    import math
    import timeit
    import itertools
    
    def wrapper(func, *args, **kwargs):
        " Wrap func and its arguments in a zero-argument callable for timeit "
        # Reference https://www.pythoncentral.io/time-a-python-function/
        def wrapped():
            return func(*args, **kwargs)
        return wrapped
    
    def method_in(a,b,c):
        for i,x in enumerate(a):
            if x in b:
                c[i] = b.index(x)
            else:
                c[i] = -1
        return c
    
    def method_try(a,b,c):
        for i, x in enumerate(a):
            try:
                c[i] = b.index(x)
            except ValueError:
                c[i] = -1
        return c
    
    def method_set_in(a,b,c):
        s = set(b)
        for i,x in enumerate(a):
            if x in s:
                c[i] = b.index(x)
            else:
                c[i] = -1
        return c
    
    def method_bisect(a,b,c):
        " Finds indexes using bisection "
    
        # Create a sorted b with its index
        bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])
    
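        for i,x in enumerate(a):
            # (x, ) sorts before any (x, j), so this finds the first pair holding x
            index = bisect.bisect_left(bsorted,(x, ))
            c[i] = -1
            # Bounds check must use bsorted (built from b), not a
            if index < len(bsorted):
                if x == bsorted[index][0]:
                    c[i] = bsorted[index][1]  # index in the b array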
    
        return c
    
    def method_reverse_lookup(a, b, c):
        reverse_lookup = {x:i for i, x in enumerate(b)}
        for i, x in enumerate(a):
            c[i] = reverse_lookup.get(x, -1)
        return c
    
    def profile():
        Nls = [x for x in range(1000,20000,1000)]
        number_iterations = 10
        methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
        time_methods = [[] for _ in range(len(methods))]
    
        for N in Nls:
            a = [x for x in range(0,N)]
            random.shuffle(a)
            b = [x for x in range(0,N)]
            random.shuffle(b)
            c = [0 for x in range(0,N)]
    
            for i, func in enumerate(methods):
                wrapped = wrapper(func, a, b, c)
                time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))
    
        markers = itertools.cycle(('o', '+', '.', '>', '2'))
        colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
        labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))
    
        for i in range(len(time_methods)):
            plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))
    
        plt.xlabel('list size', fontsize=18)
        plt.ylabel('log(time)', fontsize=18)
        plt.legend(loc = 'upper left')
        plt.show()
    
    profile()
    
