Why is 'x' in ('x',) faster than 'x' == 'x'?

误落风尘 2021-01-29 18:15
>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

Also works for

2 answers
  •  我在风中等你
    2021-01-29 18:51

    There are three factors at play here which, combined, produce this surprising behavior.

    First: the in operator takes a shortcut and checks identity (x is y) before it checks equality (x == y):

    >>> n = float('nan')
    >>> n in (n, )
    True
    >>> n == n
    False
    >>> n is n
    True
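
The shortcut can be modeled in pure Python. The sketch below is an illustration rather than CPython's actual C implementation, and the names NeverEqual and contains are hypothetical helpers introduced here. It follows the rule given in the Python language reference for built-in containers, where x in y is equivalent to any(x is e or x == e for e in y).

```python
class NeverEqual:
    """Hypothetical helper: __eq__ always answers False and counts its calls."""

    def __init__(self):
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return False


def contains(container, x):
    """Pure-Python model of `x in container`: identity first, then equality."""
    return any(item is x or item == x for item in container)


obj = NeverEqual()

# Identity hit: both the built-in operator and the model report True,
# and __eq__ is never consulted.
found_builtin = obj in (obj,)
found_model = contains((obj,), obj)
calls_after_identity_hit = obj.eq_calls

# Identity miss: the operator falls back to ==, so __eq__ runs once
# (on one of the two operands) and the result is False.
other = NeverEqual()
found_other = obj in (other,)
```

As with the nan example, the object is reported as a member of the tuple even though it never compares equal to itself.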
    

    Second: because of Python's string interning, both "x"s in "x" in ("x", ) will be identical:

    >>> "x" is "x"
    True
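
A small sketch of what interning means in practice, assuming CPython. Two equal strings are only guaranteed to be the same object after both have passed through sys.intern; whether literals or runtime-built strings share an object on their own is an implementation detail. The variable names are illustrative only.

```python
import sys

# Two separate occurrences of the same short literal. On CPython these
# normally refer to one object, but the language does not promise it.
a = "x"
b = "x"
same_literal = a is b

# A string built at runtime is equal to the literal...
built = "".join(["hello", " ", "world"])
literal = "hello world"
equal = built == literal

# ...and sys.intern() maps equal strings to one shared object,
# so identity is guaranteed only after interning both.
interned_same = sys.intern(built) is sys.intern(literal)

print(same_literal, equal, interned_same)
```

The comparison is written through variables because recent Python versions emit a SyntaxWarning for is applied directly to a literal.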
    

    (big warning: this is implementation-specific behavior! is should never be used to compare strings because it will give surprising answers sometimes; for example "x" * 100 is "x" * 100 ==> False)

    Third: as detailed in Veedrac's fantastic answer, tuple.__contains__ (x in (y, ) is roughly equivalent to (y, ).__contains__(x)) gets to the point of performing the identity check faster than str.__eq__ (again, x == y is roughly equivalent to x.__eq__(y)) does.
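
To illustrate the "roughly equivalent" wording, the sketch below calls the special methods directly. For plain strings the operator and the method agree. The int/float case shows one place where they differ: the == operator handles NotImplemented and tries the reflected comparison, while the bare method call just returns NotImplemented.

```python
x, y = "x", "y"

# Operator form and explicit method call give the same answer here.
in_via_operator = x in (y,)
in_via_method = (y,).__contains__(x)
eq_via_operator = x == y
eq_via_method = x.__eq__(y)

# "Roughly": == does more than call __eq__ once. int.__eq__ does not know
# how to compare with a float and returns NotImplemented; the operator
# then tries float.__eq__ and gets True.
mixed_operator = (1 == 1.0)
mixed_method = (1).__eq__(1.0)

print(in_via_operator, in_via_method, eq_via_operator, eq_via_method)
print(mixed_operator, mixed_method)
```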

    You can see evidence for this because, when the two operands are not the same object, x in (y, ) is significantly slower than the logically equivalent x == y:

    In [18]: %timeit 'x' in ('x', )
    10000000 loops, best of 3: 65.2 ns per loop
    
    In [19]: %timeit 'x' == 'x'    
    10000000 loops, best of 3: 68 ns per loop
    
    In [20]: %timeit 'x' in ('y', ) 
    10000000 loops, best of 3: 73.4 ns per loop
    
    In [21]: %timeit 'x' == 'y'    
    10000000 loops, best of 3: 56.2 ns per loop
    

    The x in (y, ) case is slower because, after the is comparison fails, the in operator falls back to normal equality checking (i.e., using ==). The comparison itself then takes about the same amount of time as ==, and the entire operation ends up slower because of the overhead of creating the tuple, walking its members, etc.
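
The four measurements can be reproduced outside IPython with the standard timeit module. This is a minimal sketch: the measure helper and its number and repeat defaults are assumptions chosen here, and absolute timings (and possibly even the ordering) will vary with the Python version and the machine.

```python
import timeit

# The four expressions timed in the IPython session above.
CASES = [
    "'x' in ('x', )",
    "'x' == 'x'",
    "'x' in ('y', )",
    "'x' == 'y'",
]


def measure(stmt, number=200_000, repeat=5):
    """Return the best observed time, in nanoseconds per evaluation of stmt."""
    best = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best / number * 1e9


results = {stmt: measure(stmt) for stmt in CASES}
for stmt, ns in results.items():
    print(f"{stmt:<18} {ns:6.1f} ns")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the differences are only a few nanoseconds.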

    Note also that a in (b, ) is only faster when a is b:

    In [48]: a = 1             
    
    In [49]: b = 2
    
    In [50]: %timeit a is a or a == a
    10000000 loops, best of 3: 95.1 ns per loop
    
    In [51]: %timeit a in (a, )      
    10000000 loops, best of 3: 140 ns per loop
    
    In [52]: %timeit a is b or a == b
    10000000 loops, best of 3: 177 ns per loop
    
    In [53]: %timeit a in (b, )      
    10000000 loops, best of 3: 169 ns per loop
    

    (Why is a in (b, ) faster than a is b or a == b? My guess would be fewer virtual machine instructions: a in (b, ) is only ~3 instructions, whereas a is b or a == b will be quite a few more VM instructions.)
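
The guess about instruction counts can be checked with the standard dis module. The sketch below compiles both expressions and lists their bytecode instruction names; opnames is a hypothetical helper, and the exact instructions and counts depend on the CPython version, so no specific output is shown.

```python
import dis


def opnames(expr):
    """Return the names of the bytecode instructions expr compiles to."""
    code = compile(expr, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


ops_in = opnames("a in (b, )")
ops_or = opnames("a is b or a == b")

print(len(ops_in), ops_in)
print(len(ops_or), ops_or)
```

The counts include bookkeeping instructions such as the final return, so they are best read as a comparison between the two expressions rather than as absolute numbers.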

    Veedrac's answer (https://stackoverflow.com/a/28889838/71522) goes into much more detail on specifically what happens during each of == and in, and is well worth the read.
