Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?

梦毁少年i 2020-11-22 03:46

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

11 answers
  •  忘掉有多难
    2020-11-22 04:26

    Use the source, Luke!

    In CPython, range(...).__contains__ (a method wrapper) eventually delegates to a simple calculation that checks whether the value can possibly be in the range. It is fast because it reasons arithmetically about the bounds instead of iterating over the range object. The logic is:

    1. Check that the number is between start and stop, and
    2. Check that the stride value doesn't "step over" our number.

    For example, 994 is in range(4, 1000, 2) because:

    1. 4 <= 994 < 1000, and
    2. (994 - 4) % 2 == 0.
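
    The two checks above can be sketched in plain Python. This is only an illustration: in_range is a hypothetical helper, not CPython's implementation, and it assumes integer arguments. The negative-step branch mirrors the bounds test the C code below uses for descending ranges.

```python
def in_range(value, start, stop, step=1):
    """Hypothetical helper mirroring the two checks; not CPython's code."""
    if step == 0:
        raise ValueError("step must not be zero")

    # 1. Bounds check. The direction depends on the sign of step.
    if step > 0:
        within_bounds = start <= value < stop
    else:
        within_bounds = stop < value <= start
    if not within_bounds:
        return False

    # 2. Stride check: value must be a whole number of steps from start.
    return (value - start) % step == 0


print(in_range(994, 4, 1000, 2))         # the example above
print(in_range(995, 4, 1000, 2))         # odd value, stepped over
print(in_range(10**15, 0, 10**15 + 1))   # the case from the question
```

    Both checks cost a handful of integer operations regardless of how long the range is, which is why the size of the range does not matter.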

    The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

    static int
    range_contains_long(rangeobject *r, PyObject *ob)
    {
        int cmp1, cmp2, cmp3;
        PyObject *tmp1 = NULL;
        PyObject *tmp2 = NULL;
        PyObject *zero = NULL;
        int result = -1;
    
        zero = PyLong_FromLong(0);
        if (zero == NULL) /* MemoryError in int(0) */
            goto end;
    
        /* Check if the value can possibly be in the range. */
    
        cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
        if (cmp1 == -1)
            goto end;
        if (cmp1 == 1) { /* positive steps: start <= ob < stop */
            cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
            cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
        }
        else { /* negative steps: stop < ob <= start */
            cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
            cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
        }
    
        if (cmp2 == -1 || cmp3 == -1) /* TypeError */
            goto end;
        if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
            result = 0;
            goto end;
        }
    
        /* Check that the stride does not invalidate ob's membership. */
        tmp1 = PyNumber_Subtract(ob, r->start);
        if (tmp1 == NULL)
            goto end;
        tmp2 = PyNumber_Remainder(tmp1, r->step);
        if (tmp2 == NULL)
            goto end;
        /* result = ((int(ob) - start) % step) == 0 */
        result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
      end:
        Py_XDECREF(tmp1);
        Py_XDECREF(tmp2);
        Py_XDECREF(zero);
        return result;
    }
    
    static int
    range_contains(rangeobject *r, PyObject *ob)
    {
        if (PyLong_CheckExact(ob) || PyBool_Check(ob))
            return range_contains_long(r, ob);
    
        return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                           PY_ITERSEARCH_CONTAINS);
    }
    

    The "meat" of the idea is mentioned in the line:

    /* result = ((int(ob) - start) % step) == 0 */ 
    

    As a final note, look at the range_contains function at the bottom of the code snippet. If the object is not exactly an int (or a bool), the clever algorithm described above is not used; the code falls back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):

    >>> x, r = 1000000000000000, range(1000000000000001)
    >>> class MyInt(int):
    ...     pass
    ... 
    >>> x_ = MyInt(x)
    >>> x in r  # calculates immediately :) 
    True
    >>> x_ in r  # iterates for ages.. :( 
    ^\Quit (core dumped)
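
    To observe the fallback without waiting on a huge range, here is a small sketch that counts equality comparisons instead. CountingInt is a hypothetical class made up for this illustration, and the sketch assumes a CPython build where range_contains behaves as in the snippet above.

```python
class CountingInt(int):
    """Hypothetical int subclass that counts how often == is evaluated."""
    calls = 0

    def __eq__(self, other):
        CountingInt.calls += 1
        return int.__eq__(self, other)

    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = int.__hash__


r = range(1000)
x = CountingInt(500)

# Not an exact int, so the range is searched element by element,
# with one == comparison per element visited.
found = x in r
print(found, CountingInt.calls)

# Converting to an exact int first restores the arithmetic fast path.
fast = int(x) in range(10**15 + 1)
print(fast)
```

    If you control the call site, wrapping the value in int(...) before the membership test is a simple way to avoid the linear scan.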
    
