Python's itertools product memory consumption

泄露秘密 提交于 2019-12-09 14:38:37

问题


The documentation says that the cartesian product function

the actual implementation does not build up intermediate results in memory.

How can that be possible with generators? Can somebody show me an example with a bounded memory consumption for 2 generators?


回答1:


Looking at the module's source code, itertools.product() actually converts every argument to a tuple:

// product_new() in itertoolsmodule.c
for (i=0; i < nargs ; ++i) {
    PyObject *item = PyTuple_GET_ITEM(args, i);
    PyObject *pool = PySequence_Tuple(item); //<==== Call tuple(arg)
    if (pool == NULL)
        goto error;
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}

In other words, itertools.product()'s memory consumption appears to be linear in the size of the input arguments.




回答2:


Well, it also says:

The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

This is pretty much how it works in the implementation (Modules/itertoolsmodule.c)

Here is the state object:

typedef struct {
    PyObject_HEAD
    PyObject *pools;       /* tuple of pool tuples */
    Py_ssize_t *indices;   /* one index per pool */
    PyObject *result;      /* most recently returned result tuple */
    int stopped;           /* set to 1 when the product iterator is exhausted */
} productobject;

And the next item is returned by the function product_next, which uses this state and the algorithm described in the quote to generate the next state. See this answer to understand the memory requirements.

For general education, you can read about how to create generators with state from C extensions here.



来源:https://stackoverflow.com/questions/10720247/pythons-itertools-product-memory-consumption

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!